Progressive Masking Oriented Self-Taught Learning for Occluded Facial Expression Recognition
Bin Kang; Shuangshuang Wang; Zongyu Wang; Xin Li; Haie Dou; Lei Wang; Zhijie Xia
IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1277-1289. Published: 2025-02-25. DOI: 10.1109/TAFFC.2025.3544677. Available: https://ieeexplore.ieee.org/document/10902013/
Citations: 0
Abstract
Self-taught learning (STL) is a promising solution that reduces the performance gap between weakly supervised and fully supervised learning for easily accessible, label-free images. The success of traditional STL solutions relies on the assumption that the target appearance is completely visible and well-defined. In real-world facial expression recognition scenarios, however, saliency regions are often partially occluded, which significantly hampers the generalization capability of STL methods. Nevertheless, few studies have investigated the impact of occlusion on STL. In this paper, we propose an interweaved autoencoder network for weakly supervised facial expression recognition in occlusion scenarios. The key innovation of our network lies in the Residual Connection Union (RCU) blocks that can integrate the Convolutional Neural Network (CNN) and Transformer layers into a multi-scale structure. The RCU enables a progressive masking strategy to accurately identify and focus on contributive yet often overlooked image patches by analyzing the relationships among region-level target representations. In addition, we introduce a self-knowledge distillation module for the effective training of the proposed autoencoder network. Extensive experiments are conducted on four public datasets to demonstrate the superiority of our method over related works.
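The paper's code is not included on this page. As a rough illustration of how a block might combine a convolutional branch and a Transformer branch through residual connections, in the spirit of the RCU described above, the following is a minimal PyTorch-style sketch. The class name RCUBlock, the layer choices, and all hyperparameters are assumptions made for illustration only; they do not reproduce the authors' actual RCU implementation.

# Illustrative sketch only: a residual block fusing CNN and Transformer features.
# Names and layer choices (RCUBlock, depthwise conv, single encoder layer) are assumptions.
import torch
import torch.nn as nn


class RCUBlock(nn.Module):
    """Combine a local (convolutional) branch and a global (Transformer) branch
    with residual connections over the input feature map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local feature branch: depthwise + pointwise convolution.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.GELU(),
        )
        # Global feature branch: one Transformer encoder layer over spatial tokens.
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads,
            dim_feedforward=2 * channels, batch_first=True,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        b, c, h, w = x.shape
        local = self.conv(x)                       # local CNN features
        tokens = x.flatten(2).transpose(1, 2)      # (b, h*w, c) token sequence
        global_feat = self.attn(tokens)            # global Transformer features
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return x + local + global_feat             # residual union of both branches


if __name__ == "__main__":
    block = RCUBlock(channels=64)
    feats = torch.randn(2, 64, 14, 14)
    print(block(feats).shape)  # torch.Size([2, 64, 14, 14])

In a multi-scale structure, several such blocks could be stacked at different spatial resolutions, with the resulting region-level representations used to drive a masking strategy; the exact mechanism in the paper may differ.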
Journal Introduction:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.