FERMixNet: An Occlusion Robust Facial Expression Recognition Model With Facial Mixing Augmentation and Mid-Level Representation Learning
Yansong Huang; Junjie Peng; Wenqiang Zhang; Tong Zhao; Gan Chen; Shuhua Tan; Fen Yi; Lu Wang
IEEE Transactions on Affective Computing, vol. 16, no. 2, pp. 639-654. Published 2024-09-03.
DOI: 10.1109/TAFFC.2024.3454102
URL: https://ieeexplore.ieee.org/document/10663852/
Citations: 0
Abstract
Facial expressions can provide a better understanding of people’s mental state and attitudes towards specific things. However, facial occlusion in the real world greatly degrades the performance of facial expression recognition models. Recent works addressing the occlusion problem have primarily relied on attention mechanisms or occlusion-discarding methods that focus on non-occluded regions of the face. These methods, however, have not achieved a good balance between occlusion robustness and model efficiency. In this paper, we propose a simple and efficient model, called FERMixNet, for occluded facial expression recognition. The model incorporates a novel facial mixing augmentation strategy (FERMix) that generates new training samples by simulating real-world facial occlusion while preserving highly expression-related semantic information. By co-training on the original and newly generated samples, the model’s occlusion robustness is improved without increasing its complexity during inference. Additionally, to further enhance occlusion robustness, we include mid-level representation learning in the network to learn discriminative non-occluded local features of the samples at low computational cost. Extensive experiments on four public facial occlusion datasets, including Occlusion-RAF-DB, Occlusion-FERPlus, and FED-RO, show that the proposed model achieves state-of-the-art results, demonstrating the robustness of our method for occluded facial expression recognition. The proposed model also achieves state-of-the-art results on the in-the-wild facial expression datasets RAF-DB, AffectNet-8, and AffectNet-7. These results indicate that the proposed model has good application prospects in the real world.
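The abstract does not spell out how FERMix builds its mixed samples, so the following is only a generic CutMix-style sketch of the general idea of simulating occlusion by pasting a patch from another image onto a face. It is not the authors' method. The function name `mix_occlusion`, the `max_frac` parameter, and the list-of-lists image representation are all assumptions made for illustration.

```python
import random


def mix_occlusion(face, occluder, rng, max_frac=0.5):
    """Paste a random rectangular patch of `occluder` onto a copy of `face`.

    Hypothetical helper, not taken from the paper. Both images are
    equal-sized 2D lists of pixel values. Returns the mixed image and
    `lam`, the fraction of pixels that still come from `face`.
    """
    h, w = len(face), len(face[0])
    # Patch size is capped at max_frac of each side (an assumed limit)
    # so that most of the face stays visible.
    ph = rng.randint(1, max(1, int(h * max_frac)))
    pw = rng.randint(1, max(1, int(w * max_frac)))
    top = rng.randint(0, h - ph)
    left = rng.randint(0, w - pw)

    mixed = [row[:] for row in face]  # copy so the original sample is kept
    for r in range(top, top + ph):
        mixed[r][left:left + pw] = occluder[r][left:left + pw]

    lam = 1.0 - (ph * pw) / (h * w)
    return mixed, lam


# Co-training in the sense of the abstract would feed both the original
# and the mixed sample to the model; here we only build the pair.
rng = random.Random(0)
face = [[0] * 8 for _ in range(8)]
occluder = [[1] * 8 for _ in range(8)]
mixed, lam = mix_occlusion(face, occluder, rng)
training_pair = [face, mixed]
```

Because the augmentation only changes the training data, a model trained this way has the same architecture and inference cost as one trained without it, which is consistent with the abstract's claim that robustness improves without added inference complexity.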
About the journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.