Information gap based knowledge distillation for occluded facial expression recognition

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Image and Vision Computing Pub Date : 2025-02-01 DOI:10.1016/j.imavis.2024.105365

Yan Zhang , Zenghui Li , Duo Shen , Ke Wang , Jia Li , Chenxing Xia

{"title":"Information gap based knowledge distillation for occluded facial expression recognition","authors":"Yan Zhang , Zenghui Li , Duo Shen , Ke Wang , Jia Li , Chenxing Xia","doi":"10.1016/j.imavis.2024.105365","DOIUrl":null,"url":null,"abstract":"<div><div>Facial Expression Recognition (FER) with occlusion presents a challenging task in computer vision because facial occlusions result in poor visual data features. Recently, the region attention technique has been introduced to address this problem by researchers, which make the model perceive occluded regions of the face and prioritize the most discriminative non-occluded regions. However, in real-world scenarios, facial images are influenced by various factors, including hair, masks and sunglasses, making it difficult to extract high-quality features from these occluded facial images. This inevitably limits the effectiveness of attention mechanisms. In this paper, we observe a correlation in facial emotion features from the same image, both with and without occlusion. This correlation contributes to addressing the issue of facial occlusions. To this end, we propose a Information Gap based Knowledge Distillation (IGKD) to explore the latent relationship. Specifically, our approach involves feeding non-occluded and masked images into separate teacher and student networks. Due to the incomplete emotion information in the masked images, there exists an information gap between the teacher and student networks. During training, we aim to minimize this gap to enable the student network to learn this relationship. To enhance the teacher’s guidance, we introduce a joint learning strategy where the teacher conducts knowledge distillation on the student during the training of the teacher. Additionally, we introduce two novel constraints, called knowledge learn and knowledge feedback loss, to supervise and optimize both the teacher and student networks. The reported experimental results show that IGKD outperforms other algorithms on four benchmark datasets. Specifically, our IGKD achieves 87.57% on Occlusion-RAF-DB, 87.33% on Occlusion-FERPlus, 64.86% on Occlusion-AffectNet, and 73.25% on FED-RO, clearly demonstrating its effectiveness and robustness. Source code is released at: .</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105365"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624004700","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Facial Expression Recognition (FER) with occlusion presents a challenging task in computer vision because facial occlusions result in poor visual data features. Recently, the region attention technique has been introduced to address this problem by researchers, which make the model perceive occluded regions of the face and prioritize the most discriminative non-occluded regions. However, in real-world scenarios, facial images are influenced by various factors, including hair, masks and sunglasses, making it difficult to extract high-quality features from these occluded facial images. This inevitably limits the effectiveness of attention mechanisms. In this paper, we observe a correlation in facial emotion features from the same image, both with and without occlusion. This correlation contributes to addressing the issue of facial occlusions. To this end, we propose a Information Gap based Knowledge Distillation (IGKD) to explore the latent relationship. Specifically, our approach involves feeding non-occluded and masked images into separate teacher and student networks. Due to the incomplete emotion information in the masked images, there exists an information gap between the teacher and student networks. During training, we aim to minimize this gap to enable the student network to learn this relationship. To enhance the teacher’s guidance, we introduce a joint learning strategy where the teacher conducts knowledge distillation on the student during the training of the teacher. Additionally, we introduce two novel constraints, called knowledge learn and knowledge feedback loss, to supervise and optimize both the teacher and student networks. The reported experimental results show that IGKD outperforms other algorithms on four benchmark datasets. Specifically, our IGKD achieves 87.57% on Occlusion-RAF-DB, 87.33% on Occlusion-FERPlus, 64.86% on Occlusion-AffectNet, and 73.25% on FED-RO, clearly demonstrating its effectiveness and robustness. Source code is released at: .

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于信息间隙的遮挡面部表情识别知识蒸馏

由于面部遮挡导致视觉数据特征不佳，因此具有遮挡的面部表情识别在计算机视觉中是一项具有挑战性的任务。近年来，研究人员引入了区域注意技术来解决这一问题，该技术使模型感知人脸的遮挡区域，并优先考虑最具歧视性的非遮挡区域。然而，在现实场景中，面部图像受到各种因素的影响，包括头发、面具和太阳镜，因此很难从这些遮挡的面部图像中提取高质量的特征。这不可避免地限制了注意力机制的有效性。在本文中，我们观察了来自同一图像的面部情绪特征在有遮挡和没有遮挡的情况下的相关性。这种相关性有助于解决面部闭塞的问题。为此，我们提出了一种基于信息差距的知识蒸馏（IGKD）来探索潜在的关系。具体来说，我们的方法包括将未遮挡和遮挡的图像馈送到单独的教师和学生网络中。由于被蒙面图像中的情感信息不完整，师生网络之间存在信息鸿沟。在培训期间，我们的目标是尽量减少这种差距，使学生网络能够学习这种关系。为了加强教师的指导作用，我们引入了教师在培训过程中对学生进行知识提炼的联合学习策略。此外，我们引入了两个新的约束，称为知识学习和知识反馈损失，以监督和优化教师和学生网络。实验结果表明，IGKD算法在4个基准数据集上优于其他算法。具体来说，我们的IGKD在Occlusion-RAF-DB上达到87.57%，在Occlusion-FERPlus上达到87.33%，在Occlusion-AffectNet上达到64.86%，在FED-RO上达到73.25%，清楚地表明了它的有效性和鲁棒性。源代码发布在：。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.