Multimodal Sentimental Privileged Information Embedding for Improving Facial Expression Recognition

IF 9.8 · CAS Tier 2 (Computer Science) · Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · IEEE Transactions on Affective Computing, Vol. 16, No. 1, pp. 133-144 · Pub Date: 2024-06-18 · DOI: 10.1109/TAFFC.2024.3415625
Ning Sun;Changwei You;Wenming Zheng;Jixin Liu;Lei Chai;Haian Sun
{"title":"Multimodal Sentimental Privileged Information Embedding for Improving Facial Expression Recognition","authors":"Ning Sun;Changwei You;Wenming Zheng;Jixin Liu;Lei Chai;Haian Sun","doi":"10.1109/TAFFC.2024.3415625","DOIUrl":null,"url":null,"abstract":"Facial expression recognition (FER) has always been one of the key task in affective computing. Over the years, researchers have worked to improve the performance of FER by designing models with more powerful feature extraction, embedding attention mechanism, and reconstructing missing information, etc. Different from the paradigms above, we attempt to improve FER performance by using multimodal sentiment data, such as audio and text, as privileged information (PI) for facial images. To this end, a multimodal privileged information embedded facial expression recognition network (MPI-FER) is proposed in this paper. During the training phase, this model achieves the PI embedding of multimodal data for FER by developing cross-modality translation between multimodal sentiment data. During the test phase, input images alone are sufficient for the model inference to accomplish the FER task input. The MPI-FER is a large-scale, heterogeneous deep neural network. To achieve effective training of this model with limited training samples, we design a multi-stage training strategy of module-wise pre-training followed by end-to-end fine-tuning. In addition, a strategy of filling the multimodal sentiment quaternion is proposed for implementing our method on a facial expression database consisting only of face images. We conducted extensive experiments to evaluate the proposed method on two databases of multimodal sentiment analysis (CH-SIMS and CMU-MOSI) and two databases of FER in the wild (RAF-DB and AffectNet). The results show that embedding multimodal sentiment data as privileged information into the FER task based on face images can significantly improve the accuracy of FER. Furthermore, by only using image in the test phase, the proposed method can achieve better results of multimodal sentiment analysis than those methods achieved by using multimodal sentimental data fusion.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 1","pages":"133-144"},"PeriodicalIF":9.8000,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10561510/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Facial expression recognition (FER) has long been one of the key tasks in affective computing. Over the years, researchers have worked to improve FER performance by designing models with more powerful feature extraction, embedding attention mechanisms, and reconstructing missing information. Departing from these paradigms, we attempt to improve FER performance by using multimodal sentiment data, such as audio and text, as privileged information (PI) for facial images. To this end, this paper proposes a multimodal privileged-information-embedded facial expression recognition network (MPI-FER). During the training phase, the model embeds the multimodal PI into the FER task by learning cross-modality translation between the multimodal sentiment data. During the test phase, input images alone are sufficient for the model to perform inference and accomplish the FER task. MPI-FER is a large-scale, heterogeneous deep neural network. To train it effectively with limited training samples, we design a multi-stage training strategy: module-wise pre-training followed by end-to-end fine-tuning. In addition, we propose a strategy for filling the multimodal sentiment quaternion, which allows our method to be applied to facial expression databases consisting only of face images. We conducted extensive experiments to evaluate the proposed method on two multimodal sentiment analysis databases (CH-SIMS and CMU-MOSI) and two in-the-wild FER databases (RAF-DB and AffectNet). The results show that embedding multimodal sentiment data as privileged information into the image-based FER task can significantly improve FER accuracy. Furthermore, using only images in the test phase, the proposed method achieves better multimodal sentiment analysis results than methods that fuse multimodal sentiment data.
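The abstract states the training/inference asymmetry of the method but gives no implementation details. The sketch below illustrates the general privileged-information pattern it describes: a face-image branch is trained with auxiliary cross-modality translation losses toward audio and text sentiment features, and at test time the image branch runs alone. Every name here (`ImageEncoder`, `MPIFERSketch`, the linear translators), the feature dimensions, and the loss weight are illustrative assumptions, not the authors' actual MPI-FER architecture.

```python
# Minimal PyTorch sketch of the privileged-information (PI) training pattern
# described in the abstract. All names, dimensions, and loss weights are
# illustrative assumptions, NOT the authors' actual MPI-FER design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Stand-in image backbone: face image -> feature vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

class MPIFERSketch(nn.Module):
    """Image-only classifier plus hypothetical translators into the
    privileged audio/text sentiment feature spaces."""
    def __init__(self, dim=256, n_classes=7):
        super().__init__()
        self.image_enc = ImageEncoder(dim)
        self.to_audio = nn.Linear(dim, dim)  # image feature -> audio feature space
        self.to_text = nn.Linear(dim, dim)   # image feature -> text feature space
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, image, audio_feat=None, text_feat=None):
        z = self.image_enc(image)
        logits = self.classifier(z)
        if audio_feat is None or text_feat is None:
            return logits, None              # test phase: image only
        # Training phase: translation losses tie the image feature to the
        # privileged audio/text sentiment features.
        trans_loss = (F.mse_loss(self.to_audio(z), audio_feat)
                      + F.mse_loss(self.to_text(z), text_feat))
        return logits, trans_loss

# One training step with privileged modalities (dummy tensors for illustration).
model = MPIFERSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
image = torch.randn(8, 3, 112, 112)
audio_feat = torch.randn(8, 256)             # privileged audio features
text_feat = torch.randn(8, 256)              # privileged text features
labels = torch.randint(0, 7, (8,))

logits, trans_loss = model(image, audio_feat, text_feat)
loss = F.cross_entropy(logits, labels) + 0.5 * trans_loss  # 0.5: arbitrary weight
loss.backward()
opt.step()

# Test phase: the image alone is sufficient, matching the abstract's claim.
with torch.no_grad():
    test_logits, _ = model(torch.randn(1, 3, 112, 112))
```

The key design point this sketch captures is that the translators and privileged features appear only in the training loss, so the inference graph is the plain image encoder and classifier; the paper's cross-modality translation is presumably far richer than the linear maps assumed here.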
Source Journal
IEEE Transactions on Affective Computing (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS)
CiteScore: 15.00
Self-citation rate: 6.20%
Articles published per year: 174
Journal Description: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.
Latest Articles in This Journal
Incremental Micro-Expression Recognition: A Benchmark
Charting 15 Years of Progress in Deep Learning for Speech Emotion Recognition: a Replication Study
Video-Based Cross-Domain Emotion Recognition Via Sample-Graph Relations Self-Distillation
EchoReason: a Two-stage Clinically Aligned Vision-Language Framework for Interpretable Diseases Diagnosis from Multi-Modal Ultrasound
Advancing Micro-Expression Recognition: a Task-Specific Framework Integrating Frequency Analysis and Structural Embedding