SimFLE: Simple Facial Landmark Encoding for Self-Supervised Facial Expression Recognition in the Wild

Impact factor: 9.8 | Zone 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | IEEE Transactions on Affective Computing | Pub Date: 2024-09-30 | DOI: 10.1109/TAFFC.2024.3470980
Jiyong Moon;Hyeryung Jang;Seongsik Park
IEEE Transactions on Affective Computing, vol. 16, no. 2, pp. 799-813. Publication type: Journal Article. Open access: no. Full record: https://ieeexplore.ieee.org/document/10700612/
Citation count: 0

Abstract

Facial expression recognition in the wild (FER-W) entails classifying facial emotions in natural environments. The major challenges in FER-W stem from the complexity and ambiguity of facial images, which make it difficult to curate a large-scale labeled dataset for training. Additionally, the subtle differences between emotions often reside in the fine-grained details of local facial landmarks, so these features must be captured efficiently. To address these issues, we employ two distinct self-supervised methods. First, we adopt a contrastive learning method to capture generalized global representations, enabling the model to understand the semantic context of facial expressions without relying on labeled data. Simultaneously, we leverage masked image modeling to embed fine-grained, local facial landmark information at the patch level. We introduce a novel module called FaceMAE, which aims to reconstruct the masked facial patches. Its semantic masking scheme is designed to preserve highly activated features, so that crucial details of the unmasked facial landmarks, and their relationships within the broader facial context, are encoded at the patch level. This in turn guides the backbone network to calibrate the learned global features so that they attend to facial landmarks. Our proposed method, called Simple Facial Landmark Encoding (SimFLE), significantly outperforms the supervised baseline and other self-supervised methods in terms of facial landmark localization and overall performance, as demonstrated through extensive experiments across several FER-W benchmarks.
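
The abstract states only that the semantic masking scheme keeps highly activated features visible and masks the rest for reconstruction. The following is a minimal sketch of that idea, not the authors' FaceMAE implementation. The function name `semantic_mask`, the `keep_ratio` parameter, the per-patch scores, and the top-k selection rule are all assumptions made for illustration.

```python
def semantic_mask(patch_scores, keep_ratio=0.25):
    """Return one boolean per patch: True = masked, False = visible.

    Assumption (not from the paper): the patches with the highest
    activation scores stay visible; every other patch is masked and
    would be the target of reconstruction.
    """
    num_patches = len(patch_scores)
    # Keep at least one patch visible, whatever the ratio.
    num_keep = max(1, round(keep_ratio * num_patches))
    # Rank patch indices from most to least activated.
    ranked = sorted(range(num_patches),
                    key=lambda i: patch_scores[i],
                    reverse=True)
    visible = set(ranked[:num_keep])
    return [i not in visible for i in range(num_patches)]


# Hypothetical activation scores for 8 patches (illustrative values only).
scores = [0.1, 0.9, 0.2, 0.05, 0.8, 0.3, 0.7, 0.0]
mask = semantic_mask(scores, keep_ratio=0.25)
# Patches 1 and 4 (scores 0.9 and 0.8) stay visible; the other six are masked.
```

In a real pipeline the scores would presumably come from the backbone's feature maps; how they are computed and what fraction of patches is kept are not specified in the abstract.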
Source journal
IEEE Transactions on Affective Computing
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS
CiteScore: 15.00
Self-citation rate: 6.20%
Articles published: 174
About the journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.
Latest articles in this journal
MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation
Reversible Graph Neural Network-based Reaction Distribution Learning for Multiple Appropriate Facial Reactions Generation
Context-Aware Toxicity-Adaptive Sampling for Affective Language Generation
SLAB: A Self-supervised Label Generation Framework to Reduce Annotation Overhead
Data Distribution Evolution for Robust EEG Emotion Recognition with Limited Data Resource