Fine-Grained Temporal-Enhanced Transformer for Dynamic Facial Expression Recognition

IEEE Signal Processing Letters · IF 3.2 · CAS Region 2 (Engineering & Technology) · JCR Q2 (Engineering, Electrical & Electronic) · Pub Date: 2024-09-09 · DOI: 10.1109/LSP.2024.3456668
Yaning Zhang, Jiahe Zhang, Linlin Shen, Zitong Yu, Zan Gao
{"title":"Fine-Grained Temporal-Enhanced Transformer for Dynamic Facial Expression Recognition","authors":"Yaning Zhang;Jiahe Zhang;Linlin Shen;Zitong Yu;Zan Gao","doi":"10.1109/LSP.2024.3456668","DOIUrl":null,"url":null,"abstract":"Dynamic facial expression recognition (DFER) plays a vital role in understanding human emotions and behaviors. Existing efforts tend to fall into a single modality self-supervised pretraining learning paradigm, which limits the representation ability of models. Besides, coarse-grained temporal modeling struggles to capture subtle facial expression representations from various inputs. In this letter, we propose a novel method for DFER, termed fine-grained temporal-enhanced transformer (FTET-DFER), which consists of two stages. First, we employ the inherent correlation between visual and auditory modalities in real videos, to capture temporally dense representations such as facial movements and expressions, in a self-supervised audio-visual learning manner. Second, we utilize the learned embeddings as targets, to achieve the DFER. In addition, we design the FTET block to study fine-grained temporal-enhanced facial expression features based on intra-clip locally-enhanced relations as well as inter-clip locally-enhanced global relationships in videos. Extensive experiments show that FTET-DFER outperforms the state-of-the-arts through within-dataset and cross-dataset evaluation.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Signal Processing Letters","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10669998/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Dynamic facial expression recognition (DFER) plays a vital role in understanding human emotions and behaviors. Existing efforts tend to follow a single-modality self-supervised pretraining paradigm, which limits the representation ability of models. Moreover, coarse-grained temporal modeling struggles to capture subtle facial expression representations from various inputs. In this letter, we propose a novel method for DFER, termed the fine-grained temporal-enhanced transformer (FTET-DFER), which consists of two stages. First, we exploit the inherent correlation between the visual and auditory modalities in real videos to capture temporally dense representations, such as facial movements and expressions, in a self-supervised audio-visual learning manner. Second, we use the learned embeddings as targets to perform DFER. In addition, we design the FTET block to learn fine-grained temporal-enhanced facial expression features based on intra-clip locally-enhanced relations as well as inter-clip locally-enhanced global relationships in videos. Extensive experiments show that FTET-DFER outperforms state-of-the-art methods in both within-dataset and cross-dataset evaluations.
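The abstract describes the FTET block only at a high level. Below is a minimal PyTorch sketch, under stated assumptions, of how its two attention patterns could be composed: attention confined to each clip's own tokens (the "intra-clip locally-enhanced relations"), followed by attention from every token to pooled per-clip summaries (the "inter-clip locally-enhanced global relationships"). All class names, tensor shapes, the mean-pooling choice, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of an FTET-style block. This is NOT the paper's
# implementation; names, shapes, and design choices are assumptions.
import torch
import torch.nn as nn


class FTETBlockSketch(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Intra-clip attention: tokens attend only within their own clip.
        self.intra_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Inter-clip attention: tokens attend to per-clip summary tokens,
        # giving a locally-enhanced global view across the video.
        self.inter_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_clips, tokens_per_clip, dim)
        b, c, t, d = x.shape

        # 1) Intra-clip locally-enhanced relations: fold clips into the
        #    batch dimension so attention is confined to each clip.
        local = x.reshape(b * c, t, d)
        q = self.norm1(local)
        local = local + self.intra_attn(q, q, q, need_weights=False)[0]
        local = local.reshape(b, c, t, d)

        # 2) Inter-clip locally-enhanced global relationships: mean-pool
        #    each clip into a summary token, then let every token attend
        #    to all clip summaries to propagate video-level context.
        summaries = self.norm2(local.mean(dim=2))    # (b, c, d)
        tokens = local.reshape(b, c * t, d)
        tokens = tokens + self.inter_attn(
            self.norm2(tokens), summaries, summaries, need_weights=False
        )[0]
        return tokens.reshape(b, c, t, d)


if __name__ == "__main__":
    block = FTETBlockSketch()
    video_tokens = torch.randn(2, 8, 16, 256)  # 2 videos, 8 clips, 16 tokens
    print(block(video_tokens).shape)           # torch.Size([2, 8, 16, 256])
```

One appeal of this kind of decomposition: restricting the first stage to within-clip tokens keeps its attention cost bounded by the clip length, while the second stage adds video-level context cheaply, since each token attends to only one summary per clip.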
Source Journal
IEEE Signal Processing Letters (Engineering, Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles per year: 339
Review time: 2.8 months
Journal introduction: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language, and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP, and ICIP, as well as at several workshops organized by the Signal Processing Society.
Latest articles in this journal:
KFA: Keyword Feature Augmentation for Open Set Keyword Spotting
RFI-Aware and Low-Cost Maximum Likelihood Imaging for High-Sensitivity Radio Telescopes
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
System-Informed Neural Network for Frequency Detection
Order Estimation of Linear-Phase FIR Filters for DAC Equalization in Multiple Nyquist Bands