Modeling Fine-Grained Relations in Dynamic Space-Time Graphs for Video-Based Facial Expression Recognition

IF 9.8 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | IEEE Transactions on Affective Computing | Pub Date: 2025-01-17 | DOI: 10.1109/TAFFC.2025.3530973
Changqin Huang;Fan Jiang;Zhongmei Han;Xiaodi Huang;Shijin Wang;Yanlai Zhu;Yunliang Jiang;Bin Hu
{"title":"Modeling Fine-Grained Relations in Dynamic Space-Time Graphs for Video-Based Facial Expression Recognition","authors":"Changqin Huang;Fan Jiang;Zhongmei Han;Xiaodi Huang;Shijin Wang;Yanlai Zhu;Yunliang Jiang;Bin Hu","doi":"10.1109/TAFFC.2025.3530973","DOIUrl":null,"url":null,"abstract":"Facial expressions in videos inherently mirror the dynamic nature of real-world facial events. Consequently, facial expression recognition (FER) should employ a dynamic graph-based representation to effectively capture the relational structure of facial expressions rather than relying on conventional grid or sequence methods. However, existing graph-based approaches have their limitations. Frame-level graph methods provide a coarse representation of the facial graph across time and space, while landmark-based graph methods need to introduce additional facial landmarks, resulting in a static graph structure. To address these challenges, we propose spatial-temporal relation-aware dynamic graph convolutional networks (ST-RDGCN). This fine-grained relation modeling approach enables the dynamic modeling of evolving facial expressions in videos through dynamic space-time graphs, eliminating the need for facial landmarks. ST-RDGCN encompasses three graph construction paradigms: dynamic independent space graph, dynamic joint space-time graph, and dynamic cross space-time graph. Furthermore, we propose a relation-aware space-time graph convolution (RSTG-Conv) operator to learn informative spatiotemporal correlations in dynamic space-time graphs. In extensive experimental evaluations, our ST-RDGCN demonstrates state-of-the-art performance on the five popular video-based FER datasets, achieving overall accuracy scores of 99.69%, 91.67%, 56.51%, 69.37%, and 49.03% on the CK+, Oulu-CASIA, AFEW, DFEW, and FERV39k datasets, respectively. In particular, our ST-RDGCN outperforms the current best method by 3.6% in UAR on the most challenging FERV39k dataset. Furthermore, our analysis reveals that the dynamic cross space-time graph scheme is the most effective among the three dynamic graph construction schemes.","PeriodicalId":13131,"journal":{"name":"IEEE Transactions on Affective Computing","volume":"16 3","pages":"1675-1692"},"PeriodicalIF":9.8000,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Affective Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10844531/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Facial expressions in videos inherently mirror the dynamic nature of real-world facial events. Consequently, facial expression recognition (FER) should employ a dynamic graph-based representation to effectively capture the relational structure of facial expressions rather than relying on conventional grid or sequence methods. However, existing graph-based approaches have their limitations. Frame-level graph methods provide a coarse representation of the facial graph across time and space, while landmark-based graph methods need to introduce additional facial landmarks, resulting in a static graph structure. To address these challenges, we propose spatial-temporal relation-aware dynamic graph convolutional networks (ST-RDGCN). This fine-grained relation modeling approach enables the dynamic modeling of evolving facial expressions in videos through dynamic space-time graphs, eliminating the need for facial landmarks. ST-RDGCN encompasses three graph construction paradigms: dynamic independent space graph, dynamic joint space-time graph, and dynamic cross space-time graph. Furthermore, we propose a relation-aware space-time graph convolution (RSTG-Conv) operator to learn informative spatiotemporal correlations in dynamic space-time graphs. In extensive experimental evaluations, our ST-RDGCN demonstrates state-of-the-art performance on the five popular video-based FER datasets, achieving overall accuracy scores of 99.69%, 91.67%, 56.51%, 69.37%, and 49.03% on the CK+, Oulu-CASIA, AFEW, DFEW, and FERV39k datasets, respectively. In particular, our ST-RDGCN outperforms the current best method by 3.6% in UAR on the most challenging FERV39k dataset. Furthermore, our analysis reveals that the dynamic cross space-time graph scheme is the most effective among the three dynamic graph construction schemes.
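The abstract describes the architecture only at a high level. As a rough illustration of the general idea, the sketch below builds a dynamic space-time graph over per-frame patch features and applies a relation-aware aggregation step. All module names, tensor shapes, the top-k similarity-based edge construction, and the residual connection are assumptions made for illustration; this is not the authors' released implementation of ST-RDGCN or RSTG-Conv.

```python
# Minimal sketch (assumed design, not the paper's code): a dynamic space-time
# graph over per-frame patch features with relation-aware edge weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationAwareSpaceTimeGraphConv(nn.Module):
    """Builds a dynamic graph from node features and aggregates neighbours
    with feature-dependent (relation-aware) edge weights."""

    def __init__(self, dim: int, num_neighbors: int = 8):
        super().__init__()
        self.num_neighbors = num_neighbors      # k strongest relations kept per node (assumed)
        self.query = nn.Linear(dim, dim)        # projections used to score pairwise relations
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) = batch, frames, spatial nodes per frame, channels
        b, t, n, c = x.shape
        nodes = x.reshape(b, t * n, c)          # flatten time and space into one node set

        q, k, v = self.query(nodes), self.key(nodes), self.value(nodes)
        scores = torch.matmul(q, k.transpose(1, 2)) / c ** 0.5   # (B, T*N, T*N) relation scores

        # Dynamic graph: keep only the top-k strongest relations per node,
        # so the adjacency changes with the content of each video.
        topk = torch.topk(scores, self.num_neighbors, dim=-1)
        masked = torch.full_like(scores, float("-inf"))
        masked.scatter_(-1, topk.indices, topk.values)
        weights = F.softmax(masked, dim=-1)     # relation-aware edge weights

        aggregated = torch.matmul(weights, v)   # message passing over the dynamic graph
        return (self.out(aggregated) + nodes).reshape(b, t, n, c)   # residual connection (assumed)


if __name__ == "__main__":
    # Toy usage: 2 videos, 8 frames, 49 patch nodes per frame, 64-dim features.
    feats = torch.randn(2, 8, 49, 64)
    layer = RelationAwareSpaceTimeGraphConv(dim=64)
    print(layer(feats).shape)   # torch.Size([2, 8, 49, 64])
```

In this hypothetical formulation, restricting the pairwise scores to nodes within the same frame would correspond in spirit to an independent space graph, while allowing edges across all frames, as above, corresponds to a joint or cross space-time construction.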
Source Journal
IEEE Transactions on Affective Computing
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, CYBERNETICS
CiteScore: 15.00
Self-citation rate: 6.20%
Articles published: 174
About the Journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.
Latest Articles from This Journal
Explainable Affective Body Expression Recognition with Multi-Scale Spatiotemporal Encoding and LLM-Based Reasoning
Personality Traits and Demographics Analysis in Online Mental Health Discourse
EEG-Based Emotion Classification Using Deep Capsule Networks for Subject-Independent and Dependent Scenarios
Nasal Dominance and Nostril Breathing Variability: Potential Biomarkers of Acute Stress
Charting the Unspoken: Causal Inference-Guided LLM Augmentation for Emotion Recognition in Conversation