Automatic Depression Recognition With an Ensemble of Multimodal Spatio-Temporal Routing Features

IF 9.8 · CAS Region 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
IEEE Transactions on Affective Computing · Pub Date: 2025-02-18 · DOI: 10.1109/TAFFC.2025.3543226
Yaowei Wang;Zulong Lin;Chengrong Yang;Yujue Zhou;Yun Yang
Journal: IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1855–1872
Publication type: Journal Article · Open access: no
URL: https://ieeexplore.ieee.org/document/10891723/
Citations: 0

Abstract

Depression, driven by growing societal pressures, significantly disrupts individuals’ physical and mental health. Automatic Depression Recognition (ADR) via facial videos has gained attention to enhance diagnostic accuracy and efficiency. However, extant methods often segment videos, losing long-term behavioral cues and introducing noise, while also exhibiting performance drops across diverse cultural and racial datasets. This study proposes a multimodal ADR approach encompassing three key components: (1) Long-term Depression Behavior Module (LDBM) employing a Transformer to capture extended depression cues, (2) Noisy Information Elimination (NIE) strategy leveraging LDBM attention scores to reduce noise and boost diagnostic precision, and (3) Multimodal Spatio-temporal Routing Feature Ensemble (MSRE) that fuses texture, Facial Action Primitives (FAPs), and Remote Photoplethysmography (rPPG) data for improved cross-dataset generalizability. Experiments on AVEC 2013, AVEC 2014, and a newly constructed CMDep dataset of 123 clinically diagnosed participants validate our method, achieving MAE/RMSE scores of 5.38/6.74, 5.09/6.83, and 5.59/8.03, respectively. The CMDep dataset includes facial expression and voice signals, with labels derived from BDI-II scores. Additionally, our method has been integrated into a user-friendly mobile application, providing a tool for real-time self-assessment of depression. This integration broadens the scope of depression detection, making it accessible to diverse populations worldwide.
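The abstract reports performance as MAE/RMSE pairs over predicted depression-scale scores (e.g. 5.38/6.74 on AVEC 2013). As a minimal illustration of how those two metrics are computed — using made-up BDI-II-style scores, not values from the paper — the following sketch shows that RMSE penalizes large errors more heavily than MAE:

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error between ground-truth and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error; weights large deviations more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical BDI-II scores for four participants (illustrative only)
truth = [10, 22, 5, 30]
preds = [12, 18, 7, 25]
print(round(mae(truth, preds), 2))   # 3.25
print(round(rmse(truth, preds), 2))  # 3.5
```

A model is typically reported with both numbers because a low MAE can coexist with a high RMSE when a few videos are badly mispredicted.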
Source journal: IEEE Transactions on Affective Computing
Categories: Computer Science, Artificial Intelligence; Computer Science, Cybernetics
CiteScore: 15.00
Self-citation rate: 6.20%
Articles per year: 174
About the journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. Surveys of existing work that provide new perspectives on the historical and future directions of this field are also welcome.
Latest articles in this journal:
SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting
Weakly Supervised Learning for Facial Affective Behavior Analysis: A Review
CWEFS: Brain Volume Conduction Effects Inspired Channel-Wise EEG Feature Selection for Multi-Dimensional Emotion Recognition
LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space
Modeling Continuous Weak Temporal Trends for Video-Based Micro-Expression Recognition