Automatic Depression Recognition With an Ensemble of Multimodal Spatio-Temporal Routing Features

IF 9.8 · CAS Region 2 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
IEEE Transactions on Affective Computing · Pub Date: 2025-02-18 · DOI: 10.1109/TAFFC.2025.3543226
Yaowei Wang;Zulong Lin;Chengrong Yang;Yujue Zhou;Yun Yang
Journal: IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1855–1872
Publication type: Journal Article · Open access: no
URL: https://ieeexplore.ieee.org/document/10891723/
Citations: 0

Abstract

Depression, driven by growing societal pressures, significantly disrupts individuals’ physical and mental health. Automatic Depression Recognition (ADR) via facial videos has gained attention to enhance diagnostic accuracy and efficiency. However, extant methods often segment videos, losing long-term behavioral cues and introducing noise, while also exhibiting performance drops across diverse cultural and racial datasets. This study proposes a multimodal ADR approach encompassing three key components: (1) Long-term Depression Behavior Module (LDBM) employing a Transformer to capture extended depression cues, (2) Noisy Information Elimination (NIE) strategy leveraging LDBM attention scores to reduce noise and boost diagnostic precision, and (3) Multimodal Spatio-temporal Routing Feature Ensemble (MSRE) that fuses texture, Facial Action Primitives (FAPs), and Remote Photoplethysmography (rPPG) data for improved cross-dataset generalizability. Experiments on AVEC 2013, AVEC 2014, and a newly constructed CMDep dataset of 123 clinically diagnosed participants validate our method, achieving MAE/RMSE scores of 5.38/6.74, 5.09/6.83, and 5.59/8.03, respectively. The CMDep dataset includes facial expression and voice signals, with labels derived from BDI-II scores. Additionally, our method has been integrated into a user-friendly mobile application, providing a tool for real-time self-assessment of depression. This integration broadens the scope of depression detection, making it accessible to diverse populations worldwide.
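The abstract reports performance as MAE/RMSE pairs over predicted depression-scale scores (e.g. 5.38/6.74 on AVEC 2013). As a minimal illustration of how those two metrics are computed — using made-up BDI-II-style scores, not values from the paper — the following sketch shows that RMSE penalizes large errors more heavily than MAE:

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error between ground-truth and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error; weights large deviations more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical BDI-II scores for four participants (illustrative only)
truth = [10, 22, 5, 30]
preds = [12, 18, 7, 25]
print(round(mae(truth, preds), 2))   # 3.25
print(round(rmse(truth, preds), 2))  # 3.5
```

A model is typically reported with both numbers because a low MAE can coexist with a high RMSE when a few videos are badly mispredicted.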
Source journal: IEEE Transactions on Affective Computing
Categories: Computer Science, Artificial Intelligence; Computer Science, Cybernetics
CiteScore: 15.00
Self-citation rate: 6.20%
Articles per year: 174
About the journal: The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. Surveys of existing work that provide new perspectives on the historical and future directions of this field are also welcome.
Latest articles in this journal:
SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting
Weakly Supervised Learning for Facial Affective Behavior Analysis: A Review
CWEFS: Brain Volume Conduction Effects Inspired Channel-Wise EEG Feature Selection for Multi-Dimensional Emotion Recognition
LES-Talker: Fine-Grained Emotion Editing for Talking Head Generation in Linear Emotion Space
Modeling Continuous Weak Temporal Trends for Video-Based Micro-Expression Recognition