Automatic Depression Recognition With an Ensemble of Multimodal Spatio-Temporal Routing Features
Yaowei Wang; Zulong Lin; Chengrong Yang; Yujue Zhou; Yun Yang
IEEE Transactions on Affective Computing, vol. 16 (3), pp. 1855-1872. Published 2025-02-18. Journal Article.
DOI: 10.1109/TAFFC.2025.3543226
URL: https://ieeexplore.ieee.org/document/10891723/
Impact factor: 9.8. JCR: Q1 (Computer Science, Artificial Intelligence).
Citations: 0
Abstract
Depression, driven by growing societal pressures, significantly disrupts individuals’ physical and mental health. Automatic Depression Recognition (ADR) from facial videos has gained attention as a means of improving diagnostic accuracy and efficiency. However, existing methods often segment videos, which loses long-term behavioral cues and introduces noise, and their performance drops across datasets that differ in cultural and racial composition. This study proposes a multimodal ADR approach with three key components: (1) a Long-term Depression Behavior Module (LDBM) that employs a Transformer to capture extended depression cues, (2) a Noisy Information Elimination (NIE) strategy that uses LDBM attention scores to reduce noise and boost diagnostic precision, and (3) a Multimodal Spatio-temporal Routing Feature Ensemble (MSRE) that fuses texture, Facial Action Primitives (FAPs), and Remote Photoplethysmography (rPPG) data for improved cross-dataset generalizability. Experiments on AVEC 2013, AVEC 2014, and a newly constructed CMDep dataset of 123 clinically diagnosed participants validate our method, achieving MAE/RMSE scores of 5.38/6.74, 5.09/6.83, and 5.59/8.03, respectively. The CMDep dataset includes facial expression and voice signals, with labels derived from BDI-II scores. Additionally, our method has been integrated into a user-friendly mobile application that provides a tool for real-time self-assessment of depression. This integration broadens the scope of depression detection, making it accessible to diverse populations worldwide.
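For readers unfamiliar with the two error metrics reported in the abstract, the following is a minimal sketch of how MAE (mean absolute error) and RMSE (root mean squared error) are conventionally computed between predicted and ground-truth depression scores. The function names and the sample scores are hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical BDI-II style scores, NOT data from the paper.
true_scores = [10, 25, 3, 40]
pred_scores = [12, 20, 6, 38]

print("MAE: ", mae(true_scores, pred_scores))
print("RMSE:", rmse(true_scores, pred_scores))
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions, which is consistent with every MAE/RMSE pair reported above.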
About the journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that prioritize the consideration of affect in their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.