Image Encoding and Fusion of Multi-Modal Data Enhance Depression Diagnosis in Parkinson's Disease Patients
Jian Li; Yuliang Zhao; Huawei Zhang; Wayne Jason Li; Changzeng Fu; Chao Lian; Peng Shan
IEEE Transactions on Affective Computing, vol. 16, no. 1, pp. 145-160. Published 2024-06-24. DOI: 10.1109/TAFFC.2024.3418415
https://ieeexplore.ieee.org/document/10570295/
Citations: 0
Abstract
Diagnosing depression in individuals with Parkinson's Disease (PD) using multimodal fusion techniques is an important research area. The primary challenge is to build a robust fusion framework that effectively addresses the heterogeneity among different modalities. However, previous studies focused primarily on interactions between heterogeneous data and neglected the structural similarities among isomorphic data, resulting in a substantial loss of feature information when heterogeneous data were merged. In this study, we introduced an image encoding and fusion approach for multi-modal data to diagnose depression in PD patients. We also proposed a multi-modal dataset comprising motion, facial expression, and audio data. First, we designed an RGB and sparse coding method to encode the multi-modal data, achieving an isomorphic transformation of the multi-modal information and extracting feature information from lower-dimensional spaces. Next, we introduced a Spatial-Temporal Network (STN) to fuse the three types of encoded images, and incorporated Relation Global Attention (RGA) to enhance feature extraction and to use all location feature nodes of the encoded images for balanced decision attention. Finally, recognizing the limitations of traditional machine learning algorithms in handling multiple tasks in medical diagnosis, we established a multi-task weighted loss function that performs depression identification and severity prediction jointly through Multi-Task Learning (MTL).
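The abstract does not specify how the RGB encoding works. As a rough illustration only, the sketch below shows one generic way to turn three synchronized signals (for example, hypothetical x/y/z motion channels) into an RGB image: each channel is min-max scaled to 0-255 and assigned to R, G, or B, and time steps are laid out as pixels row by row. The function names `normalize_to_byte` and `encode_rgb_image`, the `width` parameter, and the padding rule are all assumptions, not the authors' method; the sparse coding stage is omitted.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def normalize_to_byte(values: Sequence[float]) -> List[int]:
    """Min-max scale a 1-D signal to integers in [0, 255]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no contrast; map it to mid-grey.
        return [128] * len(values)
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def encode_rgb_image(x: Sequence[float], y: Sequence[float], z: Sequence[float],
                     width: int) -> List[List[Pixel]]:
    """Hypothetical encoding: three synchronized channels -> one RGB image.

    Each time step becomes one pixel (R from x, G from y, B from z). Pixels
    are laid out row by row, and the last row is padded with black pixels
    when the signal length is not a multiple of `width`.
    """
    if not (len(x) == len(y) == len(z)) or len(x) == 0:
        raise ValueError("channels must be non-empty and of equal length")
    if width <= 0:
        raise ValueError("width must be positive")
    r, g, b = normalize_to_byte(x), normalize_to_byte(y), normalize_to_byte(z)
    pixels = list(zip(r, g, b))
    pad = (-len(pixels)) % width  # pixels needed to complete the last row
    pixels.extend([(0, 0, 0)] * pad)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]
```

For six samples and `width=4`, the result has two rows, and the second row ends with two black padding pixels. Images of this general form could then be passed to a 2-D convolutional network; how the paper's STN actually consumes its encoded images is not described in the abstract.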
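The abstract names a multi-task weighted loss but does not give its terms or weights. A minimal sketch, assuming a cross-entropy loss over class probabilities for depression identification and a mean-squared-error loss for severity prediction, combined as L = w_cls · L_CE + w_reg · L_MSE. The function names and the default weights `w_cls=1.0` and `w_reg=0.5` are placeholders chosen for illustration, not values from the paper.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                  eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the true class (identification task)."""
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and aligned")
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))  # clamp to avoid log(0)
    return total / len(labels)


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error (severity regression task)."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("preds and targets must be non-empty and aligned")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def multitask_loss(class_probs: Sequence[Sequence[float]],
                   class_labels: Sequence[int],
                   severity_preds: Sequence[float],
                   severity_targets: Sequence[float],
                   w_cls: float = 1.0, w_reg: float = 0.5) -> float:
    """Weighted sum of the two task losses (weights are hypothetical)."""
    return (w_cls * cross_entropy(class_probs, class_labels)
            + w_reg * mse(severity_preds, severity_targets))
```

With uniform class probabilities the classification term equals ln 2, and a single severity error of 1.0 across two samples gives an MSE of 0.5. In practice such weights are often tuned on validation data or learned during training; fixed weights are used here only to keep the example small.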
About the journal:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also focuses on how techniques for sensing and simulating affect can enhance our understanding of human emotions and processes. Additionally, the journal explores the design, implementation, and evaluation of systems that take affect into account as part of their usability. We also welcome surveys of existing work that provide new perspectives on the historical and future directions of this field.