基于深度多任务学习的音频模式相关精神障碍检测

IF 3.1 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Computer Speech and Language Pub Date : 2024-08-13 DOI:10.1016/j.csl.2024.101710
{"title":"基于深度多任务学习的音频模式相关精神障碍检测","authors":"","doi":"10.1016/j.csl.2024.101710","DOIUrl":null,"url":null,"abstract":"<div><p>The existence of correlation among mental disorders is a well-known phenomenon. Multi-task learning (MTL) has been reported to yield enhanced detection performance of a targeted mental disorder by leveraging its correlation with other related mental disorders, mainly in textual and visual modalities. The validation of the same on audio modality is yet to be explored. In this study, we explore homogeneous and heterogeneous MTL paradigms for detecting two correlated mental disorders, namely major depressive disorder (MDD) and post-traumatic stress disorder (PTSD), on a publicly available audio dataset. The detection of both disorders is interchangeably employed as an auxiliary task when the other is the main task. In addition, a few other tasks are employed as auxiliary tasks. The results show that both MTL paradigms, implemented using two considered deep-learning models, outperformed the corresponding single-task learning (STL). The best relative improvement in the detection performance of MDD and PTSD is found to be 29.9% and 28.8%, respectively. Furthermore, we analyzed the cross-corpus generalization of MTL using two distinct datasets that involve MDD/PTSD instances. The results indicate that the generalizability of MTL is significantly superior to that of STL. The best relative increment in the cross-corpus generalization performance of MDD and PTSD detection is found to be 25.0% and 56.5%, respectively.</p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0885230824000937/pdfft?md5=abe8ab646f019a4cea29fbd4acdd6557&pid=1-s2.0-S0885230824000937-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Deep multi-task learning based detection of correlated mental disorders using audio modality\",\"authors\":\"\",\"doi\":\"10.1016/j.csl.2024.101710\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The existence of correlation among mental disorders is a well-known phenomenon. Multi-task learning (MTL) has been reported to yield enhanced detection performance of a targeted mental disorder by leveraging its correlation with other related mental disorders, mainly in textual and visual modalities. The validation of the same on audio modality is yet to be explored. In this study, we explore homogeneous and heterogeneous MTL paradigms for detecting two correlated mental disorders, namely major depressive disorder (MDD) and post-traumatic stress disorder (PTSD), on a publicly available audio dataset. The detection of both disorders is interchangeably employed as an auxiliary task when the other is the main task. In addition, a few other tasks are employed as auxiliary tasks. The results show that both MTL paradigms, implemented using two considered deep-learning models, outperformed the corresponding single-task learning (STL). The best relative improvement in the detection performance of MDD and PTSD is found to be 29.9% and 28.8%, respectively. Furthermore, we analyzed the cross-corpus generalization of MTL using two distinct datasets that involve MDD/PTSD instances. The results indicate that the generalizability of MTL is significantly superior to that of STL. The best relative increment in the cross-corpus generalization performance of MDD and PTSD detection is found to be 25.0% and 56.5%, respectively.</p></div>\",\"PeriodicalId\":50638,\"journal\":{\"name\":\"Computer Speech and Language\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000937/pdfft?md5=abe8ab646f019a4cea29fbd4acdd6557&pid=1-s2.0-S0885230824000937-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Speech and Language\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0885230824000937\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230824000937","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

精神障碍之间存在相关性是一个众所周知的现象。据报道,多任务学习(MTL)可利用目标精神障碍与其他相关精神障碍的相关性(主要是在文本和视觉模式中),提高目标精神障碍的检测性能。同样的方法在音频模式上的验证还有待探索。在本研究中,我们探索了同质和异质 MTL 范式,用于在公开可用的音频数据集上检测两种相关精神障碍,即重度抑郁障碍(MDD)和创伤后应激障碍(PTSD)。当一项任务是主要任务时,这两种疾病的检测则作为辅助任务交替使用。此外,还有一些其他任务也被用作辅助任务。结果表明,使用两种深度学习模型实现的 MTL 范式都优于相应的单任务学习(STL)。MDD 和 PTSD 检测性能的最佳相对改善率分别为 29.9% 和 28.8%。此外,我们还使用涉及 MDD/PTSD 实例的两个不同数据集分析了 MTL 的跨语料库泛化能力。结果表明,MTL 的泛化能力明显优于 STL。MDD 和 PTSD 检测的跨语料库泛化性能的最佳相对增量分别为 25.0% 和 56.5%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Deep multi-task learning based detection of correlated mental disorders using audio modality

The existence of correlation among mental disorders is a well-known phenomenon. Multi-task learning (MTL) has been reported to yield enhanced detection performance of a targeted mental disorder by leveraging its correlation with other related mental disorders, mainly in textual and visual modalities. The validation of the same on audio modality is yet to be explored. In this study, we explore homogeneous and heterogeneous MTL paradigms for detecting two correlated mental disorders, namely major depressive disorder (MDD) and post-traumatic stress disorder (PTSD), on a publicly available audio dataset. The detection of both disorders is interchangeably employed as an auxiliary task when the other is the main task. In addition, a few other tasks are employed as auxiliary tasks. The results show that both MTL paradigms, implemented using two considered deep-learning models, outperformed the corresponding single-task learning (STL). The best relative improvement in the detection performance of MDD and PTSD is found to be 29.9% and 28.8%, respectively. Furthermore, we analyzed the cross-corpus generalization of MTL using two distinct datasets that involve MDD/PTSD instances. The results indicate that the generalizability of MTL is significantly superior to that of STL. The best relative increment in the cross-corpus generalization performance of MDD and PTSD detection is found to be 25.0% and 56.5%, respectively.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Computer Speech and Language
Computer Speech and Language 工程技术-计算机:人工智能
CiteScore
11.30
自引率
4.70%
发文量
80
审稿时长
22.9 weeks
期刊介绍: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.
期刊最新文献
Editorial Board Enhancing analysis of diadochokinetic speech using deep neural networks Copiously Quote Classics: Improving Chinese Poetry Generation with historical allusion knowledge Significance of chirp MFCC as a feature in speech and audio applications Artificial disfluency detection, uh no, disfluency generation for the masses
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1