Toward Knowledge-Driven Speech-Based Models of Depression: Leveraging Spectrotemporal Variations in Speech Vowels

2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). Pub Date: 2022-09-27. DOI: 10.1109/BHI56158.2022.9926939

Kexin Feng, Theodora Chaspari


Citations: 4

Abstract

Psychomotor retardation associated with depression has been linked with tangible differences in vowel production. This paper investigates a knowledge-driven machine learning (ML) method that integrates spectrotemporal information of speech at the vowel level to identify depression. Low-level speech descriptors are learned by a convolutional neural network (CNN) trained for vowel classification. The temporal evolution of those low-level descriptors is modeled at a high level, within and across utterances, by a long short-term memory (LSTM) model that makes the final depression decision. A modified version of Local Interpretable Model-agnostic Explanations (LIME) is further used to identify the impact of low-level spectrotemporal vowel variation on the decisions and to observe the high-level temporal change in depression likelihood. The proposed method outperforms baselines that model the spectrotemporal information in speech without integrating the vowel-based information, as well as ML models trained with conventional prosodic and spectrotemporal features. The explainability analysis indicates that spectrotemporal information corresponding to non-vowel segments is less important than the vowel-based information. Explainability of the high-level information capturing the segment-by-segment decisions is further inspected for participants with and without depression. The findings from this work can provide a foundation for knowledge-driven, interpretable decision-support systems that help clinicians better understand fine-grained temporal changes in speech data, ultimately augmenting mental health diagnosis and care.
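The idea of comparing the influence of vowel and non-vowel segments on a decision can be illustrated with a small perturbation experiment. The sketch below is not the modified LIME used in the paper: it uses plain occlusion (masking one segment at a time and measuring the change in the score), which is a simpler relative of LIME. The scoring function `toy_depression_score`, its weights, and the segment values are all hypothetical stand-ins for a trained CNN-LSTM model and real speech features.

```python
from typing import Callable, List, Tuple

# A segment is (is_vowel, feature_value). Both fields are illustrative only.
Segment = Tuple[bool, float]


def toy_depression_score(segments: List[Segment]) -> float:
    """Hypothetical stand-in for a trained model.

    Returns a weighted mean of the segment features. The weights
    (1.0 for vowels, 0.25 otherwise) are an assumption made for this demo.
    """
    weights = [1.0 if is_vowel else 0.25 for is_vowel, _ in segments]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, segments))
    return weighted_sum / sum(weights)


def occlusion_importance(
    model: Callable[[List[Segment]], float],
    segments: List[Segment],
    masked_value: float = 0.0,
) -> List[float]:
    """Score drop observed when each segment's feature is masked in turn."""
    full_score = model(segments)
    importances = []
    for i, (is_vowel, _) in enumerate(segments):
        perturbed = list(segments)
        perturbed[i] = (is_vowel, masked_value)  # mask only segment i
        importances.append(full_score - model(perturbed))
    return importances


# Made-up utterance: alternating vowel and non-vowel segments.
segments = [(True, 0.8), (False, 0.4), (True, 0.6), (False, 0.2)]
importances = occlusion_importance(toy_depression_score, segments)

# Aggregate the per-segment importances by segment type.
vowel_total = sum(imp for (is_v, _), imp in zip(segments, importances) if is_v)
non_vowel_total = sum(
    imp for (is_v, _), imp in zip(segments, importances) if not is_v
)
print("vowel importance:", vowel_total)
print("non-vowel importance:", non_vowel_total)
```

Because the toy scorer weights vowels more heavily by construction, the vowel total comes out larger here; that outcome reflects the assumed weights and says nothing about the paper's results. With a real model, the same loop would be run over many utterances, and a LIME-style analysis would additionally fit a local surrogate model over many random masks rather than masking one segment at a time.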


