Speech vs. text: A comparative analysis of features for depression detection systems
M. Morales, Rivka Levitan
2016 IEEE Spoken Language Technology Workshop (SLT), published 2016-12-01
DOI: 10.1109/SLT.2016.7846256 (https://doi.org/10.1109/SLT.2016.7846256)
Citations: 57
Abstract
Depression is a serious illness that affects millions of people globally. In recent years, the task of automatic depression detection from speech has gained popularity. However, several challenges remain, including the question of which features best discriminate between classes or depression levels. Thus far, most research has focused on extracting features from the speech signal. However, the speech production system is complex, and depression has been shown to affect many linguistic properties, including phonetics, semantics, and syntax. Therefore, we argue that researchers should look beyond the acoustic properties of speech by building features that capture syntactic structure and semantic content. We provide a comparative analysis of various features for depression detection. Using the same corpus, we evaluate how a system built on text-based features compares to a speech-based system. We find that a combination of features drawn from both speech and text leads to the best system performance.