Gabriel Figueiredo Miller , Juan Camilo Vásquez-Correa , Juan Rafael Orozco-Arroyave , Elmar Nöth
{"title":"Representation learning strategies to model pathological speech: Effect of multiple spectral resolutions","authors":"Gabriel Figueiredo Miller , Juan Camilo Vásquez-Correa , Juan Rafael Orozco-Arroyave , Elmar Nöth","doi":"10.1016/j.csl.2023.101584","DOIUrl":null,"url":null,"abstract":"<div><p><span>This paper considers a representation learning<span> strategy to model speech signals from patients with Parkinson’s disease, with the goal of predicting the presence of the disease, and evaluating the level of degradation of a patient’s speech. In particular, we propose a novel fusion strategy that combines wideband and narrowband spectral resolutions using a representation learning strategy based on </span></span>autoencoders<span>, called the multi-spectral autoencoder. The proposed model is able to classify the speech from Parkinson’s disease patients with accuracy up to 97%. The proposed model is also able to assess the dysarthria severity of Parkinson’s disease patients with a Spearman correlation up to 0.79. These results outperform those observed in literature where the same problem was addressed with the same corpus.</span></p></div>","PeriodicalId":50638,"journal":{"name":"Computer Speech and Language","volume":null,"pages":null},"PeriodicalIF":3.1000,"publicationDate":"2023-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Speech and Language","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0885230823001031","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
This paper considers a representation learning strategy to model speech signals from patients with Parkinson’s disease, with the goal of predicting the presence of the disease, and evaluating the level of degradation of a patient’s speech. In particular, we propose a novel fusion strategy that combines wideband and narrowband spectral resolutions using a representation learning strategy based on autoencoders, called the multi-spectral autoencoder. The proposed model is able to classify the speech from Parkinson’s disease patients with accuracy up to 97%. The proposed model is also able to assess the dysarthria severity of Parkinson’s disease patients with a Spearman correlation up to 0.79. These results outperform those observed in literature where the same problem was addressed with the same corpus.
期刊介绍:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.