Automatic Diagnosis and Prediction of Cognitive Decline Associated with Alzheimer’s Dementia through Spontaneous Speech
Ziming Liu, Lauren Proctor, Parker N. Collier, Xiaopeng Zhao
2021 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), published 2021-09-13. DOI: 10.1109/ICSIPA52582.2021.9576784
Citations: 3
Abstract
With the increasing prevalence of Alzheimer’s disease (AD), it is important to develop detectable biomarkers that can reliably identify AD at an early stage. Language deficits are among the common signs that appear in the early stage of mild Alzheimer’s disease. Therefore, applying natural language processing and related machine learning algorithms to patients’ speech recordings for AD diagnosis has drawn increasing attention in recent years. In this study, three approaches are proposed to extract features from speech recordings: (1) fine-tuning a pre-trained encoder model (BERT) on transcripts produced by automatic transcription, (2) extracting hand-crafted linguistic features from those transcripts, and (3) selecting acoustic features from denoised speech recordings. The three approaches are applied to three tasks: AD diagnosis, Mini-Mental State Examination (MMSE) score prediction, and cognitive decline inference. Based on cross-validation results on the training dataset, the BERT-based approach yields the best performance in all three challenge tasks. Specifically, in the AD diagnosis task, 5-fold cross-validation using BERT-encoded features from Deep Speech transcripts yields an average classification accuracy of 97.18%. In the MMSE score prediction task, 5-fold cross-validation using BERT-encoded features from Deep Speech transcripts yields an average Root Mean Squared Error (RMSE) of 3.76. In the cognitive decline inference task, leave-one-out cross-validation using BERT-encoded features from Sphinx or Deep Speech transcripts yields an average classification accuracy of 100%. These analyses suggest that combining automatic transcription with BERT can achieve strong performance on AD-related detection and prediction problems.
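The sketch below illustrates, in Python, the kind of pipeline the abstract describes: encoding ASR transcripts with a pre-trained BERT model and evaluating an AD/non-AD classifier with 5-fold cross-validation. It is not the authors' released code; as a simplification it uses a frozen BERT encoder as a feature extractor with a logistic-regression head rather than full fine-tuning, and the model name, pooling strategy, classifier, and placeholder transcripts are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of BERT-based transcript encoding plus
# 5-fold cross-validated AD classification, as described in the abstract.
import numpy as np
import torch
from transformers import BertModel, BertTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumed checkpoint; the paper's exact BERT variant may differ.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

def encode_transcript(text: str) -> np.ndarray:
    """Return the [CLS] embedding of one ASR transcript (assumed pooling choice)."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[:, 0, :].squeeze(0).numpy()

# Placeholder transcripts standing in for Deep Speech / Sphinx ASR output;
# labels: 1 = AD, 0 = healthy control.
transcripts = ["the boy is on the stool reaching for the cookie jar"] * 5 + \
              ["the mother is drying dishes at the sink"] * 5
labels = np.array([1] * 5 + [0] * 5)

X = np.stack([encode_transcript(t) for t in transcripts])

# Frozen-encoder simplification: a linear classifier on top of BERT features.
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validated accuracy, mirroring the evaluation protocol in the abstract.
scores = cross_val_score(clf, X, labels, cv=5, scoring="accuracy")
print(f"mean 5-fold accuracy: {scores.mean():.4f}")
```

For the MMSE prediction task, the same encoded features could feed a regressor scored with RMSE (e.g., `scoring="neg_root_mean_squared_error"` in scikit-learn), and for cognitive decline inference the split would be leave-one-out rather than 5-fold; both are straightforward substitutions in the evaluation step above.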