Machine Learning Smart System for Parkinson Disease Classification Using the Voice as a Biomarker

Healthcare Informatics Research (IF 2.3, JCR Q3, Medical Informatics). Pub Date: 2022-07-01. Epub Date: 2022-07-31. DOI: 10.4258/hir.2022.28.3.210

Ilias Tougui, Abdelilah Jilbab, Jamal El Mhamdi


Citations: 0

Abstract



Machine Learning Smart System for Parkinson Disease Classification Using the Voice as a Biomarker.

Objectives: This study presents PD Predict, a machine learning system for Parkinson disease classification using voice as a biomarker.

Methods: We first created an original set of recordings from the mPower study, and then extracted several audio features, such as mel-frequency cepstral coefficient (MFCC) components and other classical speech features, using a windowing procedure. The generated dataset was then divided into training and holdout sets. The training set was used to train two machine learning pipelines, and their performance was estimated using a nested subject-wise cross-validation approach. The holdout set was used to assess how well the pipelines generalize to unseen data. The final pipelines were implemented in PD Predict and are accessed through a prediction endpoint developed using the Django REST Framework. PD Predict is a two-component system: a desktop application that records audio, extracts audio features, and makes predictions; and a server-side web application that implements the machine learning pipelines and processes incoming requests containing the extracted audio features to make predictions. Our system is deployed and accessible via the following link: https://pdpredict.herokuapp.com/.
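The windowing procedure mentioned in the Methods can be illustrated with a minimal sketch. It assumes a mono signal held as a list of floats and uses two simple stand-in features (RMS energy and zero-crossing rate) instead of the MFCC components the study reports. The function names, window length, and hop size are hypothetical and are not taken from the paper.

```python
import math


def split_into_windows(samples, window_size, hop_size):
    """Return overlapping windows of `window_size` samples, advancing by `hop_size`."""
    if window_size < 2 or hop_size < 1:
        raise ValueError("window_size must be >= 2 and hop_size must be >= 1")
    last_start = len(samples) - window_size
    # Trailing samples that do not fill a whole window are dropped.
    return [samples[s:s + window_size] for s in range(0, last_start + 1, hop_size)]


def window_features(window):
    """Compute two stand-in features for one window (assumption: not the paper's set)."""
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    # Zero-crossing rate: fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    return {"rms": rms, "zcr": crossings / (len(window) - 1)}


def extract_features(samples, window_size, hop_size):
    """Produce one feature row per window, as in a windowed feature dataset."""
    return [window_features(w) for w in split_into_windows(samples, window_size, hop_size)]


# Usage: one second of a synthetic 5 Hz sine sampled at 100 Hz.
sample_rate = 100
signal = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(sample_rate)]
rows = extract_features(signal, window_size=40, hop_size=20)
```

Each recording yields several feature rows, one per window. Because those rows share a speaker, any later split of the dataset has to keep a subject's rows together.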

Results: Both machine learning pipelines showed moderate performance, between 65% and 75%, under the nested subject-wise cross-validation approach. Furthermore, they generalized well to unseen data and did not overfit the training set.
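The nested subject-wise cross-validation referred to above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipelines: the classifier is a one-feature k-nearest-neighbours stand-in, the data are synthetic, and the fold counts, function names, and the candidate values of `k` are all hypothetical. The point it shows is that folds are built from subjects rather than individual windows, and that the hyperparameter is chosen using training subjects only.

```python
from statistics import mean


def subject_wise_splits(indices, subjects, n_folds):
    """Yield (train, test) index lists such that no subject appears in both."""
    unique = sorted({subjects[i] for i in indices})
    fold_of = {s: n % n_folds for n, s in enumerate(unique)}
    for fold in range(n_folds):
        test = [i for i in indices if fold_of[subjects[i]] == fold]
        train = [i for i in indices if fold_of[subjects[i]] != fold]
        yield train, test


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (stand-in model)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def accuracy(X, y, train, test, k):
    """Fit on `train` rows (k-NN just stores them) and score on `test` rows."""
    tx, ty = [X[i] for i in train], [y[i] for i in train]
    return mean(1.0 if knn_predict(tx, ty, X[i], k) == y[i] else 0.0 for i in test)


def nested_subject_wise_cv(X, y, subjects, k_values, outer_folds=3, inner_folds=2):
    """Return one held-out accuracy per outer fold."""
    outer_scores = []
    all_rows = list(range(len(X)))
    for outer_train, outer_test in subject_wise_splits(all_rows, subjects, outer_folds):
        # Inner loop: pick k using only the subjects in the outer training split.
        def inner_score(k):
            inner = subject_wise_splits(outer_train, subjects, inner_folds)
            return mean(accuracy(X, y, tr, va, k) for tr, va in inner)

        best_k = max(k_values, key=inner_score)
        # Outer loop: score the chosen k on subjects never seen during selection.
        outer_scores.append(accuracy(X, y, outer_train, outer_test, best_k))
    return outer_scores


# Synthetic data: 6 subjects, 4 windows each, one well-separated feature.
subjects = [f"s{n}" for n in range(6) for _ in range(4)]
y = [int(s[1:]) % 2 for s in subjects]  # odd-numbered subjects labelled 1
X = [label + 0.1 * (row % 4 - 1.5) for row, label in enumerate(y)]
scores = nested_subject_wise_cv(X, y, subjects, k_values=[1, 3])
```

Splitting by subject matters because windows from the same speaker are highly correlated. A window-level split would let the same voice appear in both the training and test data and would inflate the estimate.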

Conclusions: The architecture of PD Predict is clear, and the performance of the implemented machine learning pipelines is promising and confirms the usability of smartphone microphones for capturing digital biomarkers of disease.


Source journal

Healthcare Informatics Research MEDICAL INFORMATICS-

CiteScore

4.90

Self-citation rate

6.90%

Number of publications