Development and validation of a machine learning-based diagnostic model for Parkinson's disease in community-dwelling populations: Evidence from the China Health and Retirement Longitudinal Study (CHARLS)
Hongyang Fan, Sai Li, Xin Guo, Min Chen, Honggao Zhang, Yingzhu Chen
Parkinsonism & Related Disorders, Volume 130, Article 107182. Published 2024-10-30. DOI: 10.1016/j.parkreldis.2024.107182
https://www.sciencedirect.com/science/article/pii/S1353802024011945
Abstract
Background
Parkinson's disease (PD) is a major neurodegenerative disorder in middle-aged and elderly people. There is a pressing need for effective predictive models, particularly in the Chinese population. Objective: This study aims to develop and validate a machine learning-based diagnostic model to identify individuals with PD in community-dwelling populations using data from the China Health and Retirement Longitudinal Study (CHARLS).
Methods
We used data from 19,134 individuals aged 45 and above from the CHARLS dataset, of whom 265 reported having PD. The external validation cohort included 1,500 individuals, of whom 21 (1.4%) had PD. The random forest (RF) algorithm was used to develop an interpretable PD prediction model, which was internally validated using 10-fold cross-validation and externally validated with a dataset from Northern Jiangsu People's Hospital. SHapley Additive exPlanation (SHAP) values were employed to elucidate the model's predictions.
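The internal validation step described above can be illustrated with a minimal sketch, assuming scikit-learn is available. It is not the authors' pipeline: the data are synthetic (not CHARLS), and the hyperparameters, class weighting, stratified folds, and variable names are hypothetical choices that the abstract does not report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data with a rare positive class. This is NOT CHARLS data.
X, y = make_classification(
    n_samples=2000,
    n_features=12,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)

# Hypothetical hyperparameters; the abstract does not state the ones used.
model = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    random_state=0,
)

# Stratified folds keep the share of positive cases similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_per_fold = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print(f"mean AUC over {len(auc_per_fold)} folds: {np.mean(auc_per_fold):.3f}")
```

Scoring by ROC AUC rather than accuracy is a common choice when the positive class is rare, since AUC does not depend on a single decision threshold.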
Results
The RF model demonstrated robust performance, with an Area Under the Curve (AUC) of 0.884 and high sensitivity, specificity, and F1 scores. In the external validation cohort, the model achieved an AUC of 0.82 and an accuracy of 0.99. The model's performance remained consistent across internal and external validation cohorts. SHAP analysis provided insights into the importance and interaction of various predictors, enhancing model interpretability.
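As a reading aid for the accuracy figure, the short calculation below uses only the cohort counts given in the Methods. The "all-negative baseline" is a hypothetical reference point introduced here for illustration, not something the study reports.

```python
# Cohort counts taken from the Methods section of the abstract.
charls_total, charls_pd = 19134, 265
external_total, external_pd = 1500, 21

# Share of individuals with PD in each cohort.
charls_prevalence = charls_pd / charls_total
external_prevalence = external_pd / external_total

# Accuracy of a hypothetical baseline that labels everyone "no PD"
# in the external cohort: it is correct for every non-PD individual.
external_non_pd = external_total - external_pd
baseline_accuracy = external_non_pd / external_total

print(f"CHARLS prevalence:   {charls_prevalence:.3f}")
print(f"external prevalence: {external_prevalence:.3f}")
print(f"all-negative baseline accuracy: {baseline_accuracy:.3f}")
```

With PD present in about 1.4% of both cohorts, accuracy is dominated by the non-PD majority, which is why the AUC, sensitivity, and specificity carry most of the information about how well cases are identified.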
Conclusion
The study presents a highly accurate and interpretable machine learning-based diagnostic model to identify individuals with PD among middle-aged and older Chinese adults. By combining predictive risk factors with chronic disease information, the model offers valuable insights for early identification and intervention, potentially mitigating PD progression.
About the journal:
Parkinsonism & Related Disorders publishes the results of basic and clinical research contributing to the understanding, diagnosis and treatment of all neurodegenerative syndromes in which Parkinsonism, Essential Tremor or related movement disorders may be a feature. Regular features will include: Review Articles, Point of View articles, Full-length Articles, Short Communications, Case Reports and Letter to the Editor.