Background
Cognitive impairment in older adults poses a growing public health challenge, yet accessible screening tools remain limited. We aimed to develop and validate an interpretable machine learning model that predicts cognitive impairment from routinely collected clinical data.
Methods
We analyzed 1061 participants from the U.S. National Health and Nutrition Examination Survey (NHANES 2011–2014). Feature selection combined multivariable regression, restricted cubic splines, and the Boruta algorithm and identified 40 clinical, demographic, and socioeconomic variables. Twelve machine learning models, including Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF), were trained on the NHANES 2011–2014 cohort and externally validated on NHANES 2001–2002 (n = 531). Model performance was evaluated by the area under the receiver operating characteristic curve (AUC-ROC), calibration (Brier score), accuracy, and sensitivity. Fairness was assessed across racial subgroups, and interpretability was provided by SHapley Additive exPlanations (SHAP).
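To make the evaluation metrics named above concrete, the following is a minimal, dependency-free sketch of how AUC-ROC, the Brier score, accuracy, and sensitivity can be computed from binary labels and predicted probabilities. It is an illustration, not the authors' code: the function names are hypothetical, and the 0.5 classification threshold used for accuracy and sensitivity is an assumption, since the abstract does not state the cut-off.

```python
from typing import Sequence, Tuple


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def accuracy_and_sensitivity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Accuracy and sensitivity (true-positive rate) at an assumed probability threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for y, yp in zip(y_true, y_pred) if y == yp)
    true_pos = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("sensitivity is undefined without positive cases")
    return correct / len(y_true), true_pos / positives


# Toy example with four cases (not study data).
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, probs))                   # 0.75
print(accuracy_and_sensitivity(labels, probs))  # (0.75, 0.5)
```

The pairwise form of the AUC is quadratic in sample size; for cohorts of a few thousand it is still fast, but a rank-based formula or a library routine would scale better.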
Results
The SVM model showed the best generalizability, achieving an external validation AUC of 0.8265 (95% CI: 0.7867–0.8582) with good calibration (Brier score = 0.1703). Subgroup analyses showed no statistically significant AUC differences across racial groups (all P > 0.05). SHAP analysis identified socioeconomic factors, systemic inflammation indices, and metabolic markers as key predictors.
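The abstract reports a 95% confidence interval for the external validation AUC but does not say how it was obtained. One common approach is the percentile bootstrap, sketched below under that assumption; the function names, the 2000 resamples, and the fixed seed are illustrative choices, not details taken from the study.

```python
import random
from typing import List, Sequence, Tuple


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Pairwise AUC: share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (AUC, lower, upper) using a percentile bootstrap over cases (assumed method)."""
    n = len(y_true)
    n_pos = sum(1 for y in y_true if y == 1)
    if not 0 < n_pos < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc_roc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc_roc(y_true, y_score), lower, upper


# Perfectly separated toy data: every valid resample has AUC 1.0.
print(bootstrap_auc_ci([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]))
```

Resampling whole cases keeps each score attached to its label; the same loop could be run within each racial subgroup to compare subgroup AUCs, though the study's actual test for subgroup differences is not described here.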
Limitations
Generalizability beyond U.S. populations may be limited, and unmeasured biomarkers (e.g., amyloid-β) could affect prediction accuracy. Subgroup analyses for racial minority groups were constrained by small sample sizes.
Conclusion
Our interpretable prediction strategy enables rapid cognitive risk assessment using routine clinical data, providing a cost-effective decision support tool adaptable to electronic health record systems.
