Explainable machine learning model for assessing health status in patients with comorbid coronary heart disease and depression: Development and validation study
Authors: Jiqing Li, Shuo Wu, Jianhua Gu
Journal: International Journal of Medical Informatics, volume 196, Article 105808
Publication date: 2025-01-23
DOI: 10.1016/j.ijmedinf.2025.105808
URL: https://www.sciencedirect.com/science/article/pii/S1386505625000255
Abstract
Background
Coronary heart disease (CHD) and depression frequently co-occur, significantly impacting patient outcomes. However, comprehensive health status assessment tools for this complex population are lacking. This study aimed to develop and validate an explainable machine learning model to evaluate overall health status in patients with comorbid CHD and depression.
Methods
Using data from the 2021–2022 Behavioral Risk Factor Surveillance System, we developed and externally validated machine learning models to predict overall health status, defined as having both poor physical and poor mental health for ≥ 14 days in the past 30 days. Eleven machine learning algorithms were evaluated, including artificial neural networks, support vector machines, and ensemble methods. The SHapley Additive exPlanations (SHAP) method was used to enhance model interpretability. Model performance was assessed using discrimination, calibration, and decision curve analysis.
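Decision curve analysis, named above as one of the evaluation methods, compares strategies by their net benefit at a chosen threshold probability. The following is a minimal sketch of the standard net-benefit formula, not the authors' code; the function names and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging every patient whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy labels and predicted risks, invented for illustration (not study data).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

for pt in (0.2, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    # "Treat none" always has net benefit 0, so it needs no function.
    print(f"threshold={pt}: model={model_nb:.5f}, treat_all={all_nb:.5f}, treat_none=0")
```

A decision curve plots these quantities over a range of thresholds; a model is considered clinically useful at thresholds where its net benefit exceeds both the treat-all and treat-none strategies.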
Results
The study included 9,747 participants in the derivation cohort and 8,394 in the external validation cohort. Among the eleven algorithms evaluated, an optimized XGBoost model with eight key features demonstrated balanced performance. SHAP analysis revealed that employment status, physical activity, income, and age were the most influential predictors. The model maintained good discrimination (AUC 0.712, 95% CI 0.703–0.721 in derivation; AUC 0.711, 95% CI 0.701–0.721 in validation), calibration, and clinical utility across both cohorts.
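For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained, here is a minimal sketch. The abstract does not state how the intervals were computed; the percentile bootstrap used here is an assumption, and the function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as 0.5. The pairwise loop is quadratic, which is fine for a
    small illustration but slow for cohorts of thousands of participants.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


# Toy labels and predicted risks, invented for illustration (not study data).
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

point = auc(toy_labels, toy_scores)
lower, upper = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC = {point:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

With only eight toy observations the interval is very wide; the narrow intervals reported above reflect cohorts of several thousand participants.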
Conclusion
Our explainable machine learning model provides a novel, comprehensive approach to assessing health status in patients with comorbid CHD and depression, offering valuable insights for personalized management strategies.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.