Fair prediction of 2-year stroke risk in patients with atrial fibrillation.

Journal of the American Medical Informatics Association. Pub Date: 2024-12-01. DOI: 10.1093/jamia/ocae170
Impact factor 4.7; ranking Zone 2 (Medicine); JCR Q1 (Computer Science, Information Systems)

Jifan Gao, Philip Mar, Zheng-Zheng Tang, Guanhua Chen

Journal of the American Medical Informatics Association, pages 2820-2828. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631105/pdf/

Citations: 0

Abstract

Objective: This study aims to develop machine learning models that provide both accurate and equitable predictions of 2-year stroke risk for patients with atrial fibrillation across diverse racial groups.

Materials and methods: Our study used structured electronic health record (EHR) data from the All of Us Research Program. Machine learning models (LightGBM) were used to capture the relationship between stroke risk and the predictors used by the widely recognized CHADS2 and CHA2DS2-VASc scores. We mitigated racial disparity by creating a representative tuning set, customizing tuning criteria, and setting binary classification thresholds separately for each subgroup. We constructed a hold-out test set that not only supports temporal validation but also includes a larger proportion of Black/African American patients for fairness validation.
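For readers unfamiliar with the two baseline scores, the sketch below computes them from their standard point definitions: CHADS2 awards 1 point each for congestive heart failure, hypertension, age ≥75, and diabetes, and 2 points for prior stroke/TIA; CHA2DS2-VASc additionally scores vascular disease, age 65-74, and female sex, and raises age ≥75 to 2 points. The `Patient` fields and the `predictor_row` helper are hypothetical illustrations, not the study's data schema. The abstract states only that the models take these scores' predictors as inputs, so the feature encoding shown here (for example, age as a raw integer rather than binned) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    female: bool
    chf: bool               # congestive heart failure
    hypertension: bool
    diabetes: bool
    prior_stroke_tia: bool  # prior stroke, TIA, or thromboembolism
    vascular_disease: bool  # e.g., prior MI, peripheral artery disease, aortic plaque


def chads2(p: Patient) -> int:
    """CHADS2 (0-6): C, H, A (age >= 75), D = 1 point each; S2 (stroke/TIA) = 2 points."""
    return (int(p.chf) + int(p.hypertension) + int(p.age >= 75)
            + int(p.diabetes) + 2 * int(p.prior_stroke_tia))


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc (0-9): age >= 75 = 2, age 65-74 = 1, plus V and Sc (female) = 1 each."""
    age_points = 2 if p.age >= 75 else (1 if p.age >= 65 else 0)
    return (int(p.chf) + int(p.hypertension) + age_points + int(p.diabetes)
            + 2 * int(p.prior_stroke_tia) + int(p.vascular_disease) + int(p.female))


def predictor_row(p: Patient) -> list[int]:
    # Hypothetical model input: the individual predictors, unsummed, so a learned
    # model (e.g., gradient-boosted trees) can weight and combine them itself
    # instead of using the fixed integer points above.
    return [p.age, int(p.female), int(p.chf), int(p.hypertension),
            int(p.diabetes), int(p.prior_stroke_tia), int(p.vascular_disease)]


if __name__ == "__main__":
    example = Patient(age=78, female=True, chf=False, hypertension=True,
                      diabetes=True, prior_stroke_tia=False, vascular_disease=False)
    print(chads2(example), cha2ds2_vasc(example))  # 3 5
```

Because the point weights are fixed integers, each score can take only a handful of values; feeding rows like `predictor_row` output to a classifier lets the model learn weights and interactions (for example, between age and sex) from data.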

Results: Compared with the original CHADS2 and CHA2DS2-VASc scores, modeling their predictors with machine learning models achieved significant improvements (the area under the receiver operating characteristic curve, AUROC, rose from about 0.70 to above 0.80). Furthermore, applying our disparity mitigation strategies effectively enhanced model fairness compared with the conventional cross-validation approach.
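AUROC, the metric behind the reported improvement from about 0.70 to above 0.80, equals the probability that a randomly chosen patient who had a stroke receives a higher risk score than a randomly chosen patient who did not, with ties counted as one half. The pure-Python sketch below computes it directly from that definition; the scores and labels in the example are made-up toy values, not study data. Integer scores such as CHADS2 produce many tied pairs, which the half-credit rule handles.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Rank-based AUROC: P(score of a random positive > score of a random negative).

    Tied positive/negative pairs count 0.5. O(n_pos * n_neg), fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    credit = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                credit += 1.0
            elif sp == sn:
                credit += 0.5
    return credit / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    integer_score = [2, 2, 1, 0]         # toy point-based score with a tie
    model_prob = [0.9, 0.8, 0.3, 0.1]    # toy continuous model output
    print(auroc(integer_score, labels))  # 0.625
    print(auroc(model_prob, labels))     # 0.75
```

In practice, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity more efficiently.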

Discussion: Modeling CHADS2 and CHA2DS2-VASc risk factors with LightGBM, combined with our disparity mitigation strategies, achieved good discriminative performance and excellent fairness performance. In addition, this approach can provide a complete interpretation of each predictor. These results highlight its potential utility in clinical practice.
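The abstract lists subgroup-specific binary thresholds as one of the mitigation strategies but does not state the criterion used to choose them. The sketch below assumes one common choice, purely for illustration: for each racial subgroup, take the highest threshold at which that group's true positive rate (sensitivity) reaches a shared target, which approximately equalizes sensitivity across groups. The function names, the `target_tpr` parameter, and the toy data are hypothetical; the scores could come from any risk model.

```python
import math
from collections import defaultdict


def group_thresholds(scores, labels, groups, target_tpr=0.8):
    """Hypothetical criterion: per group, the highest cutoff t such that
    TPR = (# positives with score >= t) / (# positives) >= target_tpr."""
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    positives = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            positives[g].append(s)
    thresholds = {}
    for g, pos in positives.items():
        pos.sort(reverse=True)
        k = math.ceil(target_tpr * len(pos))  # positives that must be flagged
        thresholds[g] = pos[k - 1]            # k-th highest positive score
    return thresholds


def predict(scores, groups, thresholds):
    # Apply each patient's own subgroup cutoff; a group with no positive cases
    # in the tuning data has no threshold and raises KeyError here.
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


def tpr_by_group(preds, labels, groups):
    flagged, total = defaultdict(int), defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        if y == 1:
            total[g] += 1
            flagged[g] += p
    return {g: flagged[g] / total[g] for g in total}


if __name__ == "__main__":
    # Toy data: the model scores group "B" systematically lower than group "A".
    scores = [0.9, 0.8, 0.6, 0.5, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
    groups = ["A"] * 5 + ["B"] * 5
    cut = group_thresholds(scores, labels, groups, target_tpr=0.75)
    print(cut)  # {'A': 0.6, 'B': 0.3}
    print(tpr_by_group(predict(scores, groups, cut), labels, groups))
    # {'A': 0.75, 'B': 0.75}
    single = {"A": 0.6, "B": 0.6}  # one shared cutoff, for comparison
    print(tpr_by_group(predict(scores, groups, single), labels, groups))
    # {'A': 0.75, 'B': 0.0}
```

Equalizing sensitivity this way usually shifts other rates, such as the false positive rate, differently across groups, so the criterion should match the clinical goal; the cutoffs should also be fit on tuning data and checked on held-out patients rather than on the same records.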

Conclusions: Our research presents a practical example of addressing clinical challenges using data from the All of Us Research Program. The disparity mitigation framework we proposed is adaptable across various models and data modalities, demonstrating broad potential in clinical informatics.

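Source journal: Journal of the American Medical Informatics Association (Medicine; Computer Science, Interdisciplinary Applications)

CiteScore: 14.50
Self-citation rate: 7.80%
Articles published: 230
Review time: 3-8 weeks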

Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.