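Comparing Explainable Machine Learning Approaches With Traditional Statistical Methods for Evaluating Stroke Risk Models: Retrospective Cohort Study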

JMIR Cardio (JCR Q2, Medicine); published 2023-07-26; DOI: 10.2196/47736
Sermkiat Lolak, John Attia, Gareth J McKay, Ammarin Thakkinstian
JMIR Cardio, volume 7, article e47736. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10413234/pdf/
Citations: 1

Abstract

Background: Stroke has multiple modifiable and nonmodifiable risk factors and is a leading cause of death globally. Understanding the complex interplay of stroke risk factors is thus not only a scientific necessity but also a critical step toward improving global health outcomes.

Objective: We aimed to assess how well explainable machine learning models predict stroke and identify its risk factors in real-world cohort data, compared with conventional statistical methods.

Methods: This retrospective cohort study included high-risk patients from Ramathibodi Hospital in Thailand between January 2010 and December 2020. We compared the performance and explainability of logistic regression (LR), Cox proportional hazards, Bayesian network (BN), tree-augmented naïve Bayes (TAN), extreme gradient boosting (XGBoost), and explainable boosting machine (EBM) models. Missing data were handled with multiple imputation by chained equations (MICE), and continuous variables were discretized as needed. Models were evaluated using the C-statistic and the F1-score.
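The two evaluation metrics named in the Methods can be illustrated with a minimal sketch. It assumes binary stroke labels (1 = stroke), predicted risk scores in [0, 1], and a 0.5 decision threshold for F1; the abstract does not state the study's threshold or F1 averaging scheme, and the function names and toy data are hypothetical. The C-statistic here is the binary-outcome version (equivalent to ROC AUC); for the Cox model, a censoring-aware variant such as Harrell's C would be needed instead.

```python
from typing import Sequence


def c_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Concordance (C) statistic for a binary outcome.

    Probability that a randomly chosen case (y=1) gets a higher score than a
    randomly chosen non-case (y=0); tied scores count as 1/2.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:              # compare every case with every non-case
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 for the positive (stroke) class after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0             # no true positives: precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up data (not from the study)
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.4, 0.6, 0.2, 0.1]
print(round(c_statistic(y, risk), 3))  # 0.833
print(round(f1_score(y, risk), 3))     # 0.5
```

The pairwise loop is quadratic and written for clarity only; for a cohort the size of this one (275,247 patients), a rank-based formula or an established library routine would be the practical choice.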

Results: Of 275,247 high-risk patients, 9659 (3.5%) experienced a stroke. XGBoost performed best, with a C-statistic of 0.89 and an F1-score of 0.80, followed by EBM and TAN with C-statistics of 0.87 and 0.83, respectively; LR and BN both had C-statistics of 0.80. Significant factors associated with stroke included atrial fibrillation (AF), hypertension (HT), antiplatelet use, high-density lipoprotein (HDL), and age. AF, HT, and antihypertensive medication were significant in most models, and AF was the strongest factor in the LR, XGBoost, BN, and TAN models.
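Two points in the Results can be checked with simple arithmetic: the 3.5% event rate, and why a threshold-based score such as F1 is reported alongside the C-statistic when strokes are rare. The sketch below uses the counts from the abstract; the "never predict stroke" baseline is a hypothetical comparison, not a model from the study, and it assumes F1 is computed for the stroke class (the abstract does not state the averaging scheme). Function names are illustrative.

```python
from typing import Tuple


def event_rate(n_events: int, n_total: int) -> float:
    """Proportion of the cohort that experienced the event."""
    return n_events / n_total


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); avoids dividing by an undefined precision."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def all_negative_baseline(n_events: int, n_total: int) -> Tuple[float, float]:
    """Accuracy and stroke-class F1 of a hypothetical model that never predicts stroke."""
    tn = n_total - n_events                       # every non-stroke patient is correct
    accuracy = tn / n_total
    f1 = f1_from_counts(tp=0, fp=0, fn=n_events)  # every stroke is missed
    return accuracy, f1


# Counts reported in the abstract
N_PATIENTS, N_STROKE = 275_247, 9_659
print(f"event rate: {event_rate(N_STROKE, N_PATIENTS):.1%}")  # event rate: 3.5%
acc, f1 = all_negative_baseline(N_STROKE, N_PATIENTS)
print(f"accuracy: {acc:.1%}, F1: {f1}")                     # accuracy: 96.5%, F1: 0.0
```

At a 3.5% event rate, a model that never predicts stroke is about 96.5% accurate yet scores an F1 of 0 for the stroke class, which is why accuracy alone would be a poor yardstick for this cohort.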

Conclusions: We developed stroke prediction models that identified key predictive factors in high-risk patients, including AF; HT, systolic blood pressure, or antihypertensive medication; anticoagulant medication; HDL; age; and statin use. The explainable XGBoost model predicted stroke risk best, followed by EBM.

Source journal
JMIR Cardio
Subject category: Computer Science-Computer Science Applications
CiteScore: 3.50
Self-citation rate: 0.00%
Articles published: 25
Review time: 12 weeks