XGBoost, A Novel Explainable AI Technique, in the Prediction of Myocardial Infarction: A UK Biobank Cohort Study

Clinical Medicine Insights. Cardiology, vol. 16 (IF 2.3, JCR Q2, Cardiac & Cardiovascular Systems). Pub Date: 2022-01-01. DOI: 10.1177/11795468221133611
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9647306/pdf/

Alexander Moore, Max Bell


Citations: 13

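Abstract

We wanted to assess whether "Explainable AI" in the form of extreme gradient boosting (XGBoost) could outperform traditional logistic regression in predicting myocardial infarction (MI) in a large cohort. Two machine learning methods, XGBoost and logistic regression, were compared in predicting risk of MI. The UK Biobank is a population-based prospective cohort including 502 506 volunteers with active consent, aged 40 to 69 years at recruitment from 2006 to 2010. These subjects were followed until the end of 2019, and the primary outcome was myocardial infarction. Both models were trained using 90% of the cohort; the remaining 10% was used as a test set. Both models were equally precise, but the regression model classified more of the healthy class correctly. XGBoost was more accurate in identifying individuals who later suffered a myocardial infarction. Receiver operating characteristic (ROC) scores are invariant to class size. On this metric XGBoost outperformed the logistic regression model, with ROC scores of 0.86 (accuracy 0.75, CI ±0.00379) and 0.77 (accuracy 0.77, CI ±0.00369), respectively. Secondly, we demonstrate how Shapley values can be used to visualize and interpret the predictions made by XGBoost models, both for the cohort test set and for individuals. The XGBoost machine learning model shows very promising results in evaluating risk of MI in a large and diverse population. This model can be used, and visualized, both for individual assessments and in larger cohorts. The predictions made by the XGBoost models point toward a future where "Explainable AI" may help to bridge the gap between medicine and data science.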
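The two evaluation claims in the abstract can be illustrated with a minimal Python sketch. It assumes that "ROC score" means the area under the ROC curve (AUC) and that the reported intervals are 95% normal-approximation intervals on a test set of roughly 50 251 subjects (10% of 502 506). The abstract states neither, so both are assumptions. The synthetic scores, the 0.0 decision threshold, and the function names `roc_auc` and `accuracy_ci_half_width` are illustrative and not taken from the study.

```python
import math
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_ci_half_width(accuracy, n, z=1.96):
    """Half-width of a normal-approximation interval for a proportion."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)


# Synthetic model scores (assumption): cases score higher on average.
rng = random.Random(0)
pos_scores = [rng.gauss(1.0, 1.0) for _ in range(100)]
neg_scores = [rng.gauss(0.0, 1.0) for _ in range(100)]


def evaluate(neg_copies, threshold=0.0):
    """Return (AUC, accuracy) after replicating the healthy class neg_copies times."""
    scores = pos_scores + neg_scores * neg_copies
    labels = [1] * len(pos_scores) + [0] * (len(neg_scores) * neg_copies)
    preds = [int(s > threshold) for s in scores]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return roc_auc(labels, scores), accuracy


auc_balanced, acc_balanced = evaluate(neg_copies=1)
auc_skewed, acc_skewed = evaluate(neg_copies=10)
print(f"AUC      balanced={auc_balanced:.4f}  skewed={auc_skewed:.4f}")
print(f"accuracy balanced={acc_balanced:.4f}  skewed={acc_skewed:.4f}")

# Assumed test-set size: 10% of the 502 506 participants.
n_test = round(502506 * 0.10)
print(f"CI half-width at accuracy 0.75: {accuracy_ci_half_width(0.75, n_test):.5f}")
print(f"CI half-width at accuracy 0.77: {accuracy_ci_half_width(0.77, n_test):.5f}")
```

Replicating the healthy class tenfold leaves the AUC unchanged while accuracy at the fixed threshold shifts, which is the sense in which ROC scores are invariant to class size. Under the stated assumptions, the computed half-widths come out close to the reported ±0.00379 and ±0.00369.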


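The idea behind interpreting an individual prediction with Shapley values can be sketched in a few lines of Python. This is a brute-force illustration, not the procedure used in the study: the risk function `toy_risk`, its coefficients, the feature names, and the baseline patient are all hypothetical, and "absent" features are simply set to their baseline value. In practice a library such as `shap` would compute attributions for a tree model far more efficiently.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Features not yet added to the coalition keep their baseline value.
    Enumerates every feature ordering, so only practical for a few features.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            value = predict(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_risk(features):
    """Hypothetical risk score with one interaction term (NOT the study's model)."""
    score = 0.02 * (features["age"] - 40) + 0.5 * features["smoker"]
    score += 0.3 * features["smoker"] * features["diabetes"]
    return score


baseline = {"age": 40, "smoker": 0, "diabetes": 0}
patient = {"age": 65, "smoker": 1, "diabetes": 1}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name:>8}: {contribution:+.3f}")
print(f"sum of attributions: {sum(phi.values()):.3f}")
print(f"risk difference:     {toy_risk(patient) - toy_risk(baseline):.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. This additive property is what allows per-feature contributions to be plotted for one individual or aggregated over a test set.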
