Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach.

Healthcare Informatics Research (IF 2.1, Q3, Medical Informatics). Pub Date: 2023-07-01. DOI: 10.4258/hir.2023.29.3.228

Eka Miranda, Suko Adiarto, Faqir M Bhatti, Alfi Yusrotis Zakiyyah, Mediana Aryuni, Charles Bernando

Healthcare Informatics Research, vol. 29, no. 3, pp. 228-238. Journal Article. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/e7/3a/hir-2023-29-3-228.PMC10440196.pdf

Citations: 1

Abstract

Objectives: The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. As a contribution to preventing this phenomenon, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the prediction model results based on the ML approach and deployed model-agnostic ML methods to identify informative features and their interpretations.

Methods: We used a hematology Electronic Health Record (EHR) with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We examined the prediction model results based on the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of features to predictions.
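To make concrete what "quantifying the contribution of features to predictions" means, the following is a minimal sketch of exact Shapley values for a single observation. It is not the authors' pipeline and does not use the SHAP library: the function `shapley_values`, the scoring function `toy_model`, its coefficients, and the input values are all hypothetical and chosen only for illustration. Features left out of a coalition are replaced by a baseline value, which is one common way (an assumption here) to define the value function.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x.

    Features outside a coalition are set to their baseline value
    (an illustrative choice of value function, not the paper's).
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only features in `coalition` keep their observed values.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy scoring function over three inputs (NOT the paper's model).
def toy_model(v):
    hemoglobin, mchc, age = v
    return 0.5 * hemoglobin + 0.25 * mchc + 0.125 * hemoglobin * age


x = [2.0, 4.0, 1.0]          # arbitrary illustrative values for one observation
baseline = [0.0, 0.0, 0.0]   # arbitrary reference point

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["hemoglobin", "mchc", "age"], phi):
    print(f"{name}: {contribution:.3f}")
```

In this toy case the contributions sum to the difference between the model output at `x` and at `baseline`, and the interaction term is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations or model-specific algorithms.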

Results: Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot method, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot revealed that hemoglobin and mean corpuscular hemoglobin concentration had positive impacts on AHD prediction based on a single observation.
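As a quick consistency check on the reported figures, the sketch below recomputes the F1-score from the stated precision and recall and derives the class balance from the stated record counts. The helper name `f1_from` is hypothetical; the numbers are taken from the abstract. The abstract does not describe the train/test split, so the class balance here refers to the full dataset only.

```python
def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Metrics reported for AdaBoost in the abstract.
precision, recall = 0.82, 0.88
f1 = f1_from(precision, recall)
print(f"F1 recomputed from precision and recall: {f1:.2f}")

# Record counts reported in the abstract.
n_ahd, n_non_ahd = 4702, 2135
total = n_ahd + n_non_ahd
ahd_share = n_ahd / total
print(f"Total records: {total}")
print(f"Share of AHD records in the full dataset: {ahd_share:.2f}")
```

The recomputed F1 rounds to the reported 0.85, and the two class counts add up to the reported 6,837 records. Because roughly 69% of the records carry an AHD diagnosis, the reported accuracy of 0.78 is best read against that majority-class share rather than against 0.50.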

Conclusions: ML models based on real clinical data can be used to predict AHD.

Source journal: Healthcare Informatics Research (Medical Informatics)

CiteScore: 4.90

Self-citation rate: 6.90%

Publication count: 44
