Prediction and Analysis of Heart Disease Using Machine Learning

Yu Lin
Published in: 2021 IEEE International Conference on Robotics, Automation and Artificial Intelligence (RAAI)
Publication date: 2021-04-21
DOI: 10.1109/RAAI52226.2021.9507928

Abstract

Heart disease is one of the leading causes of death worldwide, and its complexity and high rate of misdiagnosis pose a major challenge for medical workers. Because machine learning has proven effective for decision-making and prediction, a machine learning model can be a valuable aid to heart disease diagnosis. In this paper, a heart-disease dataset from Cleveland was analyzed and preprocessed (cleaning, one-hot encoding, and standardization), and preliminary analysis identified features associated with heart disease, such as male sex. Six machine learning algorithms (Logistic Regression, K-Nearest Neighbors, AdaBoost, CART, Random Forest, XGBoost) were trained. After hyperparameter tuning and cross-validation, Random Forest was selected as the best model, outperforming the others with an accuracy of 0.848, an F1 score of 0.829, a PRC-AUC of 0.909, and a ROC-AUC of 0.917. In addition, permutation feature importance and partial importance plots of the Random Forest classifier highlighted the most relevant features, including a reversible defect on the thallium stress test, high exercise-induced ST depression relative to rest, and asymptomatic chest pain.
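The abstract names permutation feature importance as the tool used to rank features, but this page does not show how it was computed. As a minimal sketch of the general idea, and not the paper's implementation, the snippet below shuffles one feature column at a time and records the mean drop in accuracy. The toy data, the `toy_predict` function standing in for a fitted classifier, and all other names are hypothetical; the paper itself used a Random Forest trained on the Cleveland dataset.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, which breaks its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical toy data: feature 0 decides the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(row):
    """Stand-in for a fitted classifier; it only looks at feature 0."""
    return int(row[0] > 0.5)


baseline, importances = permutation_importance(toy_predict, X, y)
print(f"baseline accuracy: {baseline:.3f}")
for name, value in zip(["feature_0", "feature_1"], importances):
    print(f"{name}: mean accuracy drop = {value:.3f}")
```

A feature the model relies on shows a large drop, while a feature the model ignores shows none. With a real fitted estimator, scikit-learn's `sklearn.inspection.permutation_importance` provides the same kind of measurement and should be computed on held-out data.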