Analysis of Accuracy Metric of Machine Learning Algorithms in Predicting Heart Disease

Sajad Yousefi, Maryam Poornajaf
Journal: Frontiers in Health Informatics
Publication date: 2023-04-18
DOI: 10.30699/fhi.v12i0.402 (https://doi.org/10.30699/fhi.v12i0.402)

Abstract

Introduction: Heart disease refers, for the most part, to conditions involving narrowed or blocked blood vessels that can lead to a heart attack, chest pain, or stroke. Earlier identification of heart disease may reduce the death rate, but the cost of medical diagnosis makes early screening impractical for large numbers of people. Machine learning models applied to a dataset can support this task. This article aims to find the most efficient and accurate machine learning models for disease prediction.

Material and Methods: Several supervised machine learning algorithms were used for the diagnosis and prediction of heart disease: logistic regression, decision tree, random forest, and k-nearest neighbors (KNN). The algorithms were applied to a dataset of 70,000 samples taken from Kaggle. Feature importance analysis and several validation schemes (hold-out validation, 10-fold cross-validation, stratified 10-fold cross-validation, and leave-one-out cross-validation) were used to evaluate performance and increase accuracy. In addition, a feature importance score was estimated for each feature in some algorithms, and the features were ranked by that score. All work was done in the Anaconda environment using the Python programming language and the Scikit-learn library.

Results: The algorithms were compared with each other using the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score, evaluated for each model. The random forest algorithm, with an F1 score of 92%, accuracy of 92%, and AUC ROC of 95%, performed better than the other algorithms.

Conclusion: The area under the ROC curve and the evaluation criteria of several machine learning classification algorithms for the diagnosis and prediction of heart disease were compared to determine the most appropriate classifier.
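The comparison described in the abstract can be sketched with Scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the synthetic data from `make_classification` stands in for the Kaggle dataset, and the hyperparameters (`n_estimators=50`, `n_neighbors=5`, `max_iter=1000`), the feature scaling step, and the choice of stratified 10-fold cross-validation as the single scheme shown are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary-classification data replaces the real dataset.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=5, random_state=0
)

# The four classifier families named in the abstract.
# Scaling is added for the distance- and gradient-based models (an assumption).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Metrics from the abstract; "recall" is scikit-learn's name for sensitivity.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

# Stratified 10-fold cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 10 held-out folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})

# Rank features by the random forest's impurity-based importance scores.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = sorted(
    enumerate(forest.feature_importances_), key=lambda pair: pair[1], reverse=True
)
print("feature ranking (index, importance):", ranking[:3])
```

The scores printed for synthetic data say nothing about the figures reported in the abstract; the sketch only shows how the per-model metrics and the feature ranking could be produced. Swapping `StratifiedKFold` for `KFold` or `LeaveOneOut` from `sklearn.model_selection` would cover the other cross-validation schemes listed, although `roc_auc` cannot be computed per fold under leave-one-out because each test fold holds a single sample.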