不同机器学习算法在冠状动脉疾病预测中的比较

Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman
{"title":"不同机器学习算法在冠状动脉疾病预测中的比较","authors":"Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman","doi":"10.4236/jdaip.2020.82003","DOIUrl":null,"url":null,"abstract":"Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease that is associated with numerous risk factors and a variety of Symptoms. During the past decade, Coronary Artery Disease (CAD) has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different Machine Learning Algorithms (models) and compare their performance to identify a suitable model. This paper explores three most commonly used Machine Learning Algorithms named as Logistic Regression, Support Vector Machine and Artificial Neural Network. To conduct this research, a clinical dataset has been used. To evaluate the performance, different evaluation methods have been used such as Confusion Matrix, Stratified K-fold Cross Validation, Accuracy, AUC and ROC. To validate the results, the accuracy and AUC scores have been validated using the K-Fold Cross-validation technique. The dataset contains class imbalance, so the SMOTE Algorithm has been used to balance the dataset and the performance analysis has been carried out on both sets of data. The results show that accuracy scores of all the models have been increased while training the balanced dataset. Overall, Artificial Neural Network has the highest accuracy whereas Logistic Regression has the least accurate among the trained Algorithms.","PeriodicalId":71434,"journal":{"name":"数据分析和信息处理(英文)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Comparison of Different Machine Learning Algorithms for the Prediction of Coronary Artery Disease\",\"authors\":\"Imran Chowdhury Dipto, Tanzila Islam, H. Rahman, A. F. M. Moshiur Rahman\",\"doi\":\"10.4236/jdaip.2020.82003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease that is associated with numerous risk factors and a variety of Symptoms. During the past decade, Coronary Artery Disease (CAD) has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different Machine Learning Algorithms (models) and compare their performance to identify a suitable model. This paper explores three most commonly used Machine Learning Algorithms named as Logistic Regression, Support Vector Machine and Artificial Neural Network. To conduct this research, a clinical dataset has been used. To evaluate the performance, different evaluation methods have been used such as Confusion Matrix, Stratified K-fold Cross Validation, Accuracy, AUC and ROC. To validate the results, the accuracy and AUC scores have been validated using the K-Fold Cross-validation technique. The dataset contains class imbalance, so the SMOTE Algorithm has been used to balance the dataset and the performance analysis has been carried out on both sets of data. The results show that accuracy scores of all the models have been increased while training the balanced dataset. Overall, Artificial Neural Network has the highest accuracy whereas Logistic Regression has the least accurate among the trained Algorithms.\",\"PeriodicalId\":71434,\"journal\":{\"name\":\"数据分析和信息处理(英文)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"数据分析和信息处理(英文)\",\"FirstCategoryId\":\"1093\",\"ListUrlMain\":\"https://doi.org/10.4236/jdaip.2020.82003\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"数据分析和信息处理(英文)","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.4236/jdaip.2020.82003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

摘要

冠状动脉疾病(CAD)是世界范围内导致死亡的主要原因。它是一种复杂的心脏病,与许多危险因素和各种症状有关。在过去的十年中,冠状动脉疾病(CAD)经历了显著的发展。本研究的目的是使用不同的机器学习算法(模型)构建一个原型系统,并比较它们的性能以确定合适的模型。本文探讨了三种最常用的机器学习算法:逻辑回归、支持向量机和人工神经网络。为了进行这项研究,我们使用了一个临床数据集。为了评估其性能,使用了不同的评估方法,如混淆矩阵、分层K-fold交叉验证、准确性、AUC和ROC。为了验证结果,使用K-Fold交叉验证技术验证了准确性和AUC分数。由于数据集存在类不平衡,因此采用SMOTE算法对数据集进行平衡,并对两组数据进行性能分析。结果表明,在平衡数据集的训练过程中,所有模型的准确率分数都有所提高。总的来说,人工神经网络的准确率最高,而逻辑回归的准确率最低。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Comparison of Different Machine Learning Algorithms for the Prediction of Coronary Artery Disease
Coronary Artery Disease (CAD) is the leading cause of mortality worldwide. It is a complex heart disease that is associated with numerous risk factors and a variety of Symptoms. During the past decade, Coronary Artery Disease (CAD) has undergone a remarkable evolution. The purpose of this research is to build a prototype system using different Machine Learning Algorithms (models) and compare their performance to identify a suitable model. This paper explores three most commonly used Machine Learning Algorithms named as Logistic Regression, Support Vector Machine and Artificial Neural Network. To conduct this research, a clinical dataset has been used. To evaluate the performance, different evaluation methods have been used such as Confusion Matrix, Stratified K-fold Cross Validation, Accuracy, AUC and ROC. To validate the results, the accuracy and AUC scores have been validated using the K-Fold Cross-validation technique. The dataset contains class imbalance, so the SMOTE Algorithm has been used to balance the dataset and the performance analysis has been carried out on both sets of data. The results show that accuracy scores of all the models have been increased while training the balanced dataset. Overall, Artificial Neural Network has the highest accuracy whereas Logistic Regression has the least accurate among the trained Algorithms.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
91
期刊最新文献
A Hybrid Neural Network Model Based on Transfer Learning for Forecasting Forex Market Enhancing Police Officers’ Cybercrime Investigation Skills Using a Checklist Tool A Sufficient Statistical Test for Dynamic Stability Lung Cancer Prediction from Elvira Biomedical Dataset Using Ensemble Classifier with Principal Component Analysis Modelling Key Population Attrition in the HIV and AIDS Programme in Kenya Using Random Survival Forests with Synthetic Minority Oversampling Technique-Nominal Continuous
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1