Performance Comparison of Random Forest, Support Vector Machine and Neural Network in Health Classification of Stroke Patients

Windy Junita Sari, Nasya Amirah Melyani, Fadlan Arrazak, Muhammad Asyraf Bin Anahar, Ezza Addini, Zaid Husham Al-Sawaff, Selvakumar Manickam
Journal: Public Research Journal of Engineering, Data Technology and Computer Science
Published: 2024-04-21
DOI: 10.57152/predatecs.v2i1.1119 (https://doi.org/10.57152/predatecs.v2i1.1119)

Abstract

Stroke is the second most common cause of death globally, accounting for about 11% of all health-related deaths each year. The condition, caused by non-traumatic cerebral circulatory disorders, ranges from mild to severe and can cause temporary or permanent damage. This research began with data understanding: a stroke patient health dataset of 5110 records was acquired from Kaggle. The pre-processing stage involved transforming the data to optimize processing, converting numeric attributes to nominal, and preparing training and test data. The focus then shifted to stroke classification using the Random Forest, Support Vector Machine (SVM), and Neural Network (ANN) algorithms. All three models performed well on the Kaggle dataset: Random Forest achieved 98.58% accuracy, SVM 94.11%, and Neural Network 95.72%. SVM had the highest recall (99.41%), while Random Forest and ANN had high but slightly lower recall, at 98.58% and 95.72% respectively. Model selection depends on the needs of the application: prioritizing precision, prioritizing recall, or balancing the two. This research contributes to further understanding of stroke diagnosis and introduces new potential for classifying the disease.
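The abstract notes that the model with the highest recall is not the one with the highest accuracy. The following minimal sketch shows how that can happen by computing accuracy, precision, recall, and F1 from binary confusion-matrix counts. The counts and the names `ConfusionCounts`, `model_a`, and `model_b` are hypothetical and chosen only for illustration; they are not the paper's data or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts for one classifier (hypothetical helper)."""
    tp: int  # stroke cases correctly predicted as stroke
    fp: int  # non-stroke cases wrongly predicted as stroke
    fn: int  # stroke cases wrongly predicted as non-stroke
    tn: int  # non-stroke cases correctly predicted as non-stroke


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of predicted positives that are truly positive."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of actual positives that the model finds."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


def f1(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts over 1000 test records (NOT the paper's results).
model_a = ConfusionCounts(tp=90, fp=30, fn=10, tn=870)  # misses few positives
model_b = ConfusionCounts(tp=80, fp=5, fn=20, tn=895)   # raises few false alarms

for name, counts in (("model_a", model_a), ("model_b", model_b)):
    print(
        f"{name}: accuracy={accuracy(counts):.4f} "
        f"precision={precision(counts):.4f} "
        f"recall={recall(counts):.4f} f1={f1(counts):.4f}"
    )
```

In this made-up example `model_a` has the higher recall while `model_b` has the higher accuracy and precision. For a screening task where a missed stroke case is costly, recall may be the more relevant criterion; where false alarms are costly, precision may matter more.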