Performance Comparison of Random Forest, Support Vector Machine and Neural Network in Health Classification of Stroke Patients

Public Research Journal of Engineering, Data Technology and Computer Science. Pub Date: 2024-04-21. DOI: 10.57152/predatecs.v2i1.1119

Windy Junita Sari, Nasya Amirah Melyani, Fadlan Arrazak, Muhammad Asyraf Bin Anahar, Ezza Addini, Zaid Husham Al-Sawaff, Selvakumar Manickam


Citations: 0

Abstract

Stroke is the second most common cause of death globally, accounting for about 11% of all deaths each year. It is caused by non-traumatic disorders of the cerebral circulation, ranges from mild to severe, and can cause permanent or temporary damage. This research began with a data-understanding phase, in which a stroke patient health dataset of 5,110 records was obtained from Kaggle. The pre-processing stage then transformed the data for more efficient processing, converted numeric attributes to nominal ones, and prepared the training and test sets. Stroke was then classified using three algorithms: Random Forest, Support Vector Machine (SVM), and Neural Network (ANN). On the Kaggle dataset all three models performed well, with Random Forest reaching 98.58% accuracy, SVM 94.11%, and the Neural Network 95.72%. SVM achieved the highest recall (99.41%), while Random Forest and the ANN reached high but slightly lower recall rates of 98.58% and 95.72%, respectively. The choice of model therefore depends on whether the application prioritizes precision, recall, or a balance of the two. This research contributes to the understanding of stroke diagnosis and opens new possibilities for classifying the disease.
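The abstract names two concrete steps without showing them: converting numeric attributes to nominal categories during pre-processing, and weighing precision against recall when choosing a model. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. The attribute names (`age`, `avg_glucose_level`, `bmi`), the cut-points and labels, and the helper names `to_nominal` and `binary_metrics` are hypothetical, because the abstract does not say which attributes were discretized or how. The metrics follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), with "stroke" as the positive class.

```python
from bisect import bisect_right

# Hypothetical cut-points and labels: the abstract does not state how numeric
# attributes were discretized, so these values are illustrative only.
BINS = {
    "age": ([18, 45, 65], ["child", "adult", "middle_aged", "senior"]),
    "avg_glucose_level": ([100, 126], ["normal", "elevated", "high"]),
    "bmi": ([18.5, 25, 30], ["underweight", "normal", "overweight", "obese"]),
}


def to_nominal(record):
    """Return a copy of `record` with known numeric attributes replaced by labels."""
    out = dict(record)
    for attr, (edges, labels) in BINS.items():
        if attr not in out:
            continue  # attribute absent from this record: leave it out
        value = out[attr]
        # bisect_right counts the edges <= value, which is the bin index;
        # a value equal to an edge therefore falls into the upper bin.
        out[attr] = "unknown" if value is None else labels[bisect_right(edges, value)]
    return out


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary task (positive = stroke)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # A made-up record, not taken from the paper's dataset.
    print(to_nominal({"age": 67.0, "avg_glucose_level": 228.69, "bmi": 36.6, "stroke": 1}))
    # age -> 'senior', avg_glucose_level -> 'high', bmi -> 'obese'

    # Toy labels (not the paper's results): a model that flags extra patients.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
    # accuracy 0.75, precision 0.5, recall 1.0, f1 about 0.667
```

In the toy run the classifier catches every positive case (recall 1.0) at the cost of two false alarms (precision 0.5), so accuracy drops to 0.75. This mirrors the kind of trade-off the abstract describes, where the model with the highest recall is not the one with the highest accuracy.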

