Performance Comparison of Random Forest, Support Vector Machine and Neural Network in Health Classification of Stroke Patients
Windy Junita Sari, Nasya Amirah Melyani, Fadlan Arrazak, Muhammad Asyraf Bin Anahar, Ezza Addini, Zaid Husham Al-Sawaff, Selvakumar Manickam
Public Research Journal of Engineering, Data Technology and Computer Science, published 2024-04-21
DOI: 10.57152/predatecs.v2i1.1119
Citations: 0
Abstract
Stroke is the second most common cause of death globally, accounting for about 11% of all deaths each year. It is caused by non-traumatic disorders of the cerebral circulation, ranges from mild to severe, and can cause permanent or temporary damage. This research began with a data-understanding phase, in which a stroke patient health dataset of 5,110 records was acquired from Kaggle. The pre-processing stage transformed the data for efficient processing, converted numeric attributes to nominal ones, and prepared the training and test data. The study then classified stroke using the Random Forest, Support Vector Machine (SVM), and Neural Network (ANN) algorithms. On the Kaggle dataset all three models performed well: Random Forest achieved 98.58% accuracy, SVM 94.11%, and the Neural Network 95.72%. SVM had the highest recall (99.41%), while Random Forest and the ANN had high but slightly lower recall, at 98.58% and 95.72% respectively. The choice of model therefore depends on whether an application prioritizes precision, recall, or a balance of the two. This research contributes to the understanding of stroke diagnosis and opens new possibilities for classifying the disease.
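The abstract does not include the authors' pre-processing or evaluation code. The following is a minimal Python sketch of two steps it mentions. The first is converting a numeric attribute to nominal bins. The second is computing accuracy, precision, and recall for the stroke class. All of the following are hypothetical illustrations rather than the authors' implementation or data: the functions `to_nominal` and `binary_metrics`, the age cut points and band names, and the toy labels. The abstract also does not say whether its recall figures are per-class or averaged across classes. This sketch reports them for the positive (stroke) class only.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def to_nominal(value: float, cut_points: Sequence[float], labels: Sequence[str]) -> str:
    """Convert a numeric attribute value to a nominal label (hypothetical helper).

    cut_points must be sorted ascending; a value v gets labels[i] when
    cut_points[i - 1] <= v < cut_points[i], open-ended at both extremes.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have exactly one more entry than cut_points")
    return labels[bisect_right(cut_points, value)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, plus precision, recall and F1 for the positive (stroke) class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class (booleans sum as 0/1).
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall: one "balance of both" option.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical age bands; the paper's actual cut points are not stated.
    bands = [to_nominal(a, [40, 60], ["young", "middle", "senior"]) for a in (25, 40, 59.5, 72)]
    print(bands)  # ['young', 'middle', 'middle', 'senior']

    # Toy labels (1 = stroke), not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    high_recall = [1, 1, 1, 1, 1, 0, 0, 0]     # catches every stroke, two false alarms
    high_precision = [1, 0, 0, 0, 0, 0, 0, 0]  # no false alarms, misses two strokes
    print(binary_metrics(y_true, high_recall))
    print(binary_metrics(y_true, high_precision))
```

In this toy run both prediction sets reach 0.75 accuracy. One has recall 1.0 with precision 0.6, and the other has precision 1.0 with recall of about 0.33. This shows why accuracy alone cannot settle the precision-versus-recall choice the abstract describes.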