Sujan Ray, Khaldoon Alshouiliy, A. Roy, Ali AlGhamdi, D. Agrawal
Published in: 2020 Intermountain Engineering, Technology and Computing (IETC), 2020-10-02
DOI: 10.1109/IETC47856.2020.9249117 (https://doi.org/10.1109/IETC47856.2020.9249117)
Chi-Squared Based Feature Selection for Stroke Prediction using AzureML
In the United States, stroke is the fifth leading cause of death and a major cause of serious disability among adults [1]. Accurate prediction of stroke is therefore crucial so that patients can be treated at an early stage. Machine Learning (ML) algorithms are in great demand for predicting a patient's condition in advance and alerting medical staff so that the risk of disease progression can be reduced. The Kaggle Healthcare dataset, which has 43,400 instances and 10 features, has been widely used by researchers in this area to develop stroke prediction models. This paper proposes a method for the analysis and prediction of stroke on the same dataset using Microsoft Azure Machine Learning (AzureML), a cloud-based platform. We apply the Chi-Squared test to the dataset to extract the top features. The experiments are run on AzureML with the top 6 features as well as with all the features, and we compare the accuracy of the models trained on each feature set. The performance of Two-Class Decision Jungle with the top 6 features is set as the benchmark in our work. Two-Class Boosted Decision Tree, an ensemble learning method, achieves 96.8% accuracy using the top 6 features. Our experimental results show that, with the right features, the accuracy of stroke prediction can be improved significantly and the model takes less time to train.
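The Chi-Squared ranking step mentioned in the abstract can be illustrated outside AzureML with a minimal sketch. This is not the authors' pipeline: it assumes every feature is categorical (continuous columns would first need binning), it uses only the Python standard library, and the column names and toy values below are hypothetical and are not taken from the Kaggle dataset. For each feature it computes the Pearson statistic, the sum over contingency-table cells of (observed - expected)^2 / expected, against the class label and keeps the k highest-scoring features.

```python
from collections import Counter


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic of a categorical feature vs. a class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # contingency-table cell counts
    feature_totals = Counter(feature_values)         # row totals
    label_totals = Counter(labels)                   # column totals
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count if feature and label were independent.
            expected = f_count * c_count / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def top_k_features(columns, labels, k):
    """Rank features by chi-squared score and return the k best plus all scores."""
    scores = {name: chi_squared(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data, for illustration only.
toy_columns = {
    "hypertension": [1, 1, 1, 0, 0, 0, 1, 0],
    "gender": ["M", "F", "M", "F", "M", "F", "F", "M"],
}
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]

selected, scores = top_k_features(toy_columns, toy_labels, k=1)
print(selected, scores)
```

In this toy example the first column matches the label exactly and so receives the larger score, while the second column is balanced across both classes and scores zero. A setup like the one in the abstract would use k=6 on the real feature columns.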