Chi-Squared Based Feature Selection for Stroke Prediction using AzureML

Sujan Ray, Khaldoon Alshouiliy, A. Roy, Ali AlGhamdi, D. Agrawal
DOI: 10.1109/IETC47856.2020.9249117
Published in: 2020 Intermountain Engineering, Technology and Computing (IETC), 2020-10-02
Citations: 6

Abstract

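In the United States, stroke is the fifth leading cause of death and a major cause of serious disability among adults [1]. Accurate stroke prediction is therefore crucial so that patients can be treated at an early stage. Machine Learning (ML) algorithms are in growing demand for predicting a patient's condition in advance and alerting medical staff before the disease progresses. The Kaggle Healthcare dataset, which has 43,400 instances and 10 features, has been widely used by researchers in this area to develop stroke-prediction models. This paper proposes a method for analyzing and predicting stroke on the same dataset using Microsoft Azure Machine Learning (AzureML), a cloud-based platform. We apply the Chi-Squared test to the dataset to extract the top features, run experiments on AzureML with the top 6 features as well as with all features, and compare the accuracy of the models trained on each feature set. The performance of Two-Class Decision Jungle with the top 6 features serves as the benchmark in our work. Two-Class Boosted Decision Tree, an ensemble learning method, achieves 96.8% accuracy using the top 6 features. Our experimental results show that with the right features, the accuracy of stroke prediction can be improved significantly and the model takes less time to train.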
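The abstract does not spell out how the Chi-Squared test ranks features. Below is a minimal from-scratch sketch, assuming every feature is categorical (numeric columns would need to be binned first) and the label is the binary stroke outcome. For each feature, it builds the contingency table of observed counts O against the label. It then computes the counts E expected under independence (row total × column total / N) and scores the feature by χ² = Σ (O − E)² / E. Larger scores indicate stronger dependence on the label, and the k highest-scoring features are kept. The function names `chi_squared` and `top_k_features` and the toy data are hypothetical illustrations. This is not the authors' AzureML pipeline.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable, Sequence


def chi_squared(feature: Sequence[Hashable], label: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic testing independence between one
    categorical feature and the class label (hypothetical helper)."""
    if len(feature) != len(label) or len(feature) == 0:
        raise ValueError("feature and label must be non-empty and of equal length")
    n = len(feature)
    observed = Counter(zip(feature, label))  # O: joint counts per (category, class)
    feature_totals = Counter(feature)        # row totals of the contingency table
    label_totals = Counter(label)            # column totals of the contingency table
    stat = 0.0
    for f_val, f_count in feature_totals.items():
        for y_val, y_count in label_totals.items():
            expected = f_count * y_count / n  # E under independence (always > 0 here)
            stat += (observed[(f_val, y_val)] - expected) ** 2 / expected
    return stat


def top_k_features(
    columns: dict[str, Sequence[Hashable]], label: Sequence[Hashable], k: int
) -> list[str]:
    """Score every column with chi_squared and return the k best names,
    highest score first (hypothetical helper)."""
    scores = {name: chi_squared(values, label) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy, made-up data (not from the Kaggle dataset); 1 = stroke, 0 = no stroke.
    stroke = [1, 1, 0, 0, 0, 0]
    columns = {
        "gender": ["M", "F", "M", "F", "M", "F"],           # independent of the label
        "hypertension": [1, 1, 0, 0, 0, 0],                 # perfectly associated
        "smoking": ["yes", "no", "yes", "no", "no", "no"],  # weakly associated
    }
    print(top_k_features(columns, stroke, k=2))  # ['hypertension', 'smoking']
```

Ranking by the raw statistic ignores the fact that features with more categories have more degrees of freedom. When category counts differ widely, ranking by the test's p-value is a common alternative.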