Imputation Techniques and Recursive Feature Elimination in Machine Learning Applied to Type II Diabetes Classification
Authors: V. P. Magboo, M. A. Magboo
Venue: Artificial Intelligence and Cloud Computing Conference
DOI: 10.1145/3508259.3508288 (https://doi.org/10.1145/3508259.3508288)
Citation count: 8
Abstract
Type II diabetes is a chronic metabolic disease characterized by elevated blood glucose levels. Its complications include heart attack, stroke, blindness, renal failure, lower limb amputation and death. Because of its rising prevalence and the mortality that follows, it is important to identify patients at high risk of developing diabetes at an early stage. We applied eight machine learning techniques, namely support vector machine, logistic regression, k-nearest neighbor, naïve Bayes, decision tree, random forest, AdaBoost and XGBoost, to predict diabetes using a publicly available diabetes dataset. In our study, naïve Bayes with median imputation and recursive feature elimination achieved the highest performance, with an accuracy of 81.0%. Although the results are very promising, one major limitation of this study is the small number of samples in the dataset. Early and accurate detection can help patients proactively monitor their lifestyle habits, mitigating the risk of complications from uncontrolled diabetes.
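The median imputation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `impute_median`, the toy records, the column meanings (glucose, BMI, age), and the use of `None` as the missing-value marker are all assumptions made for illustration, and the abstract does not say how missing values were encoded in the dataset.

```python
from statistics import median


def impute_median(rows):
    """Return a copy of rows with each None replaced by its column's median.

    The median is taken over the observed (non-None) values of that column.
    A column with no observed values raises statistics.StatisticsError.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    # Build new rows so the input is left unmodified.
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Hypothetical toy records: [glucose, BMI, age]; None marks a missing value.
records = [
    [148.0, 33.6, 50],
    [85.0, None, 31],
    [None, 23.3, 32],
    [89.0, 28.1, 21],
]

imputed = impute_median(records)
print(imputed[1])  # BMI filled with the median of 33.6, 23.3, 28.1
print(imputed[2])  # glucose filled with the median of 148.0, 85.0, 89.0
```

In a real evaluation the medians would normally be computed on the training split only and then reused on the test split, so that no information leaks from the test data. Libraries such as scikit-learn offer this step as `SimpleImputer(strategy="median")` and recursive feature elimination as `RFE`; whether the authors used those particular tools is not stated in the abstract.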