A Novel Approach for Classifying Diabetes’ Patients Based on Imputation and Machine Learning
K. Driss, W. Boulila, Amreen Batool, Jawad Ahmad
2020 International Conference on UK-China Emerging Technologies (UCET), published 2020-08-01
DOI: 10.1109/UCET51115.2020.9205378 (https://doi.org/10.1109/UCET51115.2020.9205378)
Citations: 13
Abstract
Over the last decade, many research studies have been conducted on machine learning-based diabetes prediction using diagnostic measurements. However, the main challenge in machine learning-based diabetes prediction is the preprocessing of the data, which in most cases contain missing values and outliers. For data analytics and accurate prediction, data cleansing is highly desirable and recommended. The goal of this study is to predict diabetic patients using real-world datasets. The proposed approach is based on three main steps: cleansing, modelling, and storytelling. In the first step, an imputation process is conducted to fill in missing values. Then, the k-nearest neighbors algorithm is applied to classify patients. To evaluate the performance of the proposed approach, two criteria are used, namely the F1 score and the Receiver Operating Characteristic (ROC) curve. The F1 score and ROC curve show a clear distinction between diabetic and nondiabetic patients.
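The cleansing and modelling steps described above can be illustrated with a minimal sketch. The abstract does not say which imputation method, value of k, or features the authors used, so everything here is an assumption made for illustration: column-mean imputation, k = 3, Euclidean distance, and a tiny invented dataset with two hypothetical features (glucose and BMI). Only the F1 score is computed; the ROC curve is left out to keep the example short.

```python
from collections import Counter
from math import sqrt


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(row, x))), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data (NOT from the paper): [glucose, BMI], None = missing.
train_raw = [
    [85.0, 22.0], [90.0, None], [95.0, 24.5],    # label 0: nondiabetic
    [160.0, 33.0], [None, 35.5], [170.0, 31.0],  # label 1: diabetic
]
train_y = [0, 0, 0, 1, 1, 1]

# Step 1 (cleansing): fill in missing values.
train_X = impute_mean(train_raw)

# Step 2 (modelling): classify unseen patients with k-nearest neighbors.
test_X = [[88.0, 23.0], [165.0, 32.0], [100.0, 30.0]]
test_y = [0, 1, 0]
preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]

# Evaluation: F1 score for the diabetic class.
print("predictions:", preds)
print("F1:", f1_score(test_y, preds))
```

In this sketch the features are left unscaled, so the feature with the larger numeric range dominates the distance. A real pipeline would typically standardize features before running k-nearest neighbors, and would compute imputation statistics on the training split only.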