Missing Data Recovery in the E-health Context Based on Machine Learning Models
Inès Rahmany, Sami Mahfoudhi, Mushira Freihat, T. Moulahi
Adv. Artif. Intell. Mach. Learn., 7(1). DOI: 10.54364/aaiml.2022.1135
Abstract
Diabetes mellitus is a group of metabolic diseases characterized by abnormally high blood glucose levels. In 2017, 8.8% of the world's adult population had diabetes, and this share is expected to rise to approximately 10% by 2045. Missing data, a common problem even in well-designed and controlled studies, can substantially affect the conclusions drawn from the available data: it may reduce a study's statistical validity and lead to erroneous results through biased estimates. In this study, we hypothesize that the highest accuracy is obtained by (a) imputing missing values with machine learning techniques rather than with the mean or the group mean, and (b) classifying with a support vector machine (SVM) using a radial basis function (RBF) kernel, compared with traditional classifiers such as decision trees (DT), random forests (RF), naive Bayes (NB), SVM, AdaBoost, and artificial neural networks (ANN). Classification results improved significantly when missing values were imputed by regression rather than by the group median or the mean, a 10% improvement over previously developed strategies reported in the literature.
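The abstract contrasts three ways of filling a missing feature: the overall mean, a per-group median, and a regression prediction. The sketch below is not the authors' pipeline. It uses hypothetical synthetic data (a fully observed predictor `x`, a binary grouping variable, and a target feature `y` that depends on both), masks 20% of `y` completely at random, and reports the RMSE of each imputation on the masked entries. The data-generating model, the missingness rate, and all function names are assumptions chosen for illustration, and the SVM-RBF classification stage is not shown.

```python
import math
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical synthetic records (group, x, y); y depends linearly on x and group."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice([0, 1])  # assumed binary grouping variable
        x = rng.gauss(30.0, 6.0)    # fully observed predictor
        y = 2.0 * x + 15.0 * group + rng.gauss(0.0, 3.0)  # feature that goes missing
        rows.append((group, x, y))
    return rows


def make_mask(n, frac, seed=1):
    """Missing-completely-at-random mask: True marks a missing entry."""
    rng = random.Random(seed)
    return [rng.random() < frac for _ in range(n)]


def mean_impute(ys, missing):
    """Replace missing entries with the mean of the observed entries."""
    observed = [y for y, miss in zip(ys, missing) if not miss]
    m = sum(observed) / len(observed)
    return [m if miss else y for y, miss in zip(ys, missing)]


def group_median_impute(ys, groups, missing):
    """Replace missing entries with the median of the observed entries in the same group."""
    medians = {}
    for g in set(groups):
        obs = [y for y, gg, miss in zip(ys, groups, missing) if gg == g and not miss]
        medians[g] = statistics.median(obs)
    return [medians[g] if miss else y for y, g, miss in zip(ys, groups, missing)]


def regression_impute(ys, xs, missing):
    """Fit y = a + b*x by ordinary least squares on observed rows, then predict missing y."""
    obs = [(x, y) for x, y, miss in zip(xs, ys, missing) if not miss]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    b = sum((x - mx) * (y - my) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
    a = my - b * mx
    return [a + b * x if miss else y for x, y, miss in zip(xs, ys, missing)]


def rmse_on_missing(true, imputed, missing):
    """Root-mean-square error computed only over the entries that were masked."""
    errs = [(t - i) ** 2 for t, i, miss in zip(true, imputed, missing) if miss]
    return math.sqrt(sum(errs) / len(errs))


def compare(n=2000, frac=0.2):
    rows = make_data(n)
    groups = [g for g, _, _ in rows]
    xs = [x for _, x, _ in rows]
    ys = [y for _, _, y in rows]
    missing = make_mask(n, frac)
    return {
        "mean": rmse_on_missing(ys, mean_impute(ys, missing), missing),
        "group_median": rmse_on_missing(ys, group_median_impute(ys, groups, missing), missing),
        "regression": rmse_on_missing(ys, regression_impute(ys, xs, missing), missing),
    }


if __name__ == "__main__":
    for name, err in compare().items():
        print(f"{name:>12}: RMSE on imputed entries = {err:.2f}")
```

Because `y` is correlated with `x` in this toy model, regression recovers much of the lost signal, whereas the mean ignores every other feature and the group median uses only the grouping variable. On real clinical data the ranking depends on how strongly the missing feature relates to the observed ones.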