Elliot Mbunge, M. Sibiya, Sam Takavarasha, R. Millham, Garikayi B. Chemhaka, Benhildah Muchemwa, T. Dzinamarira
Implementation of ensemble machine learning classifiers to predict diarrhoea with SMOTEENN, SMOTE, and SMOTETomek class imbalance approaches
2023 Conference on Information Communications Technology and Society (ICTAS), published 2023-03-01. DOI: 10.1109/ICTAS56421.2023.10082744
Citations: 3
Abstract
Diarrhoea continues to be a major public health burden and cause of death among children under 5 years in many developing countries. Rotavirus vaccination, hygiene practices, clean water, and health promotion are among the preventive measures implemented to improve child health. Nevertheless, tackling diarrhoea also requires the integration of ensemble machine learning (ML) into health systems. However, the integration of ensemble classifiers into health systems in many developing countries is still nascent. Therefore, this study applied the SMOTE, SMOTEENN, and SMOTETomek class imbalance approaches together with ensemble ML classifiers to predict diarrhoea. Ensemble methods significantly improve the performance of conventional ML classifiers. The study revealed that the ExtraTrees classifier achieved a high recall of 96.3%, accuracy of 94.3%, precision of 93.8%, and F1-score of 95% when predicting diarrhoea with SMOTEENN as compared to SMOTE and SMOTETomek. The performance of the HistGradientBoosting classifier also improved, reaching a recall of 95.2%, accuracy of 91.5%, precision of 90.4%, and F1-score of 92.7%. The paper also shows that ensemble methods are increasingly becoming state-of-the-art solutions for multiple challenges encountered with ML algorithms, such as overfitting, underfitting, computational cost, and representation. There is a need to develop data-driven applications that incorporate ensemble methods to model and predict diarrhoea, to assist policymakers in crafting interventions aimed at improving child health.
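
The reported F1-scores can be cross-checked against the reported precision and recall, since F1 is the harmonic mean of the two. The sketch below is only an arithmetic consistency check on the figures quoted in the abstract, not the authors' evaluation code; the function name `f1_from_precision_recall` and the `reported` dictionary are illustrative assumptions.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Figures copied from the abstract, expressed as fractions.
# The ExtraTrees figures are reported for SMOTEENN; the abstract does not
# say which resampling approach the HistGradientBoosting figures refer to.
reported = {
    "ExtraTrees": {"precision": 0.938, "recall": 0.963, "f1": 0.95},
    "HistGradientBoosting": {"precision": 0.904, "recall": 0.952, "f1": 0.927},
}

for name, metrics in reported.items():
    computed = f1_from_precision_recall(metrics["precision"], metrics["recall"])
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {metrics['f1']:.3f}")
```

For both classifiers the recomputed value agrees with the reported F1-score to within rounding, so the three figures in each set are mutually consistent. Accuracy cannot be checked this way, because it also depends on the class distribution, which the abstract does not give.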