Comparative analysis of machine learning algorithms for predicting diarrhea among under-five children in Ethiopia: Evidence from 2016 EDHS

Health Informatics Journal; IF 2.2; Zone 3 (Medicine); JCR Q2 (Health Care Sciences & Services); Pub Date: 2024-09-14; DOI: 10.1177/14604582241285769
Alemu Birara Zemariam, Wondosen Abey, Abdulaziz Kebede Kassaw, Ali Yimer

Abstract

Background: Diarrhea is a major cause of mortality and morbidity among children under 5 worldwide, especially in developing countries such as Ethiopia. Few studies have used machine learning (ML) to predict childhood diarrhea. This study aimed to compare the predictive performance of ML algorithms for diarrhea in under-5 children in Ethiopia.

Methods: The study used a dataset of 9501 under-5 children from the 2016 Ethiopia Demographic and Health Survey (EDHS). Five ML algorithms were used to build and compare predictive models, and model performance was evaluated with several metrics in Python. Boruta feature selection was employed, and data balancing techniques (under-sampling, over-sampling, adaptive synthetic sampling, and synthetic minority oversampling) as well as hyperparameter tuning methods were explored. Association rule mining was conducted with the Apriori algorithm in R to determine relationships between the independent variables and the target variable.

Results: Of the children, 10.2% had diarrhea. The Random Forest (RF) model performed best, with 93.2% accuracy, 98.4% sensitivity, 85.5% specificity, and an AUC of 0.916. The top predictors were residence, wealth index, child age, number of living children, deworming, wasting, mother’s occupation, and education. Association rule mining identified the top 7 rules most associated with under-5 diarrhea in Ethiopia.

Conclusion: The RF model achieved the highest performance for predicting childhood diarrhea. Policymakers and healthcare providers can use these findings to develop targeted interventions to reduce diarrhea. Customizing strategies based on the identified association rules has the potential to improve child health and decrease the impact of diarrhea in Ethiopia.
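
The reported accuracy, sensitivity, and specificity, and the measures commonly used to rank association rules, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `binary_metrics` and `rule_measures` are hypothetical, the coding 1 = diarrhea is an assumption, and ranking rules by support, confidence, and lift is a common convention that the abstract does not confirm. The abstract states that Apriori was run in R; plain Python is used here only to show the arithmetic.

```python
from typing import Dict, FrozenSet, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from 0/1 labels.

    Assumption: 1 = child had diarrhea, 0 = no diarrhea, and both
    classes occur in y_true (otherwise a denominator would be zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
    }


def rule_measures(
    transactions: Sequence[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent.

    Each transaction is the set of items (attribute values) recorded for
    one child. Assumes antecedent and consequent each occur at least once.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_ac / n_a
    return {
        "support": n_ac / n,             # P(antecedent and consequent)
        "confidence": confidence,        # P(consequent | antecedent)
        "lift": confidence / (n_c / n),  # confidence relative to base rate
    }
```

A lift above 1 means the consequent (for example, diarrhea) occurs more often among children matching the antecedent than in the sample as a whole. Because only 10.2% of children had diarrhea, accuracy alone can look high for a model that rarely predicts the positive class, which is why sensitivity and specificity are reported alongside it.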

Source journal
Health Informatics Journal (Health Care Sciences & Services; Medical Informatics)
CiteScore: 7.80
Self-citation rate: 6.70%
Articles published: 80
Review time: 6 months
About the journal: Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.