Interpretable prediction of acute respiratory infection disease among under-five children in Ethiopia using ensemble machine learning and Shapley additive explanations (SHAP).

IF 2.9 | Zone 3, Medicine | Q2, HEALTH CARE SCIENCES & SERVICES | DIGITAL HEALTH | Pub Date: 2024-08-06 | eCollection Date: 2024-01-01 | DOI: 10.1177/20552076241272739
Zinabu Bekele Tadese, Debela Tsegaye Hailu, Aschale Wubete Abebe, Shimels Derso Kebede, Agmasie Damtew Walle, Beminate Lemma Seifu, Teshome Demis Nimani

Abstract

Background: Although the prevalence of childhood illnesses has decreased significantly, acute respiratory infections remain the leading cause of death and disease among children in low- and middle-income countries. Seven percent of children under five experienced symptoms in the two weeks preceding the Ethiopian Demographic and Health Survey. This study therefore aimed to identify interpretable predictors of acute respiratory infection among under-five children in Ethiopia using machine learning techniques.

Methods: Secondary data analysis was performed using 2016 Ethiopian Demographic and Health Survey data. Data were extracted using STATA and imported into Jupyter Notebook for further analysis. The outcome variable was the presence of acute respiratory infection in a child under the age of 5, categorized as yes or no. Five ensemble boosting machine learning algorithms were applied to a total sample of 10,641 children under the age of 5: adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), Gradient Boost, CatBoost, and light gradient-boosting machine (LightGBM). The Shapley additive explanations technique was used to identify the important features and the effect of each feature on the prediction.

Results: After model optimization, the XGBoost model achieved an accuracy of 79.3%, an F1 score of 78.4%, a recall of 78.3%, a precision of 81.7%, and an area under the receiver operating characteristic curve (AUC) of 86.1%. Child age (months), history of diarrhea, number of living children, duration of breastfeeding, and mother's occupation were the top predictors of acute respiratory infection among children under the age of 5 in Ethiopia.
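The metrics reported above can be illustrated with a minimal Python sketch. It is not the authors' code: the labels and scores are made-up toy values, the function names `classification_metrics` and `roc_auc` are hypothetical, and the abstract does not state how the study averaged its metrics, so the binary (positive-class) definitions used here are an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = ARI, 0 = no ARI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not from the survey)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas AUC is computed from the ranking of scores alone, which is why the two kinds of figures are usually reported side by side.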

Conclusion: The XGBoost classifier was the best-performing predictive model, and predictors of acute respiratory infection were identified with the help of Shapley additive explanations. The findings of this study can help policymakers and stakeholders understand the decision-making process for acute respiratory infection prevention among under-five children in Ethiopia.
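To give a sense of what a Shapley additive explanation attributes to each feature, the following sketch computes exact Shapley values for a tiny hand-written risk score. Everything in it is an assumption for illustration: `toy_model`, its coefficients, the feature values, and the baseline are invented and are not the study's XGBoost model or data, and the brute-force enumeration stands in for the approximations that SHAP tooling uses on real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline instance.

    Features outside a coalition are replaced by their baseline value.
    The cost is exponential in the number of features, so this only suits toy cases.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical risk score with one interaction term (not the study's model)."""
    age_months, had_diarrhea, living_children = features
    score = 0.1
    score += 0.2 if age_months < 24 else 0.0
    score += 0.3 * had_diarrhea
    score += 0.05 * had_diarrhea * (living_children > 3)
    return score


x = [12, 1, 5]         # instance to explain (made-up values)
baseline = [36, 0, 2]  # reference instance (made-up values)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that gives the method its name. The interaction term is split equally between the two features that jointly produce it.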
