Machine learning models for prediction of double and triple burdens of non-communicable diseases in Bangladesh

Journal of Biosocial Science · IF 1.5 · Zone 3 (Sociology) · JCR Q2 (Demography) · Pub Date: 2024-03-20 · DOI: 10.1017/s0021932024000063
Md. Akib Al-Zubayer, Khorshed Alam, Hasibul Hasan Shanto, Md. Maniruzzaman, Uttam Kumar Majumder, Benojir Ahammed

Abstract

Non-communicable diseases (NCDs), whose prevalence is increasing, have become the leading cause of death and disability in Bangladesh. This study therefore aimed to measure the prevalence of, and risk factors for, the double and triple burden of NCDs (DBNCDs and TBNCDs), considering diabetes, hypertension, and overweight and obesity, and to establish a machine learning approach for predicting DBNCDs and TBNCDs. A total of 12,151 respondents from the 2017 to 2018 Bangladesh Demographic and Health Survey were included in the analysis, of whom 10%, 27.4%, and 24.3% had diabetes, hypertension, and overweight and obesity, respectively. The chi-square test and multilevel logistic regression (LR) analysis were applied to select factors associated with DBNCDs and TBNCDs. Six classifiers, namely decision tree (DT), LR, naïve Bayes (NB), k-nearest neighbour (KNN), random forest (RF), and extreme gradient boosting (XGBoost), were then used with three cross-validation protocols (K2, K5, and K10) to predict DBNCD and TBNCD status. Classification accuracy (ACC) and area under the curve (AUC) were computed for each protocol, the procedure was repeated 10 times for robustness, and the average ACC and AUC were then computed. The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. DBNCDs and TBNCDs were significantly influenced by age, sex, marital status, wealth index, education, and geographic region. Compared with the other classifiers, the RF-based classifier provided the highest ACC and AUC for both DBNCDs (ACC = 81.06% and AUC = 0.93) and TBNCDs (ACC = 88.61% and AUC = 0.97) under the K10 protocol. Combining the two-step factor selection with an RF-based classifier can better predict the burden of NCDs. These findings suggest that decision-makers could use RF classifiers to inform decisions on controlling and preventing the burden of NCDs.
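
The evaluation protocol described in the abstract (k-fold cross-validation with K2, K5, and K10, repeated 10 times, then averaging ACC and AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the nearest-centroid model is a stand-in rather than one of the six classifiers in the study, the folds are stratified by class (the abstract does not say whether they were), and all function names are hypothetical.

```python
import random

def auc_score(y_true, scores):
    """AUC via the rank-sum identity: the share of (positive, negative)
    pairs in which the positive scores higher; ties count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k, rng):
    """Deal shuffled indices of each class into k folds (assumed stratification)."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

class CentroidClassifier:
    """Stand-in model: scores a sample by which class centroid is closer."""
    def fit(self, X, y):
        self.c0 = self._centroid([x for x, t in zip(X, y) if t == 0])
        self.c1 = self._centroid([x for x, t in zip(X, y) if t == 1])
        return self

    @staticmethod
    def _centroid(rows):
        return [sum(col) / len(col) for col in zip(*rows)]

    def score(self, x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        return d0 - d1  # positive when x is closer to the class-1 centroid

def repeated_cv(X, y, k, repeats=10, seed=0):
    """Run k-fold CV `repeats` times; return (mean ACC, mean AUC) over all folds."""
    rng = random.Random(seed)
    accs, aucs = [], []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = CentroidClassifier().fit(
                [X[i] for i in train_idx], [y[i] for i in train_idx]
            )
            scores = [model.score(X[i]) for i in test_idx]
            truth = [y[i] for i in test_idx]
            preds = [1 if s > 0 else 0 for s in scores]
            accs.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
            aucs.append(auc_score(truth, scores))
    return sum(accs) / len(accs), sum(aucs) / len(aucs)

def make_data(n=200, seed=1):
    """Synthetic two-feature data with alternating labels (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(1.0 * label, 1.0)])
        y.append(label)
    return X, y

X, y = make_data()
results = {f"K{k}": repeated_cv(X, y, k) for k in (2, 5, 10)}
for name, (acc, auc) in results.items():
    print(f"{name}: mean ACC = {acc:.3f}, mean AUC = {auc:.3f}")
```

Averaging over repeated random partitions reduces the dependence of the reported ACC and AUC on any single split, which is the robustness argument the abstract makes. The numbers printed here come from synthetic data and bear no relation to the study's results.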

Source journal
CiteScore: 3.00
Self-citation rate: 6.70%
Articles published: 108
Journal description: Journal of Biosocial Science is a leading interdisciplinary and international journal in the field of biosocial science, the common ground between biology and sociology. It acts as an essential reference guide for all biological and social scientists working in these interdisciplinary areas, including social and biological aspects of reproduction and its control, gerontology, ecology, genetics, applied psychology, sociology, education, criminology, demography, health and epidemiology. Publishing original research papers, short reports, reviews, lectures and book reviews, the journal also includes a Debate section that encourages readers' comments on specific articles, with subsequent response from the original author.
Latest articles in this journal
Geographical disparities in temporal trends of low birth weight in Saskatchewan from 2002/2003 to 2021/2022: insights from a joinpoint regression analysis.
Unveiling disparities: a non-linear decomposition analysis of the gap in menstrual hygiene material use between adolescent women in Aspirational and the remaining districts of India.
Beyond the margins: antenatal health and healthcare behaviours among homeless women in Kolkata Municipal Corporation, India.
'They will be like a person with a disease': a qualitative investigation of variation in contraceptive side-effect experiences in Central Oromia, Ethiopia.
Geographic inequities in neonatal survival in Nigeria: a cross-sectional evidence from spatial and artificial neural network analyses.