A Comparison of Machine Learning Techniques for the Detection of Type-2 Diabetes Mellitus: Experiences from Bangladesh

Inf. Comput. | Pub Date: 2023-07-02 | DOI: 10.3390/info14070376

Md. Jamal Uddin, Md. Martuza Ahamad, Md. Nesarul Hoque, Md. Abul Ala Walid, Sakifa Aktar, Naif Alotaibi, S. Alyami, M. A. Kabir, M. Moni


Citations: 6

Abstract

Diabetes is a chronic disease characterized by persistently high blood sugar levels, which can lead to other chronic conditions, including cardiovascular, kidney, eye, and nerve damage. Prompt detection plays a vital role in reducing the risk and severity associated with diabetes, and identifying key risk factors can help individuals become more mindful of their lifestyles. In this study, we conducted a questionnaire-based survey utilizing standard diabetes risk variables to examine the prevalence of diabetes in Bangladesh. To enable prompt detection of diabetes, we compared different machine learning techniques and proposed an ensemble-based machine learning framework that incorporated algorithms such as decision tree, random forest, and extreme gradient boost. To address class imbalance within the dataset, we first applied the synthetic minority oversampling technique (SMOTE) and random oversampling (ROS). We then evaluated the performance of various classifiers, including decision tree (DT), logistic regression (LR), support vector machine (SVM), gradient boost (GB), extreme gradient boost (XGBoost), random forest (RF), and ensemble technique (ET), on our diabetes datasets. Our experimental results showed that the ET outperformed the other classifiers; to further enhance its effectiveness, we fine-tuned and evaluated its hyperparameters. Using statistical and machine learning techniques, we also ranked the features and found that age, extreme thirst, and a family history of diabetes are significant features for detecting diabetes patients. This method has great potential to help clinicians effectively identify individuals at risk of diabetes, facilitating timely intervention and care.
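The two class-balancing steps and the ensemble idea named in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it uses only the Python standard library, a simplified SMOTE (interpolating between minority rows and their nearest minority neighbours), and a hard majority vote, since the abstract does not state how the ensemble combines its base classifiers. The toy rows, the two numeric features, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out


def smote(X, y, minority_label, k=3, seed=0):
    """Simplified SMOTE: create synthetic minority rows by interpolating
    between a minority row and one of its k nearest minority neighbours.
    Assumes numeric features and at least two minority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max(counts.values()) - counts[minority_label]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


def majority_vote(predictions):
    """Hard voting (an assumption): for each sample, return the label
    predicted by most classifiers. `predictions` holds one list of labels
    per classifier, all of the same length."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: two numeric features, label 1 is the minority class.
X = [[25.0, 5.1], [31.0, 5.4], [40.0, 5.6], [52.0, 5.9],
     [47.0, 5.5], [60.0, 6.0], [58.0, 7.9], [63.0, 8.4]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote(X, y, minority_label=1)
print("original:", Counter(y))
print("after ROS:", Counter(y_ros))
print("after SMOTE:", Counter(y_sm))

# Hypothetical predictions from three base classifiers on three samples.
votes = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
print("ensemble vote:", votes)
```

In practice, oversampling should be applied to the training split only, so that duplicated or synthetic rows do not leak into the evaluation data. ROS repeats existing rows exactly, whereas SMOTE produces new points on the segments between minority rows, which is why it needs numeric features.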



