Clinically adaptable machine learning model to identify early appreciable features of diabetes

Journal: Intelligent medicine, Volume 4, Issue 1, Pages 22-32
Impact factor: 4.4 (JCR Q1, Computer Science, Interdisciplinary Applications)
Publication date: 2024-02-01
DOI: 10.1016/j.imed.2023.01.003
Full text: https://www.sciencedirect.com/science/article/pii/S2667102623000049
Authors: Nurjahan Nipa, Mahmudul Hasan Riyad, Shahriare Satu, Walliullah, Koushik Chandra Howlader, Mohammad Ali Moni
Citations: 0

Abstract


Objective Diabetes mellitus is a serious disease in which the patient's body fails to produce enough insulin, leading to abnormal blood sugar levels. The disease arises for a number of reasons, including modern lifestyle, lethargy, unhealthy diet, family history, age, and excess weight. The aim of this study was to propose a machine learning-based prediction model that detects diabetes at an early stage.

Methods We obtained 520 patient records of Sylhet Diabetes Hospital, Sylhet, from the University of California, Irvine (UCI) machine learning repository. We then followed a questionnaire similar to that hospital's and used it to gather 558 patient records from all over Bangladesh. We also combined the patient records of the two datasets. Next, the datasets were cleaned and thirty-five state-of-the-art classifiers were applied to find the best stable predictive model: logistic regression (LR), K-nearest neighbors (KNN), support vector classifier (SVC), naive Bayes (NB), decision tree (DT), random forest (RF), stochastic gradient descent (SGD), Perceptron, AdaBoost, XGBoost, passive aggressive classifier (PAC), ridge classifier (RC), Nu-support vector classifier (Nu-SVC), linear support vector classifier (LSVC), calibrated classifier CV (CCCV), nearest centroid (NC), Gaussian process classifier (GPC), multinomial NB (MNB), complement NB, Bernoulli NB (BNB), categorical NB, bagging, extra tree (ET), gradient boosting classifier (GBC), histogram-based gradient boosting classifier (HGBC), one-vs-rest classifier (OVsRC), multi-layer perceptron (MLP), label propagation (LP), label spreading (LS), stacking, ridge classifier CV (RCCV), logistic regression CV (LRCV), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and light gradient boosting machine (LGBM). Classifier performance was measured using five metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve. Finally, the outcomes were interpreted using Shapley additive explanations (SHAP) methods to identify the features relevant to the onset of diabetes.
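The five metrics named in the Methods can be illustrated with a short dependency-free Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, with 1 standing for a diabetic patient and 0 for a non-diabetic one. The area under the ROC curve is computed here through its rank interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative), which is equivalent to integrating the curve.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking; tied scores count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def five_metrics(y_true: Sequence[int], scores: Sequence[float],
                 threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1-score, and AUROC.

    `threshold` is an illustrative assumption; the abstract does not state one.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auroc": auroc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only (not taken from the study).
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    for name, value in five_metrics(labels, predicted_scores).items():
        print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas the area under the ROC curve depends only on how the scores rank the patients, which is one reason the two kinds of metric are often reported together.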

Results ET outperformed all other classifiers on the Sylhet Diabetes Hospital dataset (SDHD) with 97.11% accuracy, and MLP showed the best accuracy (96.42%) on the collected dataset. On the combined dataset, HGBC and LGBM each achieved the highest accuracy of 94.90%.

Conclusion LGBM, stacking, HGBC, RF, ET, bagging, and GBC might provide more stable prediction results for each dataset.

Source journal: Intelligent medicine
Subject areas: Surgery, Radiology and Imaging, Artificial Intelligence, Biomedical Engineering
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 19