A Comparative Analysis of Machine Learning Models for the Detection of Undiagnosed Diabetes Patients

Diabetology (IF 2.4, JCR Q3, Endocrinology & Metabolism) Pub Date: 2024-01-03 DOI: 10.3390/diabetology5010001
S. Cichosz, Clara Bender, Ole Hejlesen

Abstract

Introduction: Early detection of type 2 diabetes is essential for preventing long-term complications. However, screening the entire population for diabetes is not cost-effective, so identifying individuals at high risk for the disease is crucial. The aim of this study was to compare the performance of five diverse machine learning (ML) models in classifying undiagnosed diabetes using large heterogeneous datasets.

Methods: We used data from several years of the National Health and Nutrition Examination Survey (NHANES), from 2005 to 2018, to identify people with undiagnosed diabetes. The dataset included 45,431 participants, and biochemical confirmation of glucose control (HbA1c) was used to identify undiagnosed diabetes. The predictors were simple, clinically obtainable variables that could feasibly be used in prescreening for diabetes. We included five ML models for comparison: random forest, AdaBoost, RUSBoost, LogitBoost, and a neural network.

Results: The prevalence of undiagnosed diabetes was 4%. For the classification of undiagnosed diabetes, the area under the ROC curve (AUC) values were between 0.776 and 0.806. The positive predictive values (PPVs) were between 0.083 and 0.091, the negative predictive values (NPVs) were between 0.984 and 0.99, and the sensitivities were between 0.742 and 0.871.

Conclusion: We have demonstrated that several types of classification models can accurately classify undiagnosed diabetes from simple and clinically obtainable variables. These results suggest that the use of machine learning for prescreening for undiagnosed diabetes could be a useful tool in clinical practice.
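The reported PPVs look low next to the NPVs and sensitivities, which is expected at a prevalence of 4%. The sketch below illustrates the relationship using Bayes' rule. It is not the authors' code: the function name `predictive_values` and the specificity of 0.60 are assumptions chosen for illustration, since the abstract does not report specificity; only the 4% prevalence comes from the abstract.

```python
from typing import Tuple


def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a binary screening test via Bayes' rule.

    All three inputs are proportions in [0, 1].
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Expected fraction of the screened population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)

    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("PPV or NPV is undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Prevalence is taken from the abstract; sensitivity and specificity
    # here are hypothetical values, not results reported by the study.
    ppv, npv = predictive_values(sensitivity=0.80,
                                 specificity=0.60,
                                 prevalence=0.04)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these assumed inputs, most positive calls are false positives simply because 96% of the screened population does not have the condition, while a negative call is almost always correct. This is why a prescreening model can have a useful NPV and sensitivity while its PPV stays below 0.1.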