Prediction model for type 2 diabetes mellitus and its association with mortality using machine learning in three independent cohorts from South Korea, Japan, and the UK: a model development and validation study.

Impact factor: 10 · Zone 1 (Medicine) · JCR Q1 (Medicine, General & Internal) · EClinicalMedicine · Pub Date: 2025-01-18 · eCollection Date: 2025-02-01 · DOI: 10.1016/j.eclinm.2025.103069
Hayeon Lee, Seung Ha Hwang, Seoyoung Park, Yunjeong Choi, Sooji Lee, Jaeyu Park, Yejun Son, Hyeon Jin Kim, Soeun Kim, Jiyeon Oh, Lee Smith, Damiano Pizzol, Sang Youl Rhee, Hyunji Sang, Jinseok Lee, Dong Keon Yon
EClinicalMedicine, volume 80, article 103069. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787438/pdf/

Abstract

Background: Type 2 diabetes mellitus (T2DM) is a major global public health concern whose prevalence has risen steadily over the past few decades. This study aimed to predict the incidence of T2DM within 5 years and the risk of mortality following the onset of T2DM, using data from three independent cohorts worldwide.

Methods: We used data from three independent, large-scale, general population-based cohort studies. The Korean cohort (NHIS-NSC cohort; discovery cohort; n = 973,303), conducted between 1 January 2002 and 31 December 2013, was used for training and internal validation, whereas the Japanese cohort (JMDC cohort; validation cohort A; n = 12,143,715) and the UK cohort (UK Biobank; validation cohort B; n = 416,656) were used for external validation. We trained several machine learning (ML) models on 18 features to predict the incidence of T2DM within five years of a regular health checkup and calculated Shapley Additive Explanation (SHAP) values. To assess the robustness of the ML-based prediction model, we examined the association between the model probability, divided into tertiles (T1 to T3), and the risk of mortality following the onset of T2DM.
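
The tertile step described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the abstract says only that the ensemble uses "voting", so averaging the two models' probabilities (soft voting) is an assumption, as are the function names `soft_vote` and `assign_tertiles`, the toy probabilities, and the convention that T1 is the lowest third (chosen because the reported hazard ratios rise from T1 to T3).

```python
import statistics


def soft_vote(prob_a, prob_b):
    """Average two models' predicted probabilities (soft voting; an assumption)."""
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]


def assign_tertiles(probs):
    """Label each probability T1 (lowest third), T2, or T3 (highest third)."""
    # statistics.quantiles with n=3 returns the two cut points between thirds.
    low_cut, high_cut = statistics.quantiles(probs, n=3)
    labels = []
    for p in probs:
        if p <= low_cut:
            labels.append("T1")
        elif p <= high_cut:
            labels.append("T2")
        else:
            labels.append("T3")
    return labels


# Hypothetical predicted probabilities for nine people (illustrative values only).
logistic_probs = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
adaboost_probs = [0.20, 0.20, 0.40, 0.40, 0.60, 0.60, 0.80, 0.80, 1.00]

ensemble_probs = soft_vote(logistic_probs, adaboost_probs)
tertiles = assign_tertiles(ensemble_probs)
```

In a survival analysis, the resulting tertile label would then enter the mortality model as a categorical covariate; that step needs follow-up data and is left out here.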

Findings: In the discovery cohort, the ensemble model, which combined logistic regression and adaptive boosting by voting, achieved a balanced accuracy of 72.6% and an area under the receiver operating characteristic curve (AUROC) of 0.792. SHAP value analysis of the proposed model showed that age was the most important predictor of incident T2DM, followed by fasting blood glucose, hemoglobin, γ-glutamyl transferase level, and body mass index. The model probability was associated with an increased risk of mortality (T1: adjusted hazard ratio, 2.82 [95% CI, 2.01-3.94]; T2: 3.89 [2.74-5.53]; T3: 7.73 [5.37-11.12]). Similar patterns were observed in the validation cohorts (validation cohort A: T1, 1.74 [1.49-2.03]; T2, 1.97 [1.69-2.30]; T3, 3.31 [2.82-3.38]; validation cohort B: T1, 1.33 [1.03-1.71]; T2, 1.54 [1.21-1.96]; T3, 1.73 [1.36-2.20]).
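
For readers less familiar with the two performance metrics reported in the Findings, the sketch below computes them from first principles. It is an illustration under stated assumptions, not the study's evaluation code: the labels, scores, and the 0.5 decision threshold are made up, and the function names are hypothetical. Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy; AUROC is the probability that a randomly chosen case receives a higher score than a randomly chosen non-case.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return (tp / positives + tn / negatives) / 2


def auroc(y_true, scores):
    """Share of case/non-case pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: six non-cases and two cases, with made-up model scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.45, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

bal_acc = balanced_accuracy(y_true, y_pred)
auc = auroc(y_true, scores)
```

Note that balanced accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two figures can differ substantially for the same model.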

Interpretation: This study derived and validated an ML-based model that predicts the incidence of T2DM within 5 years across three countries (South Korea, Japan, and the UK) and showed that the model probability is associated with an increased risk of mortality.

Funding: Institute of Information & Communications Technology Planning & Evaluation, South Korea.

Source journal: EClinicalMedicine (Medicine: all)
CiteScore: 18.90
Self-citation rate: 1.30%
Articles published: 506
Review time: 22 days
Journal description: eClinicalMedicine is a gold open-access clinical journal designed to support frontline health professionals in addressing the complex and rapid health transitions affecting societies globally. The journal aims to assist practitioners in overcoming healthcare challenges across diverse communities, spanning diagnosis, treatment, prevention, and health promotion. Integrating disciplines from various specialties and life stages, it seeks to enhance health systems as fundamental institutions within societies. With a forward-thinking approach, eClinicalMedicine aims to redefine the future of healthcare.