Application of machine learning algorithm incorporating dietary intake in prediction of gestational diabetes mellitus.

IF 2.6 | CAS Zone 3 (Medicine) | JCR Q3 ENDOCRINOLOGY & METABOLISM | Endocrine Connections | Published online: 2024-11-21 | Print date: 2024-12-01 | DOI: 10.1530/EC-24-0169

Tianze Ding, Peijie Liu, Jie Jia, Hui Wu, Jie Zhu, Kefeng Yang

Citations: 0

Abstract

Introduction: Gestational diabetes mellitus (GDM) significantly affects pregnancy outcomes. Prediction models are therefore important, because they can guide timely interventions that reduce the incidence of GDM and its associated adverse effects.

Methods: A total of 554 pregnant women were included, and their sociodemographic characteristics, clinical data and dietary data were collected. Dietary intake was assessed with a validated semi-quantitative food frequency questionnaire (FFQ). We applied random forest mean decrease impurity (MDI) for feature selection and built prediction models using logistic regression, XGBoost and LightGBM. Model performance was compared using accuracy, sensitivity, specificity, the area under the curve (AUC) and the Hosmer-Lemeshow test.
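The abstract names random forest mean decrease impurity (MDI) as the feature-selection step but gives no implementation details. The sketch below is a minimal, stdlib-only illustration of how MDI scores arise, assuming binary labels (1 = GDM), Gini impurity, bootstrap sampling and a random subset of about √p candidate features per split. The function names (`mdi_importances`, `top_k`) and their defaults are hypothetical and this is not the authors' code; in practice a library would normally be used, and scikit-learn's `RandomForestClassifier` exposes MDI through its `feature_importances_` attribute.

```python
import random
from collections import Counter

# Hypothetical, simplified MDI sketch; not the authors' implementation.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, idx, features):
    """Return (impurity decrease, feature, threshold, left, right) or None."""
    parent = gini([y[i] for i in idx])
    best = None
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold between adjacent values
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            child = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(idx)
            if best is None or parent - child > best[0]:
                best = (parent - child, f, t, left, right)
    return best

def grow(X, y, idx, depth, max_depth, n_sub, rng, imp, n_root):
    """Grow one tree recursively, accumulating weighted impurity decreases in imp."""
    if depth >= max_depth or len({y[i] for i in idx}) < 2:
        return  # depth limit reached or node is pure
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset per node
    split = best_split(X, y, idx, features)
    if split is None or split[0] <= 0:
        return
    decrease, f, _, left, right = split
    imp[f] += len(idx) / n_root * decrease  # weight by the node's share of samples
    grow(X, y, left, depth + 1, max_depth, n_sub, rng, imp, n_root)
    grow(X, y, right, depth + 1, max_depth, n_sub, rng, imp, n_root)

def mdi_importances(X, y, n_trees=100, max_depth=4, seed=0):
    """Average of per-tree normalised MDI scores; the result sums to 1."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_sub = max(1, int(p ** 0.5))  # about sqrt(p) candidate features per split
    totals = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        imp = [0.0] * p
        grow(X, y, boot, 0, max_depth, n_sub, rng, imp, len(boot))
        s = sum(imp)
        if s > 0:  # normalise each tree before averaging
            totals = [t + v / s for t, v in zip(totals, imp)]
    s = sum(totals)
    return [t / s for t in totals] if s > 0 else totals

def top_k(importances, names, k):
    """Names of the k features with the highest MDI scores."""
    order = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)
    return [names[j] for j in order[:k]]
```

Calling `top_k(mdi_importances(X, y), feature_names, 5)` on a feature matrix would return a top-five list of the kind reported in the Results. One known caveat of MDI is that it tends to favour continuous, high-cardinality variables, which matters when laboratory values are ranked alongside categorical FFQ items.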

Results: Blood glucose, age, pre-pregnancy body mass index (BMI), triglycerides and high-density lipoprotein cholesterol (HDL) were the top five features identified by feature selection. Among the three algorithms, XGBoost performed best (AUC = 0.788), followed by LightGBM (AUC = 0.749), while logistic regression performed worst (AUC = 0.712). Both XGBoost and LightGBM performed better when dietary information was included than on the dataset without dietary data (AUC 0.788 vs 0.718 for XGBoost; 0.749 vs 0.726 for LightGBM).
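The metrics behind this comparison can be made concrete with a short, stdlib-only sketch. It assumes predicted probabilities from any of the three models, GDM coded as 1, a 0.5 classification threshold and ten risk-ordered groups for the Hosmer-Lemeshow test; the abstract does not state the threshold or the number of groups the authors used, and the function names are hypothetical. AUC is computed from its rank interpretation (the probability that a randomly chosen GDM case receives a higher predicted risk than a randomly chosen non-GDM woman).

```python
import math

# Hypothetical helpers; threshold=0.5 and groups=10 are assumptions, not from the paper.

def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity (GDM detected) and specificity (non-GDM correctly ruled out)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

def roc_auc(y_true, y_prob):
    """AUC as P(score of a case > score of a control); ties count one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """HL statistic over risk-ordered groups and its p-value with df = groups - 2."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Equivalent to summing (O - E)^2 / E over the event and non-event cells;
        # degenerate groups (expected of 0 or n_g) are skipped in this sketch.
        if 0 < expected < n_g:
            stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, chi2_sf_even_df(stat, groups - 2)  # use an even number of groups
```

A large Hosmer-Lemeshow p-value indicates no detectable miscalibration, whereas AUC measures only discrimination; this is why both kinds of metric are reported.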

Conclusion: XGBoost and LightGBM outperform logistic regression in predicting GDM among Chinese pregnant women. Dietary data may also improve model performance, which warrants further investigation in larger samples.

Source journal

Endocrine Connections (Medicine-Internal Medicine)

CiteScore: 5.00
Self-citation rate: 3.40%
Articles published: 361
Review time: 6 weeks

Journal description: Endocrine Connections publishes original quality research and reviews in all areas of endocrinology, including papers that deal with non-classical tissues as sources or targets of hormones, and endocrine papers relevant to endocrine-related and intersecting disciplines and the wider biomedical community.