Tianze Ding, Peijie Liu, Jie Jia, Hui Wu, Jie Zhu, Kefeng Yang
Application of machine learning algorithm incorporating dietary intake in prediction of gestational diabetes mellitus
Endocrine Connections. Published 2024-11-21. DOI: 10.1530/EC-24-0169 (https://doi.org/10.1530/EC-24-0169)
Abstract
Introduction: Gestational diabetes mellitus (GDM) significantly affects pregnancy outcomes. Prediction models are therefore important because they can guide timely interventions that reduce the incidence of GDM and its associated adverse effects.
Methods: A total of 554 pregnant women were selected, and their sociodemographic characteristics, clinical data, and dietary data were collected. Dietary data were obtained with a validated semi-quantitative food frequency questionnaire (FFQ). We applied random forest mean decrease in impurity for feature selection and built models using the logistic regression, XGBoost, and LightGBM algorithms. The prediction performance of the models was compared using accuracy, sensitivity, specificity, area under the curve (AUC), and the Hosmer-Lemeshow test.
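The discrimination metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the toy labels and probabilities, the 0.5 decision threshold, and the function names are all assumptions made for illustration, and the Hosmer-Lemeshow calibration test is not shown. The sketch assumes both classes are present in the data.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_prob: List[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: List[int], y_prob: List[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: List[int], y_prob: List[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = GDM, 0 = no GDM (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

metrics = classification_metrics(y_true, y_prob)
print(metrics)
print("AUC:", auc(y_true, y_prob))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason it is commonly used to compare models.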
Results: Blood glucose, age, pre-pregnancy body mass index (BMI), triglycerides, and high-density lipoprotein cholesterol (HDL) were the top five features identified by feature selection. Among the three algorithms, XGBoost performed best with an AUC of 0.788, LightGBM came second (AUC = 0.749), and logistic regression performed worst (AUC = 0.712). In addition, XGBoost and LightGBM both performed better when dietary information was included than on the non-dietary dataset (AUC 0.788 vs 0.718 for XGBoost; 0.749 vs 0.726 for LightGBM).
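The size of the improvement from adding dietary data can be worked out directly from the AUC values reported above. The short sketch below does only that arithmetic; the helper name `auc_gain` is hypothetical, and the abstract does not state whether these differences are statistically significant.

```python
# AUC values as reported in the abstract: (with dietary data, without dietary data)
reported_auc = {
    "XGBoost": (0.788, 0.718),
    "LightGBM": (0.749, 0.726),
}


def auc_gain(with_diet: float, without_diet: float):
    """Return (absolute AUC gain, relative gain in percent), rounded for display."""
    diff = with_diet - without_diet
    return round(diff, 3), round(100 * diff / without_diet, 1)


gains = {name: auc_gain(*pair) for name, pair in reported_auc.items()}
for name, (absolute, relative_pct) in gains.items():
    print(f"{name}: +{absolute:.3f} AUC ({relative_pct}% relative)")
```

On these figures, XGBoost gains more from the dietary variables than LightGBM does.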
Conclusion: The XGBoost and LightGBM algorithms outperformed logistic regression in predicting GDM among Chinese pregnant women. In addition, dietary data may improve model performance, which deserves more in-depth investigation with a larger sample size.
About the journal:
Endocrine Connections publishes high-quality original research and reviews in all areas of endocrinology, including papers that deal with non-classical tissues as sources or targets of hormones, and endocrine papers that are relevant to endocrine-related and intersecting disciplines and to the wider biomedical community.