Development and validation of a new diagnostic prediction model for NAFLD based on machine learning algorithms in NHANES 2017-2020.3.

Hormones-International Journal of Endocrinology and Metabolism (IF 2.4; Zone 4, Medicine; JCR Q3, ENDOCRINOLOGY & METABOLISM) Pub Date: 2025-02-13 DOI: 10.1007/s42000-025-00634-6
Yazhi Wang, Peng Wang
Citations: 0

Abstract

Aims: Nonalcoholic fatty liver disease (NAFLD) is a multisystem disease that can trigger metabolic syndrome. Early prevention and treatment of NAFLD remain a major challenge for patients and clinicians. This study aimed to develop and validate machine learning (ML)-based predictive models and to turn the best-performing model into a set of simple arithmetic tools for predicting an individual's risk of NAFLD.

Methods: Statistical analyses were performed on 2428 individuals drawn from the National Health and Nutrition Examination Survey (NHANES, cycle 2017-2020.3) database. Feature variables were selected by least absolute shrinkage and selection operator (LASSO) regression. Seven ML algorithms, namely logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), K-nearest neighbor (KNN), light gradient boosting machine (LightGBM), and multilayer perceptron (MLP), were used to construct models from the selected features, and the performance of each model was evaluated. The best-performing model was transformed into a diagnostic predictive nomogram (DPN), which was then implemented as an online calculator and an Excel algorithm tool. Receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and subgroup analyses were used to compare the predictive ability of the DPN with that of six existing NAFLD predictive models: the ZJU index, the hepatic steatosis index (HSI), the triglyceride-glucose index (TyG), the Framingham steatosis index (FSI), the fatty liver index (FLI), and the visceral adiposity index (VAI).
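Three of the comparator indices named above have simple closed forms. The sketch below computes TyG, HSI, and FLI using the formulas as they are commonly published. The abstract does not state them, so the formulas, the units, and the function and parameter names here are assumptions taken from general knowledge, not from this study. The ZJU index, FSI, and VAI are omitted.

```python
import math


def tyg_index(tg_mg_dl, glucose_mg_dl):
    """TyG = ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2)."""
    return math.log(tg_mg_dl * glucose_mg_dl / 2.0)


def hepatic_steatosis_index(alt, ast, bmi, diabetes, female):
    """HSI = 8 * ALT/AST + BMI, plus 2 if diabetic and plus 2 if female."""
    score = 8.0 * alt / ast + bmi
    if diabetes:
        score += 2.0
    if female:
        score += 2.0
    return score


def fatty_liver_index(tg_mg_dl, bmi, ggt_u_l, waist_cm):
    """FLI = 100 * logistic(z), with z a linear score in ln(TG), BMI, ln(GGT), WC."""
    z = (0.953 * math.log(tg_mg_dl)
         + 0.139 * bmi
         + 0.718 * math.log(ggt_u_l)
         + 0.053 * waist_cm
         - 15.745)
    return 100.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up input values, used only to show the calls.
    print(round(tyg_index(150, 100), 3))
    print(hepatic_steatosis_index(alt=40, ast=20, bmi=30, diabetes=True, female=False))
    print(round(fatty_liver_index(tg_mg_dl=150, bmi=30, ggt_u_l=40, waist_cm=100), 1))
```

Each index is a fixed formula, which is why such scores can be compared directly with a fitted model on the same participants using ROC curves.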

Results: Among the 2428 participants, the prevalence of NAFLD was 47.45%. LASSO regression selected eight of 39 candidate variables: body mass index (BMI), waist circumference (WC), alanine aminotransferase (ALT), triglyceride (TG), diabetes, hypertension, uric acid (UA), and race. Of the seven models, the LR-based model performed best, with an area under the curve (AUC) of 0.823, accuracy of 0.754, precision of 0.768, specificity of 0.804, and positive predictive value of 0.768. It was then transformed into the DPN, which was implemented as an online calculator and an Excel algorithm tool. In the training and validation sets, the diagnostic accuracy of the DPN (AUC 0.856, 95% confidence interval (CI) 0.839-0.874, and AUC 0.823, 95% CI 0.793-0.854, respectively) and its net clinical benefit were superior to those of the ZJU, HSI, TyG, FSI, FLI, and VAI. These results held in subgroup analyses.
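As a reminder of how the reported quantities are defined, here is a minimal sketch that computes accuracy, precision, specificity, rank-based AUC, and the net benefit used in decision curve analysis from labels and predicted probabilities. The function names and the eight-row toy data are illustrative assumptions and have no connection to the NHANES sample. Precision and positive predictive value are the same quantity, TP / (TP + FP), which is consistent with both being reported as 0.768.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN when predicting positive if prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics; precision is identical to PPV."""
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }


def auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit: TP/n - FP/n * pt / (1 - pt)."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (made up for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

if __name__ == "__main__":
    tp, fp, tn, fn = confusion_counts(y_true, y_prob)
    print(summary_metrics(tp, fp, tn, fn))
    print(auc(y_true, y_prob))
    print(net_benefit(tp, fp, len(y_true), threshold=0.5))
```

AUC does not depend on a cutoff, whereas accuracy, precision, and specificity do. The abstract does not say which cutoff was used for the reported values.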

Conclusions: An ML-based LR model was developed and showed good performance. The DPN can be used as an individualized tool for rapid detection of NAFLD.
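A nomogram built from a logistic regression model is a graphical form of a linear score passed through the logistic function, which is why it can be reproduced in a calculator or spreadsheet. The sketch below shows that arithmetic only. The abstract does not report the fitted coefficients, so every number in `PLACEHOLDER_COEFS` and `PLACEHOLDER_INTERCEPT` is a hypothetical placeholder and must not be read as the published model. The feature names and units are also assumptions, and race is left out because it would need one indicator column per category.

```python
import math

# HYPOTHETICAL values for illustration only; NOT the coefficients of the published DPN.
PLACEHOLDER_INTERCEPT = -8.0
PLACEHOLDER_COEFS = {
    "bmi": 0.05,
    "waist_cm": 0.04,
    "alt_u_l": 0.02,
    "tg_mmol_l": 0.30,
    "diabetes": 0.50,       # 1 if present, else 0
    "hypertension": 0.30,   # 1 if present, else 0
    "uric_acid_mg_dl": 0.10,
}


def predict_risk(features, coefs=PLACEHOLDER_COEFS, intercept=PLACEHOLDER_INTERCEPT):
    """Return logistic(intercept + sum(coef * value)) as a probability in (0, 1)."""
    missing = set(coefs) - set(features)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    z = intercept + sum(coefs[name] * float(features[name]) for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Made-up person, used only to show the call.
    person = {
        "bmi": 31.0, "waist_cm": 104.0, "alt_u_l": 38.0, "tg_mmol_l": 2.1,
        "diabetes": 1, "hypertension": 0, "uric_acid_mg_dl": 6.2,
    }
    print(predict_risk(person))
```

With the real coefficients substituted, the same function would reproduce what a spreadsheet implementation of a logistic model computes.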
