Development and validation of a new diagnostic prediction model for NAFLD based on machine learning algorithms in NHANES 2017-2020.3
Authors: Yazhi Wang, Peng Wang
Journal: Hormones-International Journal of Endocrinology and Metabolism (published 2025-02-13)
DOI: 10.1007/s42000-025-00634-6
Citations: 0
Abstract
Aims: Nonalcoholic fatty liver disease (NAFLD) is a multisystem disease that can trigger the metabolic syndrome. Early prevention and treatment of NAFLD remain a major challenge for patients and clinicians. This study aimed to develop and validate machine learning (ML)-based predictive models. It also aimed to convert the best-performing model into simple arithmetic tools for predicting an individual's risk of NAFLD.
Methods: Statistical analyses were performed on 2428 individuals drawn from the National Health and Nutrition Examination Survey (NHANES, cycle 2017-2020.3) database. Feature variables were selected by least absolute shrinkage and selection operator (LASSO) regression. Seven ML algorithms were used to build models on the selected features, and the performance of each model was evaluated. The algorithms were logistic regression (LR), decision tree (DT), random forest (RF), extreme gradient boosting (XGB), K-nearest neighbor (KNN), light gradient boosting machine (LightGBM), and multilayer perceptron (MLP). The best-performing model was transformed into a diagnostic predictive nomogram (DPN). The DPN was then implemented as an online calculator and an Excel algorithm tool. Receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and subgroup analyses were used to compare the predictive ability of the DPN with that of six existing NAFLD prediction models: the ZJU index, the hepatic steatosis index (HSI), the triglyceride-glucose index (TyG), the Framingham steatosis index (FSI), the fatty liver index (FLI), and the visceral adiposity index (VAI).
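Three of the comparator indices named above have simple closed-form definitions. The sketch below follows the formulas for TyG, HSI and FLI as commonly cited from their original derivation studies. The function names, argument names and units are assumptions for illustration. The assumed units are TG and fasting glucose in mg/dL, enzymes in U/L, BMI in kg/m^2 and waist circumference in cm. This is not the authors' code, and ZJU, FSI and VAI are omitted.

```python
import math

# Formulas as commonly cited from the original derivation studies (assumption:
# not taken from this paper). Assumed units: TG and fasting glucose in mg/dL,
# ALT/AST/GGT in U/L, BMI in kg/m^2, waist circumference in cm.


def tyg_index(tg_mg_dl: float, fpg_mg_dl: float) -> float:
    """Triglyceride-glucose index: ln(TG x fasting glucose / 2)."""
    return math.log(tg_mg_dl * fpg_mg_dl / 2.0)


def hsi(alt: float, ast: float, bmi: float, diabetes: bool, female: bool) -> float:
    """Hepatic steatosis index: 8 x ALT/AST + BMI, +2 if diabetes, +2 if female."""
    score = 8.0 * alt / ast + bmi
    if diabetes:
        score += 2.0
    if female:
        score += 2.0
    return score


def fli(tg_mg_dl: float, bmi: float, ggt: float, waist_cm: float) -> float:
    """Fatty liver index on a 0-100 scale: logistic transform of a linear score."""
    y = (0.953 * math.log(tg_mg_dl) + 0.139 * bmi
         + 0.718 * math.log(ggt) + 0.053 * waist_cm - 15.745)
    return 100.0 / (1.0 + math.exp(-y))


# Hypothetical participant, for illustration only.
print(round(tyg_index(150, 100), 3))
print(hsi(alt=40, ast=20, bmi=30, diabetes=False, female=True))  # 48.0
print(round(fli(tg_mg_dl=150, bmi=30, ggt=40, waist_cm=100), 1))
```

Each index is a fixed formula with no fitted parameters on the study data. That makes them cheap to compute, but they cannot adapt to the NHANES population the way a newly fitted model can.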
Results: Among the 2428 participants, the prevalence of NAFLD was 47.45%. LASSO regression selected eight of 39 candidate variables: body mass index (BMI), waist circumference (WC), alanine aminotransferase (ALT), triglyceride (TG), diabetes, hypertension, uric acid (UA), and race. Of the models built with the seven algorithms, the LR-based model performed best. It had an area under the curve (AUC) of 0.823, accuracy of 0.754, precision (positive predictive value) of 0.768, and specificity of 0.804. This model was then transformed into the DPN and implemented as an online calculator and an Excel algorithm tool. In the training and validation sets, the DPN achieved AUCs of 0.856 (95% confidence interval (CI) 0.839-0.874) and 0.823 (95% CI 0.793-0.854), respectively. Both its diagnostic accuracy and its net clinical benefit were superior to those of the ZJU, HSI, TyG, FSI, FLI, and VAI. These results held in subgroup analyses.
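The performance measures quoted above can be computed from a model's predicted probabilities. The pure-Python sketch below shows one way to do this. It assumes a default classification threshold of 0.5, which is an assumption because the abstract does not state the threshold used. AUC is computed in its Mann-Whitney form, and net benefit uses the standard decision-curve formula, TP/n - FP/n x pt/(1 - pt). The function names are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision (identical to PPV), sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative).

    Ties count 1/2. This is O(n_pos * n_neg), which is fine for a few
    thousand participants.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt (0 < pt < 1)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(y_true, y_prob, pt)
    n = len(y_true)
    return tp / n - fp / n * pt / (1.0 - pt)


# Toy data, for illustration only.
y = [1, 1, 0, 0]
p = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(y, p))  # 0.75
print(threshold_metrics(y, p))
print(net_benefit(y, p, 0.2))
```

Precision and positive predictive value are the same quantity, TP/(TP + FP). This is why the abstract reports the same value (0.768) for both. A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities.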
Conclusions: An ML-based LR model was developed and showed good performance. The DPN can serve as an individualized tool for rapid detection of NAFLD.
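A calculator built from a logistic regression model, whether online or in Excel, evaluates the fitted linear predictor for one person and passes it through the logistic function. A nomogram is a graphical version of the same computation: points for each predictor sum to a total score that maps to a probability. The following minimal sketch assumes the standard logit link. The coefficient values in the example are hypothetical placeholders, not the published DPN coefficients, which are not given in this abstract. Categorical predictors such as race would need indicator (dummy) coding, which is also not shown here.

```python
import math
from typing import Mapping


def logistic_risk(intercept: float,
                  coefficients: Mapping[str, float],
                  values: Mapping[str, float]) -> float:
    """Predicted probability 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    `values` must supply every predictor named in `coefficients`; binary
    predictors such as diabetes are coded 0/1.
    """
    linear = intercept + sum(coef * values[name] for name, coef in coefficients.items())
    # Numerically stable sigmoid: avoid exp() overflow for large |linear|.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    z = math.exp(linear)
    return z / (1.0 + z)


# HYPOTHETICAL coefficients for illustration only; NOT the fitted DPN values.
placeholder_intercept = -8.0
placeholder_coefs = {"bmi": 0.1, "waist_cm": 0.03, "alt": 0.02, "diabetes": 0.5}
person = {"bmi": 31.0, "waist_cm": 104.0, "alt": 35.0, "diabetes": 1.0}
print(round(logistic_risk(placeholder_intercept, placeholder_coefs, person), 3))
```

Because the model is a single linear predictor followed by a fixed transform, it can be reproduced exactly in a spreadsheet formula. This is what makes an LR-based model easy to deploy as an Excel tool.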
About the journal:
Hormones-International Journal of Endocrinology and Metabolism is an international quarterly journal with an international editorial board. It aims to provide a forum covering all fields of endocrinology and metabolic disorders. Examples include disrupted glucose homeostasis (diabetes mellitus), impaired plasma lipid homeostasis (dyslipidemia), disordered bone metabolism (osteoporosis), and disturbances of endocrine function and reproductive capacity in women and men.
Hormones-International Journal of Endocrinology and Metabolism particularly encourages clinical, translational and basic science submissions in the areas of endocrine cancers, nutrition, obesity and metabolic disorders, quality of life in endocrine diseases, and the epidemiology of endocrine and metabolic disorders.