Automated machine learning models for nonalcoholic fatty liver disease assessed by controlled attenuation parameter from the NHANES 2017-2020.

DIGITAL HEALTH (IF 3.3; Zone 3, Medicine; JCR Q2, Health Care Sciences & Services). Pub Date: 2024-08-07. eCollection Date: 2024-01-01. DOI: 10.1177/20552076241272535

Lihe Liu, Jiaxi Lin, Lu Liu, Jingwen Gao, Guoting Xu, Minyue Yin, Xiaolin Liu, Airong Wu, Jinzhou Zhu

DIGITAL HEALTH, volume 10, article 20552076241272535. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11307367/pdf/

Automated machine learning models for nonalcoholic fatty liver disease assessed by controlled attenuation parameter from the NHANES 2017-2020.

Background: Nonalcoholic fatty liver disease (NAFLD) is recognized as one of the most common chronic liver diseases worldwide. This study aims to assess the efficacy of automated machine learning (AutoML) in the identification of NAFLD using a population-based cross-sectional database.

Methods: All data, including laboratory examinations, anthropometric measurements, and demographic variables, were obtained from the National Health and Nutrition Examination Survey (NHANES). NAFLD was defined by controlled attenuation parameter (CAP) in liver transient ultrasound elastography. The least absolute shrinkage and selection operator (LASSO) regression analysis was employed for feature selection. Six algorithms were utilized on the H2O-automated machine learning platform: Gradient Boosting Machine (GBM), Distributed Random Forest (DRF), Extremely Randomized Trees (XRT), Generalized Linear Model (GLM), eXtreme Gradient Boosting (XGBoost), and Deep Learning (DL). These algorithms were selected for their diverse strengths, including their ability to handle complex, non-linear relationships, provide high predictive accuracy, and ensure interpretability. The models were evaluated by area under receiver operating characteristic curves (AUC) and interpreted by the calibration curve, the decision curve analysis, variable importance plot, SHapley Additive exPlanation plot, partial dependence plots, and local interpretable model agnostic explanation plot.
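As a rough illustration of the AUC metric named in the Methods, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not study data, and this is not the H2O implementation the authors used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = positive, e.g. NAFLD).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data only (hypothetical, not taken from NHANES).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of cases, which is fine for a sketch; production libraries typically use a rank-based computation that gives the same value.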

Results: A total of 4177 participants (non-NAFLD 3167 vs NAFLD 1010) were included to develop and validate the AutoML models. The model developed by XGBoost performed better than other models in AutoML, achieving an AUC of 0.859, an accuracy of 0.795, a sensitivity of 0.773, and a specificity of 0.802 on the validation set.
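The reported accuracy, sensitivity, and specificity can be checked against one another, because accuracy is a prevalence-weighted average of the other two. The sketch below does that arithmetic under one loudly stated assumption: the validation set has the same NAFLD share as the full cohort (1010 of 4177). The abstract does not give the validation-set composition, so this is a plausibility check, not a reproduction of the authors' calculation. The function name `expected_accuracy` is hypothetical.

```python
def expected_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity and specificity at a given prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Figures reported in the abstract.
n_nafld, n_total = 1010, 4177
sensitivity, specificity = 0.773, 0.802

# ASSUMPTION: the validation set has the same NAFLD share as the full cohort.
prevalence = n_nafld / n_total
implied = expected_accuracy(sensitivity, specificity, prevalence)

print(f"prevalence = {prevalence:.3f}, implied accuracy = {implied:.3f}")
```

Under that assumption the implied accuracy comes out at about 0.795, in line with the reported value.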

Conclusions: We developed an XGBoost model to better evaluate the presence of NAFLD. Based on the XGBoost model, we created an R Shiny web-based application named Shiny NAFLD (http://39.101.122.171:3838/App2/). This application demonstrates the potential of AutoML in clinical research and practice, offering a promising tool for the real-world identification of NAFLD.


Source journal: DIGITAL HEALTH Multiple-

CiteScore: 2.90

Self-citation rate: 7.70%

Articles published: 302