Automated machine learning models for nonalcoholic fatty liver disease assessed by controlled attenuation parameter from the NHANES 2017-2020.

DIGITAL HEALTH · IF 2.9 · Zone 3 (Medicine) · JCR Q2 (Health Care Sciences & Services) · Pub Date: 2024-08-07 · eCollection Date: 2024-01-01 · DOI: 10.1177/20552076241272535
Lihe Liu, Jiaxi Lin, Lu Liu, Jingwen Gao, Guoting Xu, Minyue Yin, Xiaolin Liu, Airong Wu, Jinzhou Zhu
Citations: 0

Automated machine learning models for nonalcoholic fatty liver disease assessed by controlled attenuation parameter from the NHANES 2017-2020.

Background: Nonalcoholic fatty liver disease (NAFLD) is recognized as one of the most common chronic liver diseases worldwide. This study aims to assess the efficacy of automated machine learning (AutoML) in the identification of NAFLD using a population-based cross-sectional database.

Methods: All data, including laboratory examinations, anthropometric measurements, and demographic variables, were obtained from the National Health and Nutrition Examination Survey (NHANES). NAFLD was defined by controlled attenuation parameter (CAP) in liver transient ultrasound elastography. The least absolute shrinkage and selection operator (LASSO) regression analysis was employed for feature selection. Six algorithms were utilized on the H2O-automated machine learning platform: Gradient Boosting Machine (GBM), Distributed Random Forest (DRF), Extremely Randomized Trees (XRT), Generalized Linear Model (GLM), eXtreme Gradient Boosting (XGBoost), and Deep Learning (DL). These algorithms were selected for their diverse strengths, including their ability to handle complex, non-linear relationships, provide high predictive accuracy, and ensure interpretability. The models were evaluated by area under receiver operating characteristic curves (AUC) and interpreted by the calibration curve, the decision curve analysis, variable importance plot, SHapley Additive exPlanation plot, partial dependence plots, and local interpretable model agnostic explanation plot.
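The evaluation metric named in the Methods, AUC, can be read as the probability that a randomly chosen NAFLD participant receives a higher model score than a randomly chosen non-NAFLD participant. The following is a minimal sketch of that definition in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study, which used the H2O AutoML platform rather than this code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = NAFLD, 0 = non-NAFLD); not study data.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

The pairwise loop is quadratic in the number of participants, which is fine for illustration; a rank-based formulation would be the usual choice at the scale of a survey cohort.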

Results: A total of 4177 participants (non-NAFLD 3167 vs NAFLD 1010) were included to develop and validate the AutoML models. The model developed by XGBoost performed better than other models in AutoML, achieving an AUC of 0.859, an accuracy of 0.795, a sensitivity of 0.773, and a specificity of 0.802 on the validation set.
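As a rough consistency check on the reported figures, accuracy can be written as the prevalence-weighted average of sensitivity and specificity. The sketch below uses only numbers from the Results and assumes, since the abstract does not say so, that the validation set has the same NAFLD prevalence as the full cohort.

```python
# Figures reported in the abstract.
n_total = 4177
n_nafld = 1010
sensitivity = 0.773
specificity = 0.802
reported_accuracy = 0.795

# ASSUMPTION (not stated in the abstract): the validation set has the same
# NAFLD prevalence as the full cohort.
prevalence = n_nafld / n_total

# accuracy = P(correct | NAFLD) * P(NAFLD) + P(correct | non-NAFLD) * P(non-NAFLD)
implied_accuracy = sensitivity * prevalence + specificity * (1 - prevalence)

print(f"prevalence       = {prevalence:.3f}")
print(f"implied accuracy = {implied_accuracy:.3f}")
print(f"matches reported = {round(implied_accuracy, 3) == reported_accuracy}")
```

Under that assumption the implied accuracy rounds to the reported 0.795, so the three metrics are mutually consistent with a validation set that is roughly 24% NAFLD.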

Conclusions: We developed an XGBoost model to better evaluate the presence of NAFLD. Based on the XGBoost model, we created an R Shiny web-based application named Shiny NAFLD (http://39.101.122.171:3838/App2/). This application demonstrates the potential of AutoML in clinical research and practice, offering a promising tool for the real-world identification of NAFLD.
