Application of Interpretable Machine Learning Models to Predict the Risk Factors of HBV-Related Liver Cirrhosis in CHB Patients Based on Routine Clinical Data: A Retrospective Cohort Study

IF 4.6, Zone 3 (Medicine), JCR Q1 (Virology), Journal of Medical Virology, Pub Date: 2025-03-19, DOI: 10.1002/jmv.70302
Wei Xia, Yafeng Tan, Bing Mei, Yizheng Zhou, Jufang Tan, Zhaxi Pubu, Bu Sang, Tao Jiang
Journal of Medical Virology, volume 97, issue 3. Full text: https://onlinelibrary.wiley.com/doi/10.1002/jmv.70302
Citations: 0

Abstract

Chronic hepatitis B (CHB) infection is a significant global public health issue and often leads to hepatitis B virus (HBV)-related liver cirrhosis (HBV-LC), which carries a poor prognosis. Early identification of HBV-LC risk is essential for timely intervention. This study develops and compares nine machine learning (ML) models to predict HBV-LC risk in CHB patients using routine clinical and laboratory data. A retrospective analysis was conducted involving 777 CHB patients, of whom 50.45% (392/777) progressed to HBV-LC. Admission data comprised 52 clinical and laboratory variables, with missing values addressed using multiple imputation. Feature selection used Least Absolute Shrinkage and Selection Operator (LASSO) regression and the Boruta algorithm, identifying 24 key variables. The evaluated ML models were XGBoost, logistic regression (LR), LightGBM, random forest (RF), AdaBoost, Gaussian naive Bayes (GNB), multilayer perceptron (MLP), support vector machine (SVM), and k-nearest neighbors (KNN). The data set was partitioned into an 80% training set (n = 621) and a 20% independent testing set (n = 156). Cross-validation (CV) was used for hyperparameter tuning and internal validation of the optimal model. Performance metrics included the area under the receiver operating characteristic curve (AUC), Brier score, accuracy, sensitivity, specificity, and F1 score. The RF model demonstrated superior performance, with AUCs of 0.992 (training) and 0.907 (validation), while the reconstructed model achieved AUCs of 0.944 (training) and 0.945 (validation), maintaining an AUC of 0.863 in the testing set. Calibration curves confirmed a strong alignment between observed and predicted probabilities. Decision curve analysis indicated that the RF model provided the highest net benefit across threshold probabilities. The SHAP algorithm identified RPR, PLT, HBV DNA, ALT, and TBA as critical predictors. This interpretable ML model enhances early HBV-LC prediction and supports clinical decision-making in resource-limited settings.
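
The performance metrics named in the abstract (AUC, Brier score, accuracy, sensitivity, specificity, F1 score) and the net benefit used in decision curve analysis can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the eight labels and probabilities are made-up toy values rather than study data, and the 0.5 threshold is an assumption chosen only for illustration.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Confusion-matrix metrics and decision-curve net benefit at one threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a class or prediction is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, n),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        # Net benefit: true positives minus false positives weighted by the
        # odds of the threshold probability, both per patient.
        "net_benefit": tp / n - (fp / n) * (threshold / (1 - threshold)),
    }


# Toy example (made-up values, not data from the study).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", auc(toy_labels, toy_probs))
    print("Brier score:", brier_score(toy_labels, toy_probs))
    print(threshold_metrics(toy_labels, toy_probs, threshold=0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of threshold probabilities and comparing the model against the treat-all and treat-none strategies.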

Source journal
Journal of Medical Virology (Medicine: Virology)
CiteScore: 23.20
Self-citation rate: 2.40%
Articles published: 777
Review time: 1 month
About the journal: The Journal of Medical Virology focuses on publishing original scientific papers on both basic and applied research related to viruses that affect humans. The journal publishes reports covering a wide range of topics, including the characterization, diagnosis, epidemiology, immunology, and pathogenesis of human virus infections. It also includes studies on virus morphology, genetics, replication, and interactions with host cells. The intended readership of the journal includes virologists, microbiologists, immunologists, infectious disease specialists, diagnostic laboratory technologists, epidemiologists, hematologists, and cell biologists. The Journal of Medical Virology is indexed and abstracted in various databases, including Abstracts in Anthropology (Sage), CABI, AgBiotech News & Information, National Agricultural Library, Biological Abstracts, Embase, Global Health, Web of Science, Veterinary Bulletin, and others.