Application of Interpretable Machine Learning Models to Predict the Risk Factors of HBV-Related Liver Cirrhosis in CHB Patients Based on Routine Clinical Data: A Retrospective Cohort Study

Journal of Medical Virology, Vol. 97, No. 3 | IF 6.8 | Zone 3 (Medicine) | JCR Q1 (Virology) | Pub Date: 2025-03-19 | DOI: 10.1002/jmv.70302
Wei Xia, Yafeng Tan, Bing Mei, Yizheng Zhou, Jufang Tan, Zhaxi Pubu, Bu Sang, Tao Jiang

Abstract

Chronic hepatitis B (CHB) infection represents a significant global public health issue, often leading to hepatitis B virus (HBV)-related liver cirrhosis (HBV-LC) with poor prognoses. Early identification of HBV-LC risk is essential for timely intervention. This study develops and compares nine machine learning (ML) models to predict HBV-LC risk in CHB patients using routine clinical and laboratory data. A retrospective analysis was conducted involving 777 CHB patients, with 50.45% (392/777) progressing to HBV-LC. Admission data consisted of 52 clinical and laboratory variables, with missing values addressed using multiple imputation. Feature selection utilized Least Absolute Shrinkage and Selection Operator (LASSO) regression and the Boruta algorithm, identifying 24 key variables. The evaluated ML models included XGBoost, logistic regression (LR), LightGBM, random forest (RF), AdaBoost, Gaussian naive Bayes (GNB), multilayer perceptron (MLP), support vector machine (SVM), and k-nearest neighbors (KNN). The data set was partitioned into an 80% training set (n = 621) and a 20% independent testing set (n = 156). Cross-validation (CV) facilitated hyperparameter tuning and internal validation of the optimal model. Performance metrics included the area under the receiver operating characteristic curve (AUC), Brier score, accuracy, sensitivity, specificity, and F1 score. The RF model demonstrated superior performance, with AUCs of 0.992 (training) and 0.907 (validation), while the reconstructed model achieved AUCs of 0.944 (training) and 0.945 (validation), maintaining an AUC of 0.863 in the testing set. Calibration curves confirmed a strong alignment between observed and predicted probabilities. Decision curve analysis indicated that the RF model provided the highest net benefit across threshold probabilities. The SHAP algorithm identified RPR, PLT, HBV DNA, ALT, and TBA as critical predictors. This interpretable ML model enhances early HBV-LC prediction and supports clinical decision-making in resource-limited settings.
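The performance metrics named in the abstract (AUC, Brier score, accuracy, sensitivity, specificity, F1 score) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight-patient toy data below are all assumptions made for illustration, and the numbers have no relation to the study's results.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher predicted risk; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 at a fixed cut-off.
    The 0.5 default is an illustrative assumption, not taken from the paper."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 1)
    tn = sum(1 for y, h in zip(y_true, pred) if y == 0 and h == 0)
    fp = sum(1 for y, h in zip(y_true, pred) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, pred) if y == 1 and h == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Hypothetical toy data: 1 = progressed to HBV-LC, 0 = did not.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0]
Y_PROB = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUC:", roc_auc(Y_TRUE, Y_PROB))
    print("Brier:", round(brier_score(Y_TRUE, Y_PROB), 3))
    print(threshold_metrics(Y_TRUE, Y_PROB))
```

AUC and the Brier score are computed from the predicted probabilities directly, so they do not depend on a cut-off, whereas accuracy, sensitivity, specificity, and F1 change with the chosen threshold.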
