Construction and interpretation of weight-balanced enhanced machine learning models for predicting liver metastasis risk in colorectal cancer patients.

IF 2.8 | Zone 4 (Medicine) | JCR Q3 (Endocrinology & Metabolism) | Discover. Oncology | Pub Date: 2025-02-12 | DOI: 10.1007/s12672-025-01871-2
Qunzhe Ding, Chenyang Li, Chendong Wang, Qunzhe Ding
Discover. Oncology, volume 16 (1), pages 164. Publication type: Journal Article.
Citation count: 0

Abstract

Background: Colorectal cancer (CRC) is a major contributor to cancer-related mortality, and liver metastases develop in approximately 25% of affected individuals. Liver metastasis significantly worsens patient prognosis. This study aims to predict liver metastasis in CRC patients by developing machine learning (ML)-based models, thereby supporting clinicians in choosing appropriate interventions.

Methods: A retrospective analysis was performed using the Surveillance, Epidemiology, and End Results (SEER) database, from which CRC cases from 2010 to 2015 were extracted for downstream analysis. Logistic regression (LR), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and LightGBM were applied to develop ML models for predicting liver metastasis in CRC patients. To optimize the models, an improved weight-balancing algorithm was employed to enhance classifier performance. The six models were evaluated with tenfold cross-validation, and the optimal model was selected based on a combination of performance metrics. Shapley additive explanations (SHAP) were used to interpret the best-performing ML model globally, locally, and through feature interactions. To assess the model's reliability and generalizability, a cohort of CRC cases from 2018 to 2021, obtained from a separate SEER database, was used for external validation.
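The abstract does not specify the improved weight-balancing algorithm, so the following is only a minimal sketch of the general idea, assuming the common inverse-frequency heuristic (each class weighted by n_samples / (n_classes × class count)) together with a stratified tenfold split. The function names and the toy cohort are hypothetical and are not taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * n_c).

    This is the standard "balanced" heuristic, used here as an assumption;
    it is not the authors' improved algorithm.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def stratified_folds(labels, n_folds=10):
    """Assign sample indices to folds so each fold keeps roughly the class ratio.

    Indices are dealt round-robin within each class. A real pipeline would
    shuffle first; shuffling is omitted to keep the sketch deterministic.
    """
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Hypothetical toy cohort: 90 patients without and 10 with liver metastasis.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
folds = stratified_folds(labels, n_folds=10)

print(weights)                                      # per-class weights
print([sum(labels[i] for i in f) for f in folds])   # positives per fold
```

With weights of this kind, a misclassified minority-class case contributes more to the training loss than a misclassified majority-class case, which is the usual motivation for weight balancing when the outcome is rare.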

Results: In total, 50,062 patients with CRC were included in the analysis, of whom 5,604 developed liver metastasis. Among the six models evaluated, CatBoost performed best, with the highest AUC (0.8844). It also outperformed the other models in recall (0.8060) and F1-score (0.6736). SHAP-based summary and force plots were used to interpret the CatBoost model. The interpretability analysis identified elevated carcinoembryonic antigen (CEA) levels, systemic therapy, N and T stages, and receipt of chemotherapy as the most important predictors of liver metastasis in the optimal model. Furthermore, systemic therapy appeared to increase liver metastasis risk in N0-stage patients, whereas it appeared beneficial in patients with lymph node metastasis. Preoperative radiation therapy was found to be more effective than postoperative radiation therapy. Validation on the external cohort of CRC cases from 2018 to 2021 further supported the robustness and stability of the CatBoost model: its overall performance remained consistent with the internal validation results.
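The reported figures can be related with a little arithmetic. The sketch below computes the cohort's liver metastasis prevalence and the precision implied by the reported recall and F1-score, using the identity F1 = 2PR / (P + R). The abstract does not report precision, and cross-validated metrics averaged over folds need not satisfy this identity exactly, so the result is an approximation only. The helper name is hypothetical.

```python
def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2 * P * R / (P + R) for precision P."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 cannot be as large as 2 * recall")
    return f1 * recall / denom


# Figures taken from the Results paragraph.
n_total, n_liver_mets = 50_062, 5_604
recall, f1 = 0.8060, 0.6736

prevalence = n_liver_mets / n_total
implied_precision = precision_from_recall_f1(recall, f1)

print(f"prevalence        = {prevalence:.4f}")
print(f"implied precision = {implied_precision:.4f}")
```

Under these assumptions, about 11% of the cohort developed liver metastasis and the implied precision is roughly 0.58. This pattern, high recall with moderate precision, is typical of a classifier whose training weights favor the minority class.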

Conclusion: Elevated carcinoembryonic antigen (CEA) was identified as a crucial clinical predictor of liver metastasis in CRC patients. Furthermore, administering systemic therapy to patients without lymph node involvement was found to increase the risk of liver metastasis. For radiation therapy, preoperative radiation appears more effective than postoperative radiation in controlling the risk of liver metastasis. These findings underscore the importance of tailoring treatment strategies to the specific clinical context and patient characteristics.
