Construction and interpretation of weight-balanced enhanced machine learning models for predicting liver metastasis risk in colorectal cancer patients.

IF 2.9 · Zone 4 (Medicine) · JCR Q3, Endocrinology & Metabolism · Discover. Oncology · Pub Date: 2025-02-12 · DOI: 10.1007/s12672-025-01871-2
Qunzhe Ding, Chenyang Li, Chendong Wang, Qunzhe Ding
Discover. Oncology 16(1):164. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822177/pdf/
Citations: 0

Abstract

Background: Colorectal cancer (CRC) is a major contributor to cancer-related mortality, with liver metastases developing in approximately 25% of affected individuals. Liver metastasis significantly worsens patient prognosis. The objective of this study is to predict liver metastasis in CRC patients by developing machine learning (ML)-based models, thereby aiding clinicians in choosing appropriate interventions.

Methods: A retrospective analysis was performed using the Surveillance, Epidemiology, and End Results (SEER) database, and CRC cases from 2010 to 2015 were extracted for downstream analysis. Logistic regression (LR), Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and LightGBM were applied to develop ML models to predict liver metastasis in CRC patients. To optimize the models, an improved weight-balancing algorithm was employed to enhance classifier performance. The six models were evaluated with tenfold cross-validation, and the optimal model was selected based on a combination of performance metrics. Shapley additive explanation (SHAP) was used to interpret the best-performing ML model globally, locally, and interactively. To assess the model's reliability and generalizability, an external validation cohort of CRC cases from 2018 to 2021, obtained from a separate SEER database, was used for external evaluation.
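
The abstract does not specify the "improved weight-balancing algorithm", so the following is only a minimal sketch of the general idea, assuming the common inverse-frequency heuristic (weight = n_samples / (n_classes × class_count)) combined with a stratified tenfold split. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    This is a standard heuristic, assumed here for illustration; it is
    not the paper's (unspecified) improved weight-balancing algorithm.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so folds differ by at most one
        # sample per class.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Hypothetical toy labels: 1 = liver metastasis (minority), 0 = none.
labels = [1] * 112 + [0] * 888
weights = balanced_class_weights(labels)
folds = stratified_folds(labels, n_folds=10)

for cls in sorted(weights):
    print(f"class {cls}: weight {weights[cls]:.3f}")
print("positives per fold:", [sum(labels[i] for i in f) for f in folds])
```

With these weights each class contributes the same total weight to the training loss, which is why a weighted classifier tends to trade some precision for higher recall on the minority class. In practice such weights would be passed to whatever per-class or per-sample weighting option the chosen learner provides.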

Results: In total, 50,062 patients with CRC were included in the analysis, of whom 5,604 developed liver metastasis. Among the six models evaluated, the CatBoost model performed best, with the highest AUC of 0.8844. The CatBoost model also outperformed the others in terms of recall (0.8060) and F1-score (0.6736). SHAP-based summary and force plots were used to interpret the CatBoost model. The interpretability analysis revealed that elevated carcinoembryonic antigen (CEA) levels, systemic therapy, N and T stages, and receipt of chemotherapy were the most significant indicators for predicting liver metastasis according to the optimal model. Furthermore, systemic therapy was suggested to increase liver metastasis risk in N0 stage patients, while it appeared to be beneficial in patients with lymph node metastasis. Preoperative radiation therapy was found to be more effective than postoperative radiation therapy. Validation using an external cohort of CRC cases from 2018 to 2021 further confirmed the robustness and stability of the CatBoost model, as its overall performance remained consistent with the internal validation results.
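
As a reading aid for the reported metrics, the short calculation below derives two quantities the abstract does not state: the positive-class prevalence in the cohort and the precision implied by the reported recall and F1-score. It assumes that recall and F1 were computed for the liver-metastasis class at the same decision threshold, which the abstract does not confirm; the function name is hypothetical.

```python
def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2 * P * R / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


# Figures reported in the abstract.
total_patients = 50_062
with_liver_metastasis = 5_604
recall, f1 = 0.8060, 0.6736

prevalence = with_liver_metastasis / total_patients
precision = precision_from_recall_f1(recall, f1)

print(f"prevalence        = {prevalence:.4f}")
print(f"implied precision = {precision:.4f}")
```

Under that assumption the cohort is about 11% positive and the implied precision is roughly 0.58, that is, a high-recall operating point in which a sizable share of flagged patients would be false positives. This is consistent with a class-weighted model on imbalanced data, but it is a derived estimate rather than a reported result.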

Conclusion: Elevated levels of carcinoembryonic antigen (CEA) have been identified as a crucial clinical predictor for liver metastasis in CRC patients. Furthermore, the administration of systemic therapy to patients who do not exhibit lymph node involvement has been found to increase the risk of liver metastasis. In terms of radiation therapy, preoperative radiation appears to be more efficacious in controlling the risk of liver metastasis compared to postoperative radiation. This finding underscores the importance of optimizing treatment strategies based on the specific clinical context and patient characteristics.

Source journal: Discover. Oncology (Medicine: Endocrinology, Diabetes and Metabolism)
CiteScore: 2.40 · Self-citation rate: 9.10% · Articles published: 122 · Review time: 5 weeks