Construction and interpretation of weight-balanced enhanced machine learning models for predicting liver metastasis risk in colorectal cancer patients.
Authors: Qunzhe Ding, Chenyang Li, Chendong Wang, Qunzhe Ding
Journal: Discover. Oncology, 16(1):164
Published: 2025-02-12
DOI: https://doi.org/10.1007/s12672-025-01871-2
Journal impact factor: 2.8 (JCR Q3)
Citations: 0
Abstract
Background: Colorectal cancer (CRC) is a major contributor to cancer-related mortality, and liver metastases develop in approximately 25% of affected individuals. Liver metastasis significantly worsens patient prognosis. This study aims to predict liver metastasis in CRC patients with machine learning (ML) models, thereby supporting clinicians in choosing appropriate interventions.
Methods: A retrospective analysis was performed using the Surveillance, Epidemiology, and End Results (SEER) database; CRC cases from 2010 to 2015 were extracted for downstream analysis. Six ML models were developed to predict liver metastasis in CRC patients: logistic regression (LR), random forest (RF), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and LightGBM. To optimize the models, an improved weight-balancing algorithm was applied, which enhanced classifier performance. The six models were evaluated with tenfold cross-validation, and the optimal model was selected based on a combination of performance metrics. Shapley additive explanations (SHAP) were used to interpret the best-performing model globally, locally, and in terms of feature interactions. To assess the model's reliability and generalizability, it was externally validated on a cohort of CRC cases from 2018 to 2021 obtained from a separate SEER database.
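The abstract does not describe the improved weight-balancing algorithm itself. As a point of reference only, the sketch below computes the common inverse-frequency ("balanced") class weights, n_samples / (n_classes * class_count), for the cohort sizes reported in the Results. The function name `balanced_class_weights` is hypothetical, and this standard heuristic is a stand-in for illustration, not the authors' method.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is the widely used "balanced" heuristic, shown only
    to illustrate class weighting; the paper's improved algorithm is not
    specified in the abstract.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Cohort sizes from the Results: 50,062 patients, 5,604 with liver metastasis.
n_total, n_pos = 50_062, 5_604
n_neg = n_total - n_pos

# 1 = liver metastasis, 0 = no liver metastasis
labels = [1] * n_pos + [0] * n_neg
weights = balanced_class_weights(labels)

for cls in (0, 1):
    print(f"class {cls}: weight {weights[cls]:.4f}")

# With these weights, both classes contribute the same total weight to the loss.
print(weights[1] * n_pos, weights[0] * n_neg)
```

Under this heuristic the minority (metastasis) class receives a weight of roughly 4.47 and the majority class roughly 0.56, so each class carries half of the total sample weight during training.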
Results: In total, 50,062 patients with CRC were included in the analysis, of whom 5,604 developed liver metastasis. Among the six models, CatBoost achieved the highest AUC (0.8844) and also outperformed the others in recall (0.8060) and F1-score (0.6736). SHAP summary and force plots were used to interpret the CatBoost model. The interpretability analysis identified elevated carcinoembryonic antigen (CEA) levels, systemic therapy, N and T stages, and receipt of chemotherapy as the most important predictors of liver metastasis. Systemic therapy was suggested to increase liver metastasis risk in N0-stage patients, whereas it appeared beneficial in patients with lymph node metastasis. Preoperative radiation therapy was found to be more effective than postoperative radiation therapy. External validation on CRC cases from 2018 to 2021 supported the robustness and stability of the CatBoost model: its overall performance remained consistent with the internal validation results.
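The abstract reports recall and F1-score but not precision. Because F1 is the harmonic mean of precision and recall, the implied precision can be recovered by inverting F1 = 2PR / (P + R). The sketch below does this for the reported CatBoost values; the helper names are hypothetical, and if the reported figures are averages over cross-validation folds the identity holds only approximately, so the result is an estimate rather than a reported number.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1_and_recall(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Values reported for the CatBoost model in the Results.
recall, f1 = 0.8060, 0.6736

precision = precision_from_f1_and_recall(f1, recall)
print(f"implied precision: {precision:.3f}")

# Round-trip check: plugging the implied precision back in recovers F1.
print(f"recomputed F1: {f1_score(precision, recall):.4f}")
```

An implied precision near 0.58 alongside a recall of about 0.81 is the pattern one would expect when the minority class is up-weighted: more true metastasis cases are caught at the cost of more false positives.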
Conclusion: Elevated carcinoembryonic antigen (CEA) was identified as a crucial clinical predictor of liver metastasis in CRC patients. Systemic therapy in patients without lymph node involvement was found to increase the risk of liver metastasis. Preoperative radiation appears to be more effective than postoperative radiation in controlling the risk of liver metastasis. These findings underscore the importance of tailoring treatment strategies to the specific clinical context and patient characteristics.