Development and Validation of an Interpretable Machine Learning Prediction Model for Total Pathological Complete Response after Neoadjuvant Chemotherapy in Locally Advanced Breast Cancer: Multicenter Retrospective Analysis.

IF 3.3 · Zone 3, Medicine · JCR Q2 (Oncology) · Journal of Cancer · Pub Date: 2024-08-01 · eCollection Date: 2024-01-01 · DOI: 10.7150/jca.97190
Ziran Zhang, Bo Cao, Jinghua Wu, Chengtian Feng
Citation count: 0

Abstract

Objective: This study aimed to develop an interpretable machine learning (ML) model to accurately predict the probability of achieving total pathological complete response (tpCR) in patients with locally advanced breast cancer (LABC) following neoadjuvant chemotherapy (NAC).

Methods: This multicenter retrospective study included pre-NAC clinicopathological data from 698 LABC patients. Patients were divided into tpCR and non-tpCR groups according to postoperative pathological outcomes. Data from 586 patients at Shanghai Ruijin Hospital were randomly assigned to a training set (80%) and a test set (20%), while data from the remaining 112 patients, from our hospital, were used for external validation. Variable selection was performed using Least Absolute Shrinkage and Selection Operator (LASSO) regression. Predictive models were constructed using six ML algorithms, including decision tree, K-nearest neighbors (KNN), support vector machine, light gradient boosting machine, and extreme gradient boosting. Model performance was assessed with receiver operating characteristic (ROC) curves, precision-recall (PR) curves, confusion matrices, calibration plots, and decision curve analysis (DCA), and the best-performing model was selected by comparing the algorithms on these metrics. Variable importance was ranked using the SHapley Additive exPlanations (SHAP) technique to improve the interpretability of the model and address the "black box" problem.

Results: A total of 191 patients (32.59%) achieved tpCR following NAC. LASSO regression identified five variables as predictive factors for model construction: tumor size, Ki-67, molecular subtype, targeted therapy, and chemotherapy regimen. The KNN model outperformed the other five classifiers, achieving area under the curve (AUC) values of 0.847 (95% CI: 0.809-0.883) in the training set, 0.763 (95% CI: 0.670-0.856) in the test set, and 0.665 (95% CI: 0.555-0.776) in the external validation set. DCA showed that the KNN model yielded the highest net benefit across a wide range of threshold probabilities in both the training and test sets. SHAP analysis of the KNN model showed that targeted therapy was the most important factor in predicting tpCR.

Conclusion: An ML prediction model using clinical and pathological data collected before NAC was developed and validated. The model accurately predicted the probability of achieving tpCR in patients with LABC after NAC. SHAP improved the interpretability of the model, which can assist clinical decision-making and therapy optimization.
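
The two headline evaluation measures in the abstract, AUC and the net benefit used in decision curve analysis, can be sketched in plain Python. This is only an illustrative sketch, not the authors' code: the function names (`roc_auc`, `net_benefit`, `treat_all_net_benefit`) and the toy labels and scores are assumptions made for the example and are unrelated to the study data. AUC is computed as the probability that a randomly chosen responder is scored higher than a randomly chosen non-responder, and net benefit uses the standard DCA definition TP/n - (FP/n) * pt/(1 - pt) at threshold probability pt.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy in a decision curve: treat every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = tpCR, 0 = non-tpCR, scores = predicted probability.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_scores))
    for pt in (0.2, 0.5, 0.8):
        print(pt, net_benefit(toy_labels, toy_scores, pt),
              treat_all_net_benefit(toy_labels, pt))
```

A decision curve plots `net_benefit` against the threshold probability for each model, alongside the treat-all and treat-none (net benefit 0) reference strategies; a model is considered useful over the threshold range where its curve lies above both.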

Source journal: Journal of Cancer (Oncology)
CiteScore: 8.10
Self-citation rate: 2.60%
Articles published: 333
Review time: 12 weeks
About the journal: Journal of Cancer is an open access, peer-reviewed journal with broad scope covering all areas of cancer research, especially novel concepts, new methods, new regimens, new therapeutic agents, and alternative approaches for early detection and intervention of cancer. The journal is supported by an international editorial board consisting of a distinguished team of cancer researchers. Journal of Cancer aims at rapid publication of high-quality results in cancer research while maintaining a rigorous peer-review process.