Comparing Random Survival Forests and Cox Regression for Nonresponders to Neoadjuvant Chemotherapy Among Patients With Breast Cancer: Multicenter Retrospective Cohort Study.
Yudi Jin, Min Zhao, Tong Su, Yanjia Fan, Zubin Ouyang, Fajin Lv
Journal of Medical Internet Research. 2025;27:e69864. Published 2025-04-08. Publication type: Journal Article. Impact factor: 6.0 (JCR Q1, Health Care Sciences & Services). DOI: 10.2196/69864 (https://doi.org/10.2196/69864). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12015342/pdf/
Abstract
Background: Breast cancer is one of the most common malignancies among women worldwide. Patients who do not achieve a pathological complete response (pCR) or a clinical complete response (cCR) post-neoadjuvant chemotherapy (NAC) typically have a worse prognosis compared to those who do achieve these responses.
Objective: This study aimed to develop and validate a random survival forest (RSF) model to predict survival risk in patients with breast cancer who do not achieve a pCR or cCR post-NAC.
Methods: We analyzed patients with no pCR/cCR post-NAC treated at the First Affiliated Hospital of Chongqing Medical University from January 2019 to 2023, with external validation in Duke University and Surveillance, Epidemiology, and End Results (SEER) cohorts. RSF and Cox regression models were compared using the time-dependent area under the curve (AUC), the concordance index (C-index), and risk stratification.
Results: The study cohort included 306 patients with breast cancer, most of whom were aged 40-60 years (204/306, 66.7%). The majority had invasive ductal carcinoma (290/306, 94.8%); receptor profiles were estrogen receptor (ER)+ in 182/306 (59.5%), progesterone receptor (PR)- in 179/306 (58.5%), and human epidermal growth factor receptor 2 (HER2)+ in 94/306 (30.7%). The most common tumor, node, metastasis (TNM) stages at presentation were T2 (185/306, 60.5%), N1 (142/306, 46.4%), and M0 (295/306, 96.4%), and 17.6% (54/306) of patients experienced disease progression during a median follow-up of 25.9 months (IQR 17.2-36.3). The external validation cohorts from Duke (N=94) and SEER (N=2760) showed patterns consistent with the internal cohort in age (40-60 years: 59/94, 63%, vs 1480/2760, 53.6%), HER2+ rates (26/94, 28%, vs 935/2760, 33.9%), and invasive ductal carcinoma prevalence (89/94, 95%, vs 2506/2760, 90.8%). In the internal cohort, the RSF achieved significantly higher time-dependent AUCs than Cox regression at 1 year (0.811 vs 0.763), 3 years (0.834 vs 0.783), and 5 years (0.810 vs 0.771), as well as a higher overall C-index (0.803, 95% CI 0.747-0.859, vs 0.736, 95% CI 0.673-0.799). External validation confirmed robust generalizability: the Duke cohort showed 1-, 3-, and 5-year AUCs of 0.912, 0.803, and 0.776, respectively, while the SEER cohort maintained consistent performance with AUCs of 0.771, 0.729, and 0.702, respectively. Risk stratification based on RSF predicted scores classified 25.8% (79/306) of patients as high risk, and this group had significantly shorter survival than the low-risk group (P<.001). In decision curve analysis (DCA), the RSF maintained improved net benefits across decision thresholds, with similar results in the external cohorts. The RSF model also showed promising performance across different molecular subtypes in all datasets.
Conclusions: The RSF model, based solely on clinicopathological variables, provides a promising tool for identifying high-risk patients with breast cancer post-NAC. This approach may facilitate personalized treatment strategies and improve patient management in clinical practice.
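As an illustration of the concordance index (C-index) used above to compare the RSF and Cox models, the following is a minimal sketch of Harrell's C-index in plain Python. It is not the authors' implementation, and the abstract does not state which software was used. The function name `harrell_c_index` and the toy data are hypothetical; pairs with tied event times are simply skipped, which is a simplifying assumption.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. progression) was observed, 0 if censored
    risks  : predicted risk score per patient (higher = worse prognosis)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the model gave patient i the higher risk; tied risks count 0.5.
    Pairs with tied times are skipped (a simplifying assumption).
    """
    n = len(times)
    if not (len(events) == n and len(risks) == n):
        raise ValueError("times, events, and risks must have equal length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data (not from the study): months of follow-up,
# event indicators, and two sets of model risk scores.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 0]
good_risks = [0.9, 0.4, 0.6, 0.1]  # ranks both early events above later patients
poor_risks = [0.2, 0.4, 0.6, 0.1]  # misranks the earliest event

c_good = harrell_c_index(toy_times, toy_events, good_risks)
c_poor = harrell_c_index(toy_times, toy_events, poor_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.803 (RSF) and 0.736 (Cox). The quadratic loop is chosen for readability; established survival-analysis libraries provide faster and more complete implementations, including handling of tied times.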
About the journal:
The Journal of Medical Internet Research (JMIR), founded in 1999, is a long-established journal in health informatics and health services.
The journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor in these disciplines.
JMIR is also ranked #1 on Google Scholar within the "Medical Informatics" discipline.