Fangyuan Zhao, Eric Polley, Julian McClellan, Frederick Howard, Olufunmilayo I Olopade, Dezheng Huo
Breast Cancer Research. Published 2024-10-29. DOI: https://doi.org/10.1186/s13058-024-01905-7. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520773/pdf/
Predicting pathologic complete response to neoadjuvant chemotherapy in breast cancer using a machine learning approach.
Background: For patients with breast cancer undergoing neoadjuvant chemotherapy (NACT), most of the existing prediction models of pathologic complete response (pCR) using clinicopathological features were based on standard statistical models like logistic regression, while models based on machine learning mostly utilized imaging data and/or gene expression data. This study aims to develop a robust and accessible machine learning model to predict pCR using clinicopathological features alone, which can be used to facilitate clinical decision-making in diverse settings.
Methods: The model was developed and validated within the National Cancer Data Base (NCDB, 2018-2020) and an external cohort at the University of Chicago (2010-2020). We compared logistic regression and machine learning models, and examined whether incorporating quantitative clinicopathological features improved model performance. Decision curve analysis was conducted to assess the model's clinical utility.
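The decision curve analysis mentioned in the Methods can be illustrated with a minimal sketch. It uses the standard net-benefit formulation, not the authors' code. The toy labels and probabilities are made up, all function names are hypothetical, and it is an assumption here that the "event" is pCR and the "intervention" is chemotherapy offered to patients whose predicted pCR probability is at or above the threshold.

```python
# Decision curve analysis sketch on made-up data (not the study's data or code).

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating only patients with predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight given to a false positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy: treat every patient."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def net_reduction_in_interventions(y_true, y_prob, threshold):
    """Fraction of patients who could forgo treatment without losing net benefit,
    relative to treating everyone."""
    odds = threshold / (1.0 - threshold)
    gain = net_benefit(y_true, y_prob, threshold) - net_benefit_treat_all(y_true, threshold)
    return gain / odds


# Hypothetical toy cohort: 1 = pCR achieved, 0 = no pCR.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.40, 0.03, 0.05, 0.15, 0.20, 0.02, 0.10, 0.55, 0.04, 0.30]

# A decision curve is the net benefit evaluated over a range of thresholds.
for pt in (0.05, 0.07, 0.10, 0.20):
    nb = net_benefit(y_true, y_prob, pt)
    nr = net_reduction_in_interventions(y_true, y_prob, pt)
    print(f"threshold={pt:.2f}  net_benefit={nb:.3f}  net_reduction={nr:.3f}")

reduction_at_7 = net_reduction_in_interventions(y_true, y_prob, 0.07)
```

The net reduction is algebraically equal to TN/n - (FN/n) * (1 - threshold)/threshold, so each missed pCR is penalized heavily at a low threshold; a model only shows a positive reduction if it rules out many non-responders while missing very few responders.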
Results: We identified 56,209 NCDB patients receiving NACT (pCR rate: 34.0%). The machine learning model incorporating quantitative clinicopathological features showed the best discrimination performance among all the fitted models [area under the receiver operating characteristic curve (AUC): 0.785, 95% confidence interval (CI): 0.778-0.792], along with outstanding calibration performance. The model performed best among patients with hormone receptor positive/human epidermal growth factor receptor 2 negative (HR+/HER2-) breast cancer (AUC: 0.817, 95% CI: 0.802-0.832); and by adopting a 7% prediction threshold, the model achieved 90.5% sensitivity and 48.8% specificity, with decision curve analysis finding a 23.1% net reduction in chemotherapy use. In the external testing set of 584 patients (pCR rate: 33.4%), the model maintained robust performance both overall (AUC: 0.711, 95% CI: 0.668-0.753) and in the HR+/HER2- subgroup (AUC: 0.810, 95% CI: 0.742-0.878).
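The metrics reported above (AUC with a 95% CI, and sensitivity and specificity at a fixed threshold) can be sketched as follows. This is an illustration under assumptions, not the authors' analysis: the data are made up, the function names are hypothetical, and the percentile bootstrap is one common way to obtain a CI; the abstract does not state which method the study used.

```python
import random

# Discrimination metrics sketch on made-up data (not the study's data or code).

def auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when 'predicted positive' means prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: 1 = pCR achieved, 0 = no pCR.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.40, 0.03, 0.05, 0.15, 0.20, 0.02, 0.10, 0.55, 0.04, 0.30]

auc_value = auc(y_true, y_prob)
ci_low, ci_high = bootstrap_auc_ci(y_true, y_prob)
sens, spec = sensitivity_specificity(y_true, y_prob, threshold=0.07)
print(f"AUC={auc_value:.3f}  95% CI=({ci_low:.3f}, {ci_high:.3f})")
print(f"sensitivity={sens:.3f}  specificity={spec:.3f} at threshold 0.07")
```

A low threshold such as 7% trades specificity for sensitivity: almost every eventual responder is flagged, at the cost of also flagging many non-responders, which matches the direction of the 90.5% and 48.8% figures reported above.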
Conclusions: The study developed a machine learning model ( https://huolab.cri.uchicago.edu/sample-apps/pcrmodel ) to predict pCR in breast cancer patients undergoing NACT that demonstrated robust discrimination and calibration performance. The model performed particularly well among patients with HR+/HER2- breast cancer, having the potential to identify patients who are less likely to achieve pCR and can consider alternative treatment strategies over chemotherapy. The model can also serve as a robust baseline model that can be integrated with smaller datasets containing additional granular features in future research.
About the journal:
Breast Cancer Research, an international, peer-reviewed online journal, publishes original research, reviews, editorials, and reports. It features open-access research articles of exceptional interest across all areas of biology and medicine relevant to breast cancer. This includes normal mammary gland biology, with a special emphasis on the genetic, biochemical, and cellular basis of breast cancer. In addition to basic research, the journal covers preclinical, translational, and clinical studies with a biological basis, including Phase I and Phase II trials.