Title: Prognostic prediction of breast cancer patients using machine learning models: a retrospective analysis.
Authors: Xuchun Song, Jiebin Chu, Zijie Guo, Qun Wei, Qingchuan Wang, Wenxian Hu, Linbo Wang, Wenhe Zhao, Heming Zheng, Xudong Lu, Jichun Zhou
Journal: Gland Surgery (impact factor 1.5; JCR Q3, Surgery)
Publication type: Journal Article
Published: 2024-09-30 (Epub 2024-09-27)
DOI: 10.21037/gs-24-106 (https://doi.org/10.21037/gs-24-106)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480873/pdf/
Citations: 0
Abstract
Background: Breast cancer is a common and complex disease, with various clinical features affecting prognosis. Accurate prediction of prognosis is essential for guiding personalized treatment strategies. This study aimed to develop machine learning models for predicting prognosis in breast cancer patients using retrospective data.
Methods: A total of 6,477 patients from Affiliated Sir Run Run Shaw Hospital were included, and their electronic medical records (EMRs) were thoroughly examined to identify 15 clinical features significantly associated with breast cancer survival. We employed eight machine learning algorithms, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to develop models and evaluate their predictive performance. In addition, to investigate how sensitive model performance is to the training/testing split ratio, we examined five ratios: 50:50, 60:40, 70:30, 80:20, and 90:10.
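The split-ratio experiment described in the Methods can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the abstract does not say whether the splits were stratified, so the stratification here is an assumption. The function `stratified_split`, the constant `TRAIN_FRACTIONS`, and the toy labels are hypothetical names and data introduced for the example.

```python
import random
from collections import defaultdict

# The five training/testing ratios named in the abstract, as training fractions.
TRAIN_FRACTIONS = [0.5, 0.6, 0.7, 0.8, 0.9]


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal.

    Stratification is an assumption of this sketch; the abstract does not
    state how the splits were drawn.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy labels (1 = event, 0 = no event); not the study's data.
toy_labels = [1] * 20 + [0] * 80

split_sizes = {}
for fraction in TRAIN_FRACTIONS:
    train_idx, test_idx = stratified_split(toy_labels, fraction)
    split_sizes[fraction] = (len(train_idx), len(test_idx))
    # A model would be fitted on train_idx and scored on test_idx here.

print(split_sizes)
```

Comparing the test-set metric across the five entries of `split_sizes` is one way to see how much a model's measured performance depends on the amount of training data.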
Results: Among these models, XGBoost demonstrated the highest performance, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.813, accuracy of 0.739, sensitivity of 0.815, and specificity of 0.735. Further statistical analysis identified several significant predictors of prognosis, including age, tumor size, lymph node status, and hormone receptor status. The XGBoost model showed superior predictive power compared with established prognostic models such as the Nottingham Prognostic Index (NPI) and Predict Breast. Based on the performance of the XGBoost model, we developed a prognosis prediction tool specifically designed for breast cancer, providing valuable insights to clinicians and aiding them in making informed treatment decisions tailored to individual patients.
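For readers less familiar with the four metrics reported in the Results, the sketch below shows how accuracy, sensitivity, specificity, and AUC are conventionally computed for a binary outcome. It uses the standard definitions, not the authors' code. The 0.5 decision threshold, the function names, and the toy labels and scores are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_score, threshold=0.5):
    """Threshold-dependent metrics; the 0.5 cut-off is an assumption."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted risks; not the study's data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

AUC is threshold-free, while the other three metrics depend on the chosen cut-off. Accuracy is also a prevalence-weighted average of sensitivity and specificity, so on imbalanced data it tends to track whichever of the two belongs to the larger class.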
Conclusions: Our study highlights the potential of machine learning models in accurately predicting prognosis for breast cancer patients, ultimately facilitating personalized treatment strategies. Further research and validation are warranted to fully integrate these models into clinical practice.
About the journal:
Gland Surgery (Gland Surg; GS; Print ISSN 2227-684X; Online ISSN 2227-8575) is an open access, peer-reviewed journal indexed by PubMed/PubMed Central. It was launched in May 2012 and has been published bimonthly since February 2015.