Xiaojuan Qin, Wei Yang, Xiaoping Zhou, Yan Yang, Ningmei Zhang
Title: A Machine Learning Model for Predicting the HER2 Positive Expression of Breast Cancer Based on Clinicopathological and Imaging Features
Journal: Academic Radiology (impact factor 3.8; JCR Q1, Radiology, Nuclear Medicine & Medical Imaging)
DOI: 10.1016/j.acra.2025.01.001 (https://doi.org/10.1016/j.acra.2025.01.001)
Publication date: 2025-01-20
Publication type: Journal Article
Citations: 0
Abstract
Rationale and objectives: To develop a machine learning (ML) model based on clinicopathological and imaging features to predict the Human Epidermal Growth Factor Receptor 2 (HER2) positive expression (HER2-p) of breast cancer (BC), and to compare its performance with that of a logistic regression (LR) model.
Materials and methods: A total of 2541 consecutive female patients with pathologically confirmed primary breast lesions were enrolled. Patients were split chronologically: 2034 patients treated between January 2018 and December 2022 formed the retrospective development cohort, and 507 patients treated between January 2023 and May 2024 formed the prospective validation cohort. Within the development cohort, patients were randomly divided into a training cohort (n=1628) and a test cohort (n=406) in an 8:2 ratio. Pretreatment mammography (MG) and breast MRI data, along with clinicopathological features, were recorded. Two models were developed: an Artificial Neural Network (ANN) model built on features selected by Extreme Gradient Boosting (XGBoost), and an LR model built on features identified by multivariate LR analysis. Predictive performance was assessed using receiver operating characteristic (ROC) curve analysis.
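The cohort construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the records are synthetic placeholders, the function names `chronological_split` and `random_split` are hypothetical, and the abstract does not say how the 8:2 split was rounded. Flooring the test share happens to reproduce the reported counts (1628 and 406).

```python
import random
from datetime import date, timedelta


def chronological_split(records, last_development_day):
    """Split (patient_id, treatment_date) records at a date cutoff."""
    development = [r for r in records if r[1] <= last_development_day]
    validation = [r for r in records if r[1] > last_development_day]
    return development, validation


def random_split(items, test_share=0.2, seed=0):
    """Shuffle reproducibly, then hold out a floored share as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)  # floor; rounding rule is an assumption
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic stand-in data (not the study data): 2034 records dated within
# January 2018 to December 2022 and 507 dated within January 2023 to May 2024.
records = [(i, date(2018, 1, 1) + timedelta(days=i % 1826)) for i in range(2034)]
records += [(2034 + i, date(2023, 1, 1) + timedelta(days=i % 500)) for i in range(507)]

development, validation = chronological_split(records, date(2022, 12, 31))
train, test = random_split(development, test_share=0.2, seed=42)

print(len(development), len(validation), len(train), len(test))
# prints: 2034 507 1628 406
```

Splitting by date first keeps the later patients entirely outside model development, so the validation cohort acts as a temporally separate check and not as a second random sample of the same period.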
Results: After dimensionality reduction using Recursive Feature Elimination with Cross-Validation (RFE-CV), the XGBoost algorithm identified tumor size, suspicious calcifications, Ki-67 index, spiculation, and minimum apparent diffusion coefficient (minimum ADC) as the key feature subset indicative of HER2-p in BC. The constructed ANN model consistently outperformed the LR model, achieving an area under the curve (AUC) of 0.853 (95% CI: 0.837-0.872) in the training cohort, 0.821 (95% CI: 0.798-0.853) in the test cohort, and 0.809 (95% CI: 0.776-0.841) in the validation cohort.
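For readers unfamiliar with how an AUC and its interval are obtained, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and then estimates a 95% interval by percentile bootstrap. The abstract does not state how its confidence intervals were derived, so the bootstrap is an assumption here. The labels and scores are toy values, and `auc` and `bootstrap_ci` are hypothetical helper names.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    k = int(n_boot * alpha / 2)
    return stats[k], stats[n_boot - 1 - k]


# Toy example: 1 = HER2-positive, 0 = HER2-negative; scores are model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_low, toy_high = bootstrap_ci(toy_labels, toy_scores, n_boot=200, seed=1)
print(toy_auc)  # prints: 0.75
```

With only four toy cases the interval is very wide; intervals as narrow as those reported require cohorts of hundreds to thousands of patients.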
Conclusion: The ANN model, built using the significant feature subsets identified by the XGBoost algorithm with RFE-CV, demonstrates potential in predicting HER2-p in BC.
About the journal:
Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.