Muniraj Gupta, Nidhi Verma, Naveen Sharma, Satyendra Narayan Singh, R K Brojen Singh, Saurabh Kumar Sharma
Title: Deep transfer learning hybrid techniques for precision in breast cancer tumor histopathology classification.
Journal: Health Information Science and Systems, 13(1):20. Published 2025-02-11. DOI: 10.1007/s13755-025-00337-7
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11813847/pdf/
Citations: 0
Abstract
Breast cancer is one of the most prevalent causes of cancer-related death globally. Early diagnosis of breast cancer increases the patient's chances of survival. Breast cancer classification is a challenging problem due to dense tissue structures, subtle variations, cellular heterogeneity, artifacts, and variability. In this paper, we propose three hybrid deep transfer learning models for breast cancer classification using histopathology images. These models use the Xception model as the base model, to which we add seven more layers for fine-tuning. We also performed an extensive comparative analysis of five prominent machine-learning classifiers, namely Random Forest Classifier (RFC), Logistic Regression (LR), Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), and AdaBoost. We incorporate the two best-performing classifiers, RFC and SVC, into the fine-tuned Xception model; the resulting models are named Xception Random Forest (XRF) and Xception Support Vector (XSV), respectively. The fine-tuned Xception model with a softmax classifier is termed the Multi-layer Xception Classifier (MXC). These three models are evaluated on two publicly available datasets: BreakHis and the Breast Histopathology Images Database (BHID). All three of our models perform better than the state-of-the-art methods. XRF provides the best performance at the 40× magnification level on the BreakHis dataset, with an accuracy (ACC) of 94.44%, F1 score (F1) of 94.44%, area under the receiver operating characteristic curve (AUC) of 95.12%, Matthews correlation coefficient (MCC) of 88.98%, kappa (K) of 88.88%, and classification success index (CSI) of 89.23%. MXC provides the best performance on the BHID dataset, with an ACC of 88.50%, F1 of 88.50%, AUC of 95.12%, MCC of 77.03%, K of 77.00%, and CSI of 79.13%. Further, to validate our models, we performed fivefold cross-validation on both datasets and obtained similar results.
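To make the reported metrics concrete, the following minimal sketch computes ACC, F1, MCC, Cohen's kappa, and CSI from a 2×2 confusion matrix for a binary benign/malignant task. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, the abstract does not say which F1 averaging was used (macro-averaging is assumed here), and CSI is assumed to follow the common definition as the mean over classes of precision + recall - 1. AUC is omitted because it needs predicted scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return ACC, macro-F1, MCC, Cohen's kappa, and CSI from 2x2 counts.

    Assumes every row and column of the confusion matrix is non-empty,
    so none of the divisions below hits zero.
    """
    n = tp + fp + fn + tn
    acc = (tp + tn) / n

    # Per-class precision and recall (positive class, then negative class).
    prec_pos = tp / (tp + fp)
    rec_pos = tp / (tp + fn)
    prec_neg = tn / (tn + fn)
    rec_neg = tn / (tn + fp)

    def f1(precision, recall):
        # Harmonic mean of precision and recall.
        return 2 * precision * recall / (precision + recall)

    # Assumption: macro-averaged F1 over the two classes.
    macro_f1 = (f1(prec_pos, rec_pos) + f1(prec_neg, rec_neg)) / 2

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_chance) / (1 - p_chance)

    # Assumption: CSI is the mean over classes of (precision + recall - 1).
    csi = ((prec_pos + rec_pos - 1) + (prec_neg + rec_neg - 1)) / 2

    return {"acc": acc, "f1": macro_f1, "mcc": mcc, "kappa": kappa, "csi": csi}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the paper.
    for name, value in binary_metrics(tp=45, fp=5, fn=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, MCC, kappa, and CSI all correct for chance or for imbalance between precision and recall, which is consistent with them sitting several points below ACC in the figures reported above.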
About the journal:
Health Information Science and Systems is a multidisciplinary journal that integrates artificial intelligence, computer science, and information technology with health science and services. It covers information science research coupled with topics related to the modeling, design, development, integration, and management of health information systems, smart health, artificial intelligence in medicine, computer-aided diagnosis, and medical expert systems. The scope includes: (i) smart health, artificial intelligence in medicine, computer-aided diagnosis, medical image processing, and medical expert systems; (ii) medical big data and medical, health, and biomedical information resources such as patient medical records, devices and equipment, and software and tools to capture, store, retrieve, process, analyze, and optimize the use of information in the health domain; (iii) data management, data mining, and knowledge discovery, all of which play a key role in decision making, management of public health, examination of standards, and privacy and security issues; (iv) development of new architectures and applications for health information systems.