{"title":"Comparative analysis of features and classification techniques in breast cancer detection for Biglycan biomarker images.","authors":"Jumana Ma'touq, Nasim Alnuman","doi":"10.3233/CBM-230544","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Breast cancer (BC) is considered the world's most prevalent cancer. Early diagnosis of BC enables patients to receive better care and treatment, hence lowering patient mortality rates. Breast lesion identification and classification are challenging even for experienced radiologists due to the complexity of breast tissue and variations in lesion presentations.</p><p><strong>Objective: </strong>This work aims to investigate appropriate features and classification techniques for accurate breast cancer detection in 336 Biglycan biomarker images.</p><p><strong>Methods: </strong>The Biglycan biomarker images were retrieved from the Mendeley Data website (Repository name: Biglycan breast cancer dataset). Five features were extracted and compared based on shape characteristics (i.e., Harris Points and Minimum Eigenvalue (MinEigen) Points), frequency domain characteristics (i.e., The Two-dimensional Fourier Transform and the Wavelet Transform), and statistical characteristics (i.e., histogram). Six different commonly used classification algorithms were used; i.e., K-nearest neighbours (k-NN), Naïve Bayes (NB), Pseudo-Linear Discriminate Analysis (pl-DA), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF).</p><p><strong>Results: </strong>The histogram of greyscale images showed the best performance for the k-NN (97.6%), SVM (95.8%), and RF (95.3%) classifiers. Additionally, among the five features, the greyscale histogram feature achieved the best accuracy in all classifiers with a maximum accuracy of 97.6%, while the wavelet feature provided a promising accuracy in most classifiers (up to 94.6%).</p><p><strong>Conclusion: </strong>Machine learning demonstrates high accuracy in estimating cancer and such technology can assist doctors in the analysis of routine medical images and biopsy samples to improve early diagnosis and risk stratification.</p>","PeriodicalId":56320,"journal":{"name":"Cancer Biomarkers","volume":" ","pages":"263-273"},"PeriodicalIF":2.2000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11380270/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cancer Biomarkers","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3233/CBM-230544","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ONCOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Breast cancer (BC) is considered the world's most prevalent cancer. Early diagnosis of BC enables patients to receive better care and treatment, hence lowering patient mortality rates. Breast lesion identification and classification are challenging even for experienced radiologists due to the complexity of breast tissue and variations in lesion presentations.
Objective: This work aims to investigate appropriate features and classification techniques for accurate breast cancer detection in 336 Biglycan biomarker images.
Methods: The Biglycan biomarker images were retrieved from the Mendeley Data website (Repository name: Biglycan breast cancer dataset). Five features were extracted and compared based on shape characteristics (i.e., Harris Points and Minimum Eigenvalue (MinEigen) Points), frequency domain characteristics (i.e., The Two-dimensional Fourier Transform and the Wavelet Transform), and statistical characteristics (i.e., histogram). Six different commonly used classification algorithms were used; i.e., K-nearest neighbours (k-NN), Naïve Bayes (NB), Pseudo-Linear Discriminate Analysis (pl-DA), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF).
Results: The histogram of greyscale images showed the best performance for the k-NN (97.6%), SVM (95.8%), and RF (95.3%) classifiers. Additionally, among the five features, the greyscale histogram feature achieved the best accuracy in all classifiers with a maximum accuracy of 97.6%, while the wavelet feature provided a promising accuracy in most classifiers (up to 94.6%).
Conclusion: Machine learning demonstrates high accuracy in estimating cancer and such technology can assist doctors in the analysis of routine medical images and biopsy samples to improve early diagnosis and risk stratification.
期刊介绍:
Concentrating on molecular biomarkers in cancer research, Cancer Biomarkers publishes original research findings (and reviews solicited by the editor) on the subject of the identification of markers associated with the disease processes whether or not they are an integral part of the pathological lesion.
The disease markers may include, but are not limited to, genomic, epigenomic, proteomics, cellular and morphologic, and genetic factors predisposing to the disease or indicating the occurrence of the disease. Manuscripts on these factors or biomarkers, either in altered forms, abnormal concentrations or with abnormal tissue distribution leading to disease causation will be accepted.