Efficient breast cancer detection using sequential feature selection techniques
Taha M. Mohamed
2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems (ICICIS), pp. 458-464
Published: 2015-12-01
DOI: https://doi.org/10.1109/INTELCIS.2015.7397261
Citations: 5
Abstract
Breast cancer is one of the most dangerous cancers in the world, especially in the Arab countries and Egypt. Because the disease is so widespread, automatic recognition systems can help physicians classify tumors as benign or malignant. However, performing many pathological analyses costs time and money. In this paper, we propose an algorithm for reducing the number of features required to detect the tumor. Two classifiers, linear and quadratic, are chosen to test the classification accuracy. The experimental results show strong correlations between the features in the data set. With the sequential feature selection algorithm, discarding more than 50% of the features causes no significant loss in classification accuracy when the quadratic discriminant classifier is used. Additionally, only four PCA components achieve the same accuracy as nine components when classified by the linear discriminant classifier. Moreover, the outliers in the data set have no notable effect on the classification accuracy. The k-means clustering algorithm shows the data set to be homogeneous.
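As a rough illustration of sequential feature selection with a quadratic classifier, the sketch below runs a greedy forward search on synthetic data. It is not the paper's implementation: the data generator, the function names, the hold-out split, and the stopping rule are all assumptions made for this example. To stay dependency-free, the classifier is a simplified quadratic discriminant with diagonal covariance (equivalent to Gaussian naive Bayes), swapped in for a full quadratic discriminant classifier.

```python
import math
import random


def fit_diag_qda(X, y, feats):
    """Estimate per-class mean, variance and log-prior on the chosen features."""
    model = {}
    for label in sorted(set(y)):
        rows = [[x[j] for j in feats] for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        # Small constant keeps the variance strictly positive.
        varis = [sum((v - m) ** 2 for v in c) / n + 1e-6
                 for c, m in zip(cols, means)]
        model[label] = (means, varis, math.log(n / len(y)))
    return model


def predict(model, x, feats):
    """Return the class with the highest quadratic discriminant score."""
    def score(params):
        means, varis, log_prior = params
        return log_prior - 0.5 * sum(
            math.log(v) + (x[j] - m) ** 2 / v
            for j, m, v in zip(feats, means, varis))
    return max(model, key=lambda label: score(model[label]))


def holdout_accuracy(X_tr, y_tr, X_va, y_va, feats):
    """Train on one split and measure accuracy on the other."""
    model = fit_diag_qda(X_tr, y_tr, feats)
    hits = sum(predict(model, x, feats) == t for x, t in zip(X_va, y_va))
    return hits / len(y_va)


def sequential_forward_selection(X_tr, y_tr, X_va, y_va):
    """Greedily add the feature that most improves validation accuracy."""
    remaining = list(range(len(X_tr[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(holdout_accuracy(X_tr, y_tr, X_va, y_va, selected + [j]), j)
                  for j in remaining]
        acc, j = max(scored, key=lambda s: (s[0], -s[1]))
        if acc <= best:  # stop once no candidate improves the score
            break
        selected.append(j)
        remaining.remove(j)
        best = acc
    return selected, best


def make_data(n, rng):
    """Synthetic data: features 0-2 are correlated and informative, 3-5 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = rng.gauss(3.0 * label, 1.0)
        X.append([signal,
                  signal + rng.gauss(0, 0.5),
                  signal + rng.gauss(0, 0.5),
                  rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(300, rng)
X_valid, y_valid = make_data(200, rng)
selected, accuracy = sequential_forward_selection(
    X_train, y_train, X_valid, y_valid)
print("selected feature indices:", selected)
print("validation accuracy:", round(accuracy, 3))
```

Because the synthetic features 0 to 2 are strongly correlated, the search will typically keep only a small subset of the six columns, which mirrors the abstract's observation that correlated features can be discarded with little loss in accuracy. In practice, a cross-validated score would be a more reliable selection criterion than a single hold-out split.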