An investigation of the performances of simple gene selection methodologies for cancer classification
Author: Salim Sazzed
Venue: 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)
Publication date: 2021-10-25
DOI: https://doi.org/10.1109/BIBE52308.2021.9635167
Citations: 1
Abstract
Gene expression datasets usually contain a large number of genes, which imposes a computational burden and added complexity on the classifier. Feature selection is therefore an integral part of sophisticated cancer classification frameworks. In the existing literature, feature selection has often been performed with computationally expensive methods (e.g., wrapper-based methods, evolutionary algorithms). In this paper, we show that combinations of simple feature selection methods that require minimal computational cost are often effective for cancer classification. We use two sets of simple statistical methods: the first identifies the genes most strongly correlated with the class label (set 1), and the second eliminates redundant genes (set 2). The selected gene set is then passed to a support vector machine (SVM) classifier. We compare the performance of these simple methodologies with a number of existing methods on ten gene expression benchmark datasets. On many datasets, the simple methodologies achieve efficacy similar to that of the complex, computationally expensive approaches while using only a small number of genes.
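
The two-stage selection described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific statistics used, so everything concrete here is an assumption made for illustration: relevance is scored by the absolute Pearson correlation between a gene and the class label, redundancy is judged by pairwise Pearson correlation against a threshold, and the names `pearson`, `select_genes`, `top_k`, and `max_corr`, as well as the toy data, are hypothetical. The SVM stage is omitted to keep the sketch dependency-free.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_genes(expression, labels, top_k=3, max_corr=0.95):
    """Return a small gene list: relevant to the label, not mutually redundant.

    expression: dict mapping gene name -> list of expression values (one per sample)
    labels:     list of 0/1 class labels (one per sample)
    """
    # Stage 1 (assumed relevance score): keep the top_k genes ranked by
    # absolute correlation with the class label.
    ranked = sorted(
        expression,
        key=lambda g: abs(pearson(expression[g], labels)),
        reverse=True,
    )[:top_k]

    # Stage 2 (assumed redundancy rule): walk the ranking and keep a gene only
    # if it is not too strongly correlated with any gene already kept.
    kept = []
    for gene in ranked:
        if all(abs(pearson(expression[gene], expression[k])) <= max_corr for k in kept):
            kept.append(gene)
    return kept


# Toy data: six samples, three per class (hypothetical values).
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # strongly class-correlated
    "g2": [2.0, 2.5, 1.7, 6.0, 6.3, 5.7],  # near-copy of g1, hence redundant
    "g3": [5.0, 4.0, 5.0, 4.0, 5.0, 4.0],  # weakly class-correlated
    "g4": [0.5, 0.4, 0.6, 0.1, 0.2, 0.0],  # class-correlated, opposite sign
}

selected = select_genes(expression, labels, top_k=3, max_corr=0.95)
print(selected)  # ['g1', 'g4']
```

In a full pipeline the retained genes would then form the feature matrix for an SVM classifier (for example scikit-learn's `SVC`). Both stages only need per-gene and pairwise statistics, which is what keeps this kind of filter approach cheap compared with wrapper-based or evolutionary search.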