Intersection Three Feature Selection and Machine Learning Approaches for Cancer Classification
Mahmood Khalsan, Mu Mu, E. Al-Shamery, Lee Machado, Michael Opoku Agyeman, S. Ajit
2023 International Conference on System Science and Engineering (ICSSE), 2023-07-27
DOI: 10.1109/ICSSE58758.2023.10227163
Abstract
Machine learning (ML) methods have played an important role in classification and prediction in most fields. However, analyzing gene expression remains complex in cancer classification because of the high dimensionality of gene expression datasets. Consequently, an intersection-based three feature selection method (ITFS) was developed to select optimal features (genes) that serve as identifiers for classification and to reduce the dimensionality of the available gene expression data. ITFS employs three feature selection methods: Mutual Information (MI), F-classif, and Minimum Redundancy Maximum Relevance (mRMR). It then applies the intersection concept, keeping only the genes that were selected by all three feature selection techniques. These selected genes are used as identifiers for training the classifier model. Our study applied the proposed ITFS to six gene expression datasets (from microarray and RNA-seq tools) to validate the effectiveness of ITFS on classifier methods. Across the six datasets, the highest average accuracy improvement occurred when Multilayer Perceptron (MLP) was combined with ITFS, compared with using MLP alone. The proposed ITFS-MLP model produced classification accuracy between 92% and 100% on the six datasets, with an average accuracy of 96%.
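The intersection step described in the abstract can be illustrated with a minimal sketch. It assumes each of the three selectors yields one score per gene and that each keeps its top-k genes; the abstract does not state the cutoff rule, so `k`, the function names, and the toy scores below are hypothetical and are not the authors' implementation.

```python
from typing import Dict, Iterable, List, Set


def top_k_genes(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k gene names with the highest score (assumed cutoff rule)."""
    ranked = sorted(scores, key=lambda gene: scores[gene], reverse=True)
    return set(ranked[:k])


def intersect_selections(selections: Iterable[Set[str]]) -> List[str]:
    """Keep only the genes present in every selector's chosen set."""
    sets = list(selections)
    if not sets:
        return []
    common = set.intersection(*sets)
    return sorted(common)


# Toy scores for five made-up genes. In practice these would come from
# MI, F-classif and mRMR computed on a real expression matrix.
mi_scores = {"g1": 0.90, "g2": 0.10, "g3": 0.70, "g4": 0.65, "g5": 0.05}
f_scores = {"g1": 12.0, "g2": 9.5, "g3": 8.0, "g4": 1.2, "g5": 0.3}
mrmr_scores = {"g1": 0.80, "g2": 0.20, "g3": 0.75, "g4": 0.10, "g5": 0.60}

# Each selector keeps its top 3 genes; only genes chosen by all three survive.
selected = intersect_selections(
    top_k_genes(s, k=3) for s in (mi_scores, f_scores, mrmr_scores)
)
print(selected)  # ['g1', 'g3']
```

In this toy case only g1 and g3 appear in all three top-3 lists, so the classifier would be trained on those two columns alone. Because the intersection can only shrink the feature set, a small `k` may leave very few genes or none.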