Random Optimal Search Based Significant Gene Identification and Classification of Disease Samples
J. B. Bell, S. Vigila
2021 Innovations in Power and Advanced Computing Technologies (i-PACT)
Published: 2021-11-27
DOI: 10.1109/i-PACT52855.2021.9696570 (https://doi.org/10.1109/i-PACT52855.2021.9696570)
Citations: 1
Abstract
Data mining and machine learning have produced many methods and algorithms for the high-dimensionality problem, and many recent techniques for selecting a subset of genes from a gene expression disease dataset are filter based. Here we use a learner-based wrapper feature selection approach that selects the optimal genes by a random search mechanism and then classifies the significant gene expression set using a classifier. Principal Component Analysis (PCA) and a t-test, combined with random optimized search using a Linear Discriminant Analysis (LDA) classifier, are used to select the features; PCA-based clusters are also evaluated with a Self-Organizing Map (SOM) as the classifier to obtain significant features. A Genetic Algorithm (GA) based approach is additionally used for classification-based feature selection. The various gene-selection-based classifier approaches are evaluated using several performance measures. A list of the top 10 significant genes is retrieved by random optimized search, and with these genes as the expression dataset the classifier is trained, validated, and tested for classifying mutually exclusive disease samples into categorical classes, so that the classifier's performance can be calculated with various test measures. The PCA-based random search technique exhibits higher accuracy when classified with the SOM learner. The GA-based embedded classifier is used to classify the samples, and highly distinct gene features are retrieved. Training on the best features of the gene expression set improves classifier performance considerably, and the reduced dimensionality makes learning and processing much faster. These approaches therefore make it easy to learn the principal features needed for the best sample classification.
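The wrapper idea described in the abstract, drawing random gene subsets and keeping the one on which a learner scores best, can be illustrated with a minimal sketch. This is not the authors' implementation: a nearest-centroid classifier stands in for LDA so that the example needs only the Python standard library, and the leave-one-out scoring, the toy data generator, and all function names (`make_toy_data`, `loo_accuracy`, `random_search_select`) are assumptions made for illustration.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=10, n_genes=30, seed=1):
    """Hypothetical toy expression matrix: only gene 0 separates the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.uniform(-0.5, 0.5) for _ in range(n_genes)]
            row[0] += 5.0 * label  # informative gene, shifted for class 1
            X.append(row)
            y.append(label)
    return X, y


def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids over the chosen features.

    Stand-in learner (assumption): the paper uses LDA, not nearest centroid.
    """
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]

    def squared_distance(label):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))

    return min(sorted(centroids), key=squared_distance)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the stand-in learner on one feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            hits += 1
    return hits / len(X)


def random_search_select(X, y, subset_size, n_iter, seed=0):
    """Wrapper selection: score random gene subsets, keep the best one found."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    best_subset, best_score = None, -1.0
    for _ in range(n_iter):
        subset = sorted(rng.sample(range(n_genes), subset_size))
        score = loo_accuracy(X, y, subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


X, y = make_toy_data()
subset, score = random_search_select(X, y, subset_size=10, n_iter=40)
print("selected gene indices:", subset)
print("leave-one-out accuracy:", score)
```

Because each candidate subset is scored by the learner itself, the search cost grows with the number of iterations times the cost of one evaluation; this is the usual trade-off of wrapper methods against filter methods such as a per-gene t-test.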