A Novel Scheme For Feature Selection Using Filter Approach
Damodar Patel, A. Saxena, Suman Laha, Gulame Mustafa Ansari
2022 7th International Conference on Computing, Communication and Security (ICCCS), published 2022-11-03
DOI: https://doi.org/10.1109/ICCCS55188.2022.10079604
Citations: 1
Abstract
This paper proposes a filter-based scheme for feature selection in unsupervised data sets. The scheme first ranks the features of a dataset by their Laplacian scores. Subsets of features are then selected according to feature rank, and each selected subset is tested for classification accuracy. To achieve reasonably satisfactory results, feature subsets are tested iteratively for both accuracy and cardinality using an incremental approach. Four real benchmark datasets, namely colon, leukaemia-l, lung-discrete, and warpPIE10, are used for the experiments. Accuracy above 80%, with the number of features reduced to less than 20% of the total in each dataset, indicates that the proposed scheme has potential for application to other large datasets as well.
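The ranking and incremental testing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the graph construction, the classifier, or the stopping rule. The sketch assumes the standard Laplacian score (k-nearest-neighbour graph with heat-kernel weights, lower score ranks higher), a leave-one-out 1-nearest-neighbour classifier for the accuracy test, and a stop as soon as a target accuracy is reached. The function names, the parameters `k`, `t`, and `target_acc`, and the toy data are all hypothetical. Class labels are used only to evaluate subsets, never to rank features.

```python
import math


def laplacian_scores(X, k=2, t=1.0):
    """Laplacian score per feature (lower = better preserves local structure)."""
    n, d = len(X), len(X[0])
    # Pairwise squared Euclidean distances over all features.
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    # Symmetric k-NN graph with heat-kernel weights (assumed construction).
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)
    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [row[r] for row in X]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        f = [fi - mean for fi in f]
        denom = sum(fi * fi * di for fi, di in zip(f, deg))  # f^T D f
        # f^T L f with L = D - S equals 1/2 * sum_ij S_ij (f_i - f_j)^2.
        numer = 0.5 * sum(S[i][j] * (f[i] - f[j]) ** 2
                          for i in range(n) for j in range(n))
        scores.append(numer / denom if denom > 0 else float("inf"))
    return scores


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features))
        correct += y[nearest] == y[i]
    return correct / len(X)


def incremental_select(X, y, target_acc=0.8, k=2, t=1.0):
    """Add features in rank order until the target accuracy is reached."""
    scores = laplacian_scores(X, k, t)
    ranking = sorted(range(len(scores)), key=scores.__getitem__)
    subset, best_subset, best_acc = [], [], 0.0
    for feat in ranking:
        subset.append(feat)
        acc = loo_1nn_accuracy(X, y, subset)
        if acc > best_acc:
            best_subset, best_acc = list(subset), acc
        if acc >= target_acc:
            break
    return best_subset, best_acc


# Toy data (made up): feature 0 separates two groups, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [5.0, 0.8], [5.1, 0.2], [5.2, 0.7]]
y = [0, 0, 0, 1, 1, 1]
print(laplacian_scores(X))
print(incremental_select(X, y))
```

Stopping at the first subset that meets the target keeps the subset small, which matches the abstract's emphasis on cardinality as well as accuracy. A different stopping rule or classifier could be substituted without changing the ranking step.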