Significance of entropy correlation coefficient over symmetric uncertainty on FAST clustering feature selection algorithm

Pallavi Malji, S. Sakhare
2017 11th International Conference on Intelligent Systems and Control (ISCO)
DOI: 10.1109/ISCO.2017.7856035
Citations: 5
Abstract
Feature selection is an essential method that identifies a subset of the most useful features from the original feature set; results obtained with the identified subset are expected to be comparable to those obtained with the full set. A feature selection algorithm is evaluated on two components, efficiency and effectiveness, i.e., the time it requires and the optimality of the feature subset it selects. On this basis, we modify the FAST clustering feature selection algorithm to examine the impact of the entropy correlation coefficient on it. In the modified algorithm, the correlation between features is computed using the entropy correlation coefficient instead of symmetric uncertainty, and the features are then divided into clusters using graph-based clustering. Representative features, i.e., those strongly related to the target class, are then selected from the clusters. To ensure the algorithm's efficiency, we adopt the Kruskal minimum spanning tree (MST) clustering method. We compared the proposed algorithm with the FAST clustering feature selection algorithm using a well-known classifier, the probability-based Naive Bayes classifier, before and after feature selection. Results on two publicly available real-world high-dimensional text datasets demonstrate that the proposed algorithm produces a smaller, more nearly optimal feature subset and also improves classifier performance. The processing time required by the algorithm is far less than that of the FAST clustering algorithm.
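The pipeline the abstract describes (measure pairwise feature correlation with a normalized mutual-information score, build an MST over the feature graph with Kruskal's algorithm, then split it into clusters) can be sketched as below. This is a minimal illustration, not the authors' implementation: it shows the standard symmetric uncertainty measure, SU(X,Y) = 2·I(X;Y)/(H(X)+H(Y)), which the paper replaces with an entropy correlation coefficient variant (definitions of ECC vary in the literature, so only SU is shown here), and a plain union-find Kruskal MST over hypothetical edge weights 1 − SU.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetric_uncertainty(xs, ys):
    """SU(X,Y) = 2*I(X;Y) / (H(X) + H(Y)), in [0, 1]; 0 if both entropies vanish."""
    hx, hy = entropy(xs), entropy(ys)
    return 2 * mutual_information(xs, ys) / (hx + hy) if hx + hy else 0.0

def kruskal_mst(n, edges):
    """Kruskal's MST over nodes 0..n-1; edges are (weight, u, v) triples.

    Uses a union-find with path halving; returns the chosen MST edges.
    """
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:            # edge joins two components: keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

# Hypothetical usage: weight feature pairs by 1 - SU (strong correlation =
# short edge), build the MST, then remove the longest edges to form clusters
# and keep one class-relevant representative per cluster.
f1 = [0, 0, 1, 1]
f2 = [0, 0, 1, 1]   # perfectly correlated with f1
f3 = [0, 1, 0, 1]   # independent of f1
features = [f1, f2, f3]
edges = [(1 - symmetric_uncertainty(features[i], features[j]), i, j)
         for i in range(3) for j in range(i + 1, 3)]
print(kruskal_mst(3, edges))
```

The design choice Kruskal's algorithm enables is that clustering reduces to edge removal: cutting the k − 1 heaviest MST edges yields k feature clusters in one pass, without any iterative reassignment.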