Hongzhi Liu, Bin Fu, Zhengshen Jiang, Zhonghai Wu, D. Hsu
{"title":"A feature selection framework based on supervised data clustering","authors":"Hongzhi Liu, Bin Fu, Zhengshen Jiang, Zhonghai Wu, D. Hsu","doi":"10.1109/ICCI-CC.2016.7862054","DOIUrl":null,"url":null,"abstract":"Feature selection is an important step for data mining and machine learning to deal with the curse of dimensionality. In this paper, we propose a novel feature selection framework based on supervised data clustering. Instead of assuming there only exists low-order dependencies between features and the target variable, the proposed method directly estimates the high-dimensional mutual information between a candidate feature subset and the target variable through supervised data clustering. In addition, it can automatically determine the number of features to be selected instead of manually setting it in a prior. Experimental results show that the proposed method performs similar or better compared with state-of-the-art feature selection methods.","PeriodicalId":135701,"journal":{"name":"2016 IEEE 15th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 15th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCI-CC.2016.7862054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Feature selection is an important step for data mining and machine learning to deal with the curse of dimensionality. In this paper, we propose a novel feature selection framework based on supervised data clustering. Instead of assuming there only exists low-order dependencies between features and the target variable, the proposed method directly estimates the high-dimensional mutual information between a candidate feature subset and the target variable through supervised data clustering. In addition, it can automatically determine the number of features to be selected instead of manually setting it in a prior. Experimental results show that the proposed method performs similar or better compared with state-of-the-art feature selection methods.