Zhuo Wang, Huan Li, Bin Nie, Jianqiang Du, Yuwen Du, Yufeng Chen
Title: Feature selection using different evaluate strategy and random forests
DOI: 10.1109/ICCEAI52939.2021.00062
Published in: 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), August 2021
Citations: 1
Abstract
To address the curse of dimensionality and over-fitting in data analysis, this paper proposes a feature selection method that hybridizes several evaluation models with random forests (Integrate-RF). First, Integrate-RF uses CART, CHAID, SVM, BN, NN, K-Means, and Kohonen models to evaluate the importance of each feature, and the seven resulting rankings are combined by arithmetic averaging into a single importance score per feature. Second, Integrate-RF iteratively moves the most important remaining feature into the candidate subset and trains a random forest classifier on that subset to obtain its out-of-bag (OOB) classification error rate. Finally, the optimal feature subset is the one with the lowest OOB classification error rate. Experiments show that the proposed method effectively reduces data dimensionality and selects feature subsets that are more discriminative and more adaptable.
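The procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper scores features with CART, CHAID, SVM, BN, NN, K-Means, and Kohonen models, whereas this sketch substitutes three readily available scikit-learn scorers (decision-tree impurity importance, mutual information, and the ANOVA F-statistic) as stand-ins. The averaging of normalized rankings and the OOB-error-driven subset search follow the abstract's description.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier


def integrate_rf_select(X, y, scorers, random_state=0):
    """Integrate-RF-style selection: average multiple importance
    scores, then pick the subset with the lowest RF OOB error."""
    # Step 1: score features with each model, normalize to [0, 1],
    # and take the arithmetic average across models.
    per_model = []
    for score_fn in scorers:
        s = np.asarray(score_fn(X, y), dtype=float)
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)
        per_model.append(s)
    avg_importance = np.mean(per_model, axis=0)
    order = np.argsort(avg_importance)[::-1]  # most important first

    # Step 2: grow the subset in importance order; for each size,
    # train a random forest and record its out-of-bag error rate.
    best_err, best_subset = np.inf, None
    for k in range(1, len(order) + 1):
        subset = order[:k]
        rf = RandomForestClassifier(
            n_estimators=100, oob_score=True, bootstrap=True,
            random_state=random_state,
        )
        rf.fit(X[:, subset], y)
        err = 1.0 - rf.oob_score_
        # Step 3: keep the subset with the lowest OOB error.
        if err < best_err:
            best_err, best_subset = err, subset
    return best_subset, best_err


# Stand-in importance scorers (assumptions, not the paper's models).
scorers = [
    lambda X, y: DecisionTreeClassifier(random_state=0)
    .fit(X, y).feature_importances_,
    lambda X, y: mutual_info_classif(X, y, random_state=0),
    lambda X, y: f_classif(X, y)[0],
]

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    n_redundant=5, random_state=0,
)
subset, oob_err = integrate_rf_select(X, y, scorers)
print("selected features:", sorted(subset.tolist()))
print("OOB error rate: %.3f" % oob_err)
```

Normalizing each model's scores before averaging keeps one scorer's scale (e.g. raw F-statistics) from dominating the consensus; scanning subsets in importance order keeps the search linear in the number of features rather than exponential.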