An Ensemble Filter Feature Selection Method and Outlier Detection Method for Multiclass Classification
Dalton Ndirangu, W. Mwangi, L. Nderu
Proceedings of the 2019 8th International Conference on Software and Computer Applications, 2019-02-19
https://doi.org/10.1145/3316615.3318223
Feature selection methods facilitate the removal of irrelevant attributes. Ineffective features may contain outliers that degrade classifier performance. We propose an ensemble filter-based feature selection technique for multiclass classification. The technique combines the results of four selection methods to create an ensemble list. The study uses a red wine dataset drawn from the UC Irvine Machine Learning Repository and WEKA, a collection of machine learning algorithms for data mining tasks. The multiclass red wine dataset is binarized using WEKA's MultiClassClassifier with the 1-against-1 with pairwise coupling decomposition scheme. Using the random forest algorithm and root mean square error values, a learning curve is generated that establishes an optimal ensemble sub-list. Outliers are detected using the Tukey statistical method. The proposed ensemble method outperformed the individual feature selection methods. The study concludes by showing that unnecessary features and the presence of outliers degrade classifier performance. We recommend further studies on the effect of gradual, selective removal of outliers on classification.
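The Tukey method mentioned above can be illustrated with a minimal sketch. The abstract does not state the fence multiplier or the quartile convention used in the study, so the conventional k = 1.5 and the "inclusive" quartile method of Python's standard library are assumptions here. The function names `tukey_fences` and `tukey_outliers` and the sample values are hypothetical and are not taken from the red wine dataset.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a sequence of numbers.

    k=1.5 is the conventional multiplier; it is an assumption, since the
    abstract does not say which value the study used.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(values, k=1.5):
    """Return the values that fall strictly outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Made-up illustrative values for a single feature.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(tukey_fences(sample))
    print(tukey_outliers(sample))
```

Applied per feature, a check like this flags candidate outliers; whether to remove them all at once or gradually is the open question the abstract leaves for further study.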