An Ensemble Filter Feature Selection Method and Outlier Detection Method for Multiclass Classification

Dalton Ndirangu, W. Mwangi, L. Nderu
Published in: Proceedings of the 2019 8th International Conference on Software and Computer Applications
Publication date: 2019-02-19
DOI: 10.1145/3316615.3318223 (https://doi.org/10.1145/3316615.3318223)
Citations: 2

Abstract

Feature selection methods facilitate the removal of irrelevant attributes. Ineffective features may contain outliers that degrade classifier performance. We propose an ensemble filter-based feature selection technique for multiclass classification. The technique combines the results of four feature selection methods to create an ensemble list. The study uses a red wine dataset drawn from the UC Irvine Machine Learning Repository, and WEKA, a collection of machine learning algorithms for data mining tasks. The multiclass red wine dataset is binarized with the WEKA MultiClassClassifier, using the 1-against-1 decomposition scheme with pairwise coupling. Using the random forest algorithm and root mean square error (RMSE) values, a learning curve is generated that establishes an optimal ensemble sub-list. Outliers are detected using Tukey's statistical method. The proposed ensemble method outperformed the single feature selection methods. The study concludes by showing that unnecessary features and the presence of outliers degrade classifier performance. We recommend further studies on the effect of gradual, selective removal of outliers on classification.
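
The two core steps named in the abstract, combining several ranked feature lists and flagging outliers with Tukey's method, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which four filter methods were used or how their lists were merged, so mean-rank aggregation is assumed here. The function names, the fence multiplier k = 1.5, and the sample rankings are likewise hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional multiplier; the abstract does not state
    which value the study used, so it is an assumption here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def ensemble_ranking(rankings):
    """Merge ranked feature lists (best first) by mean rank position.

    Mean-rank aggregation is an assumed combination rule. Every list
    must contain the same features. Ties are broken alphabetically.
    """
    features = rankings[0]
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings)
        for f in features
    }
    return sorted(features, key=lambda f: (mean_rank[f], f))


if __name__ == "__main__":
    # Hypothetical rankings standing in for four filter methods.
    rankings = [
        ["alcohol", "sulphates", "volatile acidity", "pH"],
        ["alcohol", "volatile acidity", "sulphates", "pH"],
        ["sulphates", "alcohol", "volatile acidity", "pH"],
        ["alcohol", "sulphates", "pH", "volatile acidity"],
    ]
    print(ensemble_ranking(rankings))

    # Made-up measurements for one feature, with one extreme value.
    print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

In a workflow like the one the abstract describes, the merged list would then be truncated at increasing lengths and each sub-list scored, for example by the RMSE of a random forest, to find the point where adding features stops helping.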