Mixed Bagging: A Novel Ensemble Learning Framework for Supervised Classification Based on Instance Hardness

A. Kabir, Carolina Ruiz, S. A. Alvarez
{"title":"Mixed Bagging: A Novel Ensemble Learning Framework for Supervised Classification Based on Instance Hardness","authors":"A. Kabir, Carolina Ruiz, S. A. Alvarez","doi":"10.1109/ICDM.2018.00137","DOIUrl":null,"url":null,"abstract":"We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing and equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to achieve a final class prediction. We propose two versions of mixed bagging – one where the bootstraps are grouped as easy, regular or hard, with all bootstraps in one group having the same hardness; and the other where the hardness of bootstraps change gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 Decision Trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing, or equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to achieve a final class prediction. We propose two versions of mixed bagging: one in which the bootstraps are grouped as easy, regular, or hard, with all bootstraps in a group having the same hardness; and another in which the hardness of the bootstraps changes gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 decision trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.
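To make the mechanism concrete, the following is a minimal Python sketch of the gradual variant of mixed bagging, written against the abstract rather than the paper's exact algorithm. The hardness proxy (one minus the out-of-fold predicted probability of the true class), the exponential sampling weights, the linear sweep of the bias parameter alpha from -1 (easy bootstraps) to +1 (hard bootstraps), and the helper names `estimate_hardness` and `mixed_bagging` are all illustrative assumptions; scikit-learn's CART-based `DecisionTreeClassifier` stands in for the C4.5 trees used in the paper.

```python
# Illustrative sketch of "gradual" mixed bagging; assumptions noted inline.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier  # CART stand-in for C4.5

def estimate_hardness(X, y, n_folds=5, seed=0):
    """Proxy for instance hardness: 1 - out-of-fold probability of the
    true class (an assumption; the paper defines hardness via a model
    built from the remaining training instances)."""
    base = DecisionTreeClassifier(max_depth=3, random_state=seed)
    proba = cross_val_predict(base, X, y, cv=n_folds, method="predict_proba")
    classes = np.unique(y)                      # sorted, matches proba columns
    true_idx = np.searchsorted(classes, y)
    return 1.0 - proba[np.arange(len(y)), true_idx]

def mixed_bagging(X, y, n_estimators=50, max_depth=None, seed=0):
    """Train a committee whose bootstraps move gradually from easy to hard."""
    rng = np.random.default_rng(seed)
    hardness = estimate_hardness(X, y, seed=seed)
    models = []
    for t in range(n_estimators):
        # Sweep the hardness bias: alpha < 0 under-represents hard instances,
        # alpha > 0 over-represents them, alpha = 0 is ordinary bagging.
        alpha = -1.0 + 2.0 * t / max(n_estimators - 1, 1)
        w = np.exp(alpha * (hardness - hardness.mean()))  # assumed weighting form
        p = w / w.sum()
        idx = rng.choice(len(y), size=len(y), replace=True, p=p)
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed + t)
        models.append(tree.fit(X[idx], y[idx]))
    return models

def predict(models, X):
    """Aggregate the committee by unweighted majority vote
    (assumes binary labels encoded as 0/1, as in the paper's tasks)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Setting alpha to zero for every iteration recovers ordinary bagging, so the hardness bias is easy to ablate; the grouped variant described in the abstract would instead draw alpha from a few fixed levels (easy, regular, hard) rather than a continuous sweep.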