Confidence ensembles: Tabular data classifiers on steroids

IF 15.5 · Q1 (JCR), Computer Science, Artificial Intelligence · Information Fusion · Pub Date: 2025-03-17 · DOI: 10.1016/j.inffus.2025.103126
Tommaso Zoppi , Peter Popov
Information Fusion, Volume 120, Article 103126.
Citations: 0

Abstract

The astounding amount of research conducted in recent decades has produced plenty of Machine Learning (ML) algorithms and models for solving a wide variety of tasks on tabular data. However, classifiers are not always fast, accurate, and robust to unknown inputs, calling for further research in the domain. This paper proposes two classifiers based on confidence ensembles: Confidence Bagging (ConfBag) and Confidence Boosting (ConfBoost). Confidence ensembles build upon a base estimator and create base learners relying on the concept of "confidence" in predictions. They apply to any classification problem, binary or multi-class, supervised or unsupervised, without requiring additional data beyond what the base estimator already requires. Our experimental evaluation on a range of tabular datasets shows that confidence ensembles, and especially ConfBoost, i) build more accurate classifiers than base estimators alone, even with a limited number of base learners, ii) are relatively easy to tune as they rely on a limited number of hyper-parameters, and iii) are significantly more robust when dealing with unknown, unexpected input data compared to other tabular data classifiers. Among other findings, confidence ensembles showed potential to exceed the performance of de-facto standard classifiers for tabular data such as Random Forest and eXtreme Gradient Boosting. ConfBag and ConfBoost are publicly available as a PyPI package, compatible with widely used Python frameworks such as scikit-learn and pyod, and require little to no tuning to be applied to tabular classification tasks.
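The confidence-boosting idea summarized above can be sketched with scikit-learn alone. This is a minimal illustration, not the authors' released ConfBoost implementation: the class name, the max-class-probability confidence measure, and the quantile-based rule for refocusing on low-confidence samples are assumptions made for exposition.

```python
# Sketch of a confidence-boosting-style ensemble: each round trains a clone of
# the base estimator, scores per-sample confidence as the highest predicted
# class probability, and trains the next learner on the low-confidence rows.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class ConfidenceBoostSketch:
    def __init__(self, base_estimator, n_learners=5, conf_quantile=0.5):
        self.base_estimator = base_estimator
        self.n_learners = n_learners
        self.conf_quantile = conf_quantile  # fraction of samples kept as "hard"

    def fit(self, X, y):
        self.learners_ = []
        X_cur, y_cur = X, y
        for _ in range(self.n_learners):
            learner = clone(self.base_estimator).fit(X_cur, y_cur)
            self.learners_.append(learner)
            # Confidence = highest predicted class probability per sample,
            # computed over the full training set.
            conf = learner.predict_proba(X).max(axis=1)
            hard = conf <= np.quantile(conf, self.conf_quantile)
            # Stop if too few hard samples remain, or only one class is left.
            if hard.sum() < 2 or len(np.unique(y[hard])) < 2:
                break
            X_cur, y_cur = X[hard], y[hard]
        return self

    def predict(self, X):
        # Sum class probabilities across learners, aligning class columns.
        classes = self.learners_[0].classes_
        proba = np.zeros((len(X), len(classes)))
        for learner in self.learners_:
            p = learner.predict_proba(X)
            for j, c in enumerate(learner.classes_):
                proba[:, np.searchsorted(classes, c)] += p[:, j]
        return classes[proba.argmax(axis=1)]


# Usage on synthetic tabular data with a shallow tree as base estimator.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ens = ConfidenceBoostSketch(
    DecisionTreeClassifier(max_depth=3, random_state=0)
).fit(X_tr, y_tr)
acc = (ens.predict(X_te) == y_te).mean()
```

The quantile threshold here plays the role the paper assigns to confidence: rather than reweighting misclassified samples as AdaBoost does, each round concentrates on the samples the previous learner was least sure about.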
Source journal: Information Fusion (Engineering & Technology, Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.
Latest articles in this journal:
- PCFNet: Period–channel fusion network for multivariate time series forecasting — towards multi-period dependency modeling
- Learning Spatio-Temporal Affine Representation Subspace for Video-based Person Re-Identification
- From Unimodal to Flexible: A Survey of Generalized Biometric Systems
- Trustworthy Text-to-Image Diffusion Models: A Timely and Focused Survey
- Consensus Learning Framework Boosting Co-clustering