Confidence ensembles: Tabular data classifiers on steroids
Tommaso Zoppi, Peter Popov
Information Fusion, Volume 120, Article 103126, published 2025-03-17
DOI: 10.1016/j.inffus.2025.103126
Available at: https://www.sciencedirect.com/science/article/pii/S156625352500199X
Citations: 0
Abstract
The astounding amount of research conducted in recent decades has provided plenty of Machine Learning (ML) algorithms and models for solving a wide variety of tasks on tabular data. However, classifiers are not always fast, accurate, and robust to unknown inputs, calling for further research in the domain. This paper proposes two classifiers based on confidence ensembles: Confidence Bagging (ConfBag) and Confidence Boosting (ConfBoost). Confidence ensembles build upon a base estimator and create base learners relying on the concept of “confidence” in predictions. They apply to any classification problem, binary or multi-class, supervised or unsupervised, without requiring any data beyond what the base estimator already requires. Our experimental evaluation on a range of tabular datasets shows that confidence ensembles, and especially ConfBoost, i) build more accurate classifiers than base estimators alone, even with a limited number of base learners, ii) are relatively easy to tune, as they rely on few hyper-parameters, and iii) are significantly more robust than other tabular data classifiers when dealing with unknown, unexpected input data. Notably, confidence ensembles showed potential to surpass de-facto standard classifiers for tabular data such as Random Forest and eXtreme Gradient Boosting. ConfBag and ConfBoost are publicly available as a PyPI package, compliant with widely used Python frameworks such as scikit-learn and pyod, and require little to no tuning to be applied to tabular classification tasks.
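To make the core idea concrete, here is a minimal sketch of confidence-weighted bagging: base learners are trained on bootstrap samples, and each learner's vote is scaled by its own prediction confidence. This is an illustrative assumption only; the class names (`ToyConfBag`, `CentroidClassifier`), the nearest-centroid base estimator, and the max-probability confidence weighting are invented here for demonstration and are not the paper's actual ConfBag/ConfBoost algorithms or the published package's API.

```python
import numpy as np

class CentroidClassifier:
    """Minimal base estimator: assigns the class of the nearest class centroid."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Convert distances to the class centroids into normalized confidence scores.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        w = np.exp(-d)
        return w / w.sum(axis=1, keepdims=True)

class ToyConfBag:
    """Bagging ensemble whose learners vote with confidence-weighted probabilities."""

    def __init__(self, n_learners=10, seed=0):
        self.n_learners = n_learners
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        n = len(X)
        self.learners_ = []
        for _ in range(self.n_learners):
            idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
            self.learners_.append(CentroidClassifier().fit(X[idx], y[idx]))
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Each learner's confidence is its maximum class probability; its vote
        # is the full probability vector scaled by that confidence.
        votes = np.zeros((len(X), len(self.classes_)))
        for est in self.learners_:
            proba = est.predict_proba(X)
            votes += proba.max(axis=1, keepdims=True) * proba
        return self.classes_[votes.argmax(axis=1)]

# Two well-separated Gaussian blobs as a toy binary tabular dataset.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2, 1, (200, 4)), rng.normal(2, 1, (200, 4))])
y = np.array([0] * 200 + [1] * 200)
model = ToyConfBag(n_learners=15, seed=1).fit(X, y)
accuracy = (model.predict(X) == y).mean()
```

Note how this differs from plain bagging only in the vote aggregation: low-confidence learners contribute less, which is one intuition behind the robustness to unexpected inputs reported in the abstract.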
Journal description:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.