Calibration methods in imbalanced binary classification
Théo Guilbert, Olivier Caelen, Andrei Chirita, Marco Saerens
Annals of Mathematics and Artificial Intelligence, volume 92, issue 5, pages 1319-1352. Published 2024-07-19.
DOI: 10.1007/s10472-024-09952-8
https://link.springer.com/article/10.1007/s10472-024-09952-8
Journal impact factor: 1.2 (JCR Q4, Computer Science, Artificial Intelligence)
Citation count: 0
Abstract
The calibration problem in machine learning classification tasks arises when a model’s output score does not align with the observed (ground-truth) probability of the target class. Several parametric and non-parametric post-processing methods exist that can help calibrate an existing classifier. In this work, we focus on binary classification cases where the dataset is imbalanced, meaning that the negative target class significantly outnumbers the positive one. We propose new parametric calibration methods designed for this specific case and a new calibration measure focusing on the primary objective in imbalanced problems: detecting infrequent positive cases. Experiments on several datasets show that, for imbalanced problems, our approaches outperform state-of-the-art methods in many cases.
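To make the notion of miscalibration concrete, the following is a minimal illustrative sketch and not the measure or the methods proposed in the paper. It computes the standard binned expected calibration error (ECE) on synthetic imbalanced data. The function name `expected_calibration_error`, the Beta-distributed probabilities, and the square-root distortion are all assumptions chosen for illustration.

```python
import numpy as np

def expected_calibration_error(y_true, y_score, n_bins=10):
    """Binned ECE: weighted mean of |observed positive rate - mean score| per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    # Assign each score to one of n_bins equal-width bins on [0, 1].
    bin_ids = np.minimum((y_score * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        gap = abs(y_true[mask].mean() - y_score[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece

rng = np.random.default_rng(0)

# Hypothetical imbalanced problem: true positive probabilities are mostly small.
p_true = rng.beta(0.5, 10.0, size=20000)
y = (rng.random(p_true.size) < p_true).astype(int)

# A deliberately miscalibrated score: same ranking as p_true, but inflated.
score_bad = np.sqrt(p_true)

ece_good = expected_calibration_error(y, p_true)
ece_bad = expected_calibration_error(y, score_bad)

print(f"positive rate:               {y.mean():.3f}")
print(f"ECE of calibrated scores:    {ece_good:.4f}")
print(f"ECE of miscalibrated scores: {ece_bad:.4f}")
```

Both score vectors rank the examples identically, so ranking metrics such as AUC cannot tell them apart; only a calibration measure does. Because most samples in an imbalanced problem fall into the lowest bins, a plain ECE is dominated by the negative class, which is one motivation for a measure focused on the infrequent positive cases.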
About the journal:
Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning.
The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), which focus on one topic and have one or more guest editors.
Annals of Mathematics and Artificial Intelligence hopes to influence the spawning of new areas of applied mathematics and strengthen the scientific underpinnings of Artificial Intelligence.