Evaluating probabilistic classifiers: The triptych

Timo Dimitriadis, Tilmann Gneiting, Alexander I. Jordan, Peter Vogel

International Journal of Forecasting. Published online 2023-11-04. DOI: 10.1016/j.ijforecast.2023.09.007
Article: https://www.sciencedirect.com/science/article/pii/S0169207023000997
Citations: 0
Abstract
Probability forecasts for binary outcomes, often referred to as probabilistic classifiers or confidence scores, are ubiquitous in science and society, and methods for evaluating and comparing them are in great demand. We propose and study a triptych of diagnostic graphics focusing on distinct and complementary aspects of forecast performance: Reliability curves address calibration, receiver operating characteristic (ROC) curves diagnose discrimination ability, and Murphy curves visualize overall predictive performance and value. A Murphy curve shows a forecast's mean elementary scores, including the widely used misclassification rate, and the area under a Murphy curve equals the mean Brier score. For a calibrated forecast, the reliability curve lies on the diagonal, and for competing calibrated forecasts, the ROC and Murphy curves share the same number of crossing points. We invoke the recently developed CORP (Consistent, Optimally binned, Reproducible, and Pool-Adjacent-Violators (PAV) algorithm-based) approach to craft reliability curves and decompose a mean score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components. Plots of the DSC measure of discrimination ability versus the calibration metric MCB visualize classifier performance across multiple competitors. The proposed tools are illustrated in empirical examples from astrophysics, economics, and social science.
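To make the quantities named in the abstract concrete, here is a minimal Python sketch (not code from the paper) that computes a Murphy curve's mean elementary scores and a CORP-style decomposition of the mean Brier score into MCB, DSC, and UNC. It assumes the elementary-score scaling under which the area under the Murphy curve equals the mean Brier score, uses scikit-learn's IsotonicRegression as a stand-in for the pool-adjacent-violators (PAV) recalibration step, and runs on simulated data; the function names and the simulation setup are illustrative, not taken from the article.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def mean_elementary_score(x, y, theta):
    """Mean elementary score at threshold theta, scaled so that the area under
    the Murphy curve over theta in (0, 1) matches the mean Brier score and the
    value at theta = 1/2 is the misclassification rate (assumed convention)."""
    s = 2.0 * ((1.0 - theta) * (y == 1) * (x <= theta)
               + theta * (y == 0) * (x > theta))
    return float(s.mean())


def corp_decomposition(x, y):
    """CORP-style decomposition of the mean Brier score into miscalibration
    (MCB), discrimination (DSC), and uncertainty (UNC), using isotonic
    regression as the PAV recalibration step."""
    brier = lambda p: float(np.mean((p - y) ** 2))
    x_rc = (IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
            .fit(x, y).predict(x))            # PAV-recalibrated probabilities
    s = brier(x)                               # mean score of the forecast
    s_rc = brier(x_rc)                         # mean score after recalibration
    s_mg = brier(np.full_like(y, y.mean(), dtype=float))  # climatological reference
    mcb, dsc, unc = s - s_rc, s_mg - s_rc, s_mg
    return s, mcb, dsc, unc                    # s == mcb - dsc + unc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(size=5000)                 # forecast probabilities
    y = rng.binomial(1, x ** 1.3)              # mildly miscalibrated outcomes
    s, mcb, dsc, unc = corp_decomposition(x, y)
    print(f"Brier={s:.4f}  MCB={mcb:.4f}  DSC={dsc:.4f}  UNC={unc:.4f}")

    # Murphy curve on a midpoint grid; its average over (0, 1) approximates the
    # area under the curve, which should be close to the mean Brier score.
    thetas = (np.arange(200) + 0.5) / 200
    murphy = np.array([mean_elementary_score(x, y, t) for t in thetas])
    print(f"area under Murphy curve ~ {murphy.mean():.4f}")
```

Under this decomposition the identity mean score = MCB − DSC + UNC holds by construction, and the printed Murphy-curve average should land close to the reported Brier score.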
About the journal:
The International Journal of Forecasting is a leading journal in its field that publishes high quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision and policy makers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.