Evaluating probabilistic classifiers: The triptych

International Journal of Forecasting (IF 6.9; Zone 2, Economics; JCR Q1, Economics). Pub Date: 2023-11-04. DOI: 10.1016/j.ijforecast.2023.09.007
Timo Dimitriadis, Tilmann Gneiting, Alexander I. Jordan, Peter Vogel
Citations: 0

Abstract

Evaluating probabilistic classifiers: The triptych

Probability forecasts for binary outcomes, often referred to as probabilistic classifiers or confidence scores, are ubiquitous in science and society, and methods for evaluating and comparing them are in great demand. We propose and study a triptych of diagnostic graphics focusing on distinct and complementary aspects of forecast performance: Reliability curves address calibration, receiver operating characteristic (ROC) curves diagnose discrimination ability, and Murphy curves visualize overall predictive performance and value. A Murphy curve shows a forecast’s mean elementary scores, including the widely used misclassification rate, and the area under a Murphy curve equals the mean Brier score. For a calibrated forecast, the reliability curve lies on the diagonal, and for competing calibrated forecasts, the ROC and Murphy curves share the same number of crossing points. We invoke the recently developed CORP (Consistent, Optimally binned, Reproducible, and Pool-Adjacent-Violators (PAV) algorithm-based) approach to craft reliability curves and decompose a mean score into miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components. Plots of the DSC measure of discrimination ability versus the calibration metric MCB visualize classifier performance across multiple competitors. The proposed tools are illustrated in empirical examples from astrophysics, economics, and social science.
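
The CORP decomposition named in the abstract can be sketched for the Brier score as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pav_recalibrate`, `mean_brier`, and `corp_decomposition` and the six-point toy data are hypothetical. The sketch assumes the usual CORP form, in which the recalibrated forecast comes from a PAV (isotonic) fit of outcomes on forecasts, the reference forecast is the observed event frequency, and the mean score equals MCB - DSC + UNC.

```python
from itertools import groupby


def pav_recalibrate(x, y):
    """Non-decreasing least-squares fit of outcomes y on forecasts x (PAV)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    blocks = []  # each block is [sum of outcomes, number of cases]
    # Start with one block per distinct forecast value so ties are pooled.
    for _, grp in groupby(order, key=lambda i: x[i]):
        idx = list(grp)
        blocks.append([sum(y[i] for i in idx), len(idx)])
        # Pool adjacent blocks while the earlier mean exceeds the later mean.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Write each block mean back to the original positions.
    fitted = [0.0] * len(x)
    pos = 0
    for s, n in blocks:
        for i in order[pos:pos + n]:
            fitted[i] = s / n
        pos += n
    return fitted


def mean_brier(p, y):
    """Mean Brier score of probability forecasts p for binary outcomes y."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def corp_decomposition(x, y):
    """Return the mean Brier score with MCB, DSC, and UNC components."""
    x_cal = pav_recalibrate(x, y)          # PAV-recalibrated forecast
    base_rate = sum(y) / len(y)            # constant reference forecast
    s_x = mean_brier(x, y)                 # score of the original forecast
    s_c = mean_brier(x_cal, y)             # score after recalibration
    s_r = mean_brier([base_rate] * len(y), y)
    return {"score": s_x, "MCB": s_x - s_c, "DSC": s_r - s_c, "UNC": s_r}


if __name__ == "__main__":
    # Hypothetical toy data: forecast probabilities and binary outcomes.
    forecasts = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
    outcomes = [0, 0, 1, 0, 1, 1]
    for name, value in corp_decomposition(forecasts, outcomes).items():
        print(f"{name}: {value:.4f}")
```

Plotting the fitted values from `pav_recalibrate` against the original forecasts gives the points of a reliability curve in this sketch, and a pair (MCB, DSC) computed per forecaster is what an MCB versus DSC plot would display.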

Source journal
CiteScore: 17.10
Self-citation rate: 11.40%
Articles published: 189
Review time: 77 days
About the journal: The International Journal of Forecasting is a leading journal in its field that publishes high quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision and policy makers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.