How Flawed is ECE? An Analysis via Logit Smoothing

ArXiv Pub Date : 2024-02-15 DOI:10.48550/arXiv.2402.10046
Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov
Citations: 0

Abstract

Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.
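
The discontinuity mentioned in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' LS-ECE estimator: it assumes a binary task with uniformly random labels, compares a constant predictor (always 0.5) with a predictor shifted by a small `eps` toward the true label, and then applies Gaussian noise to the logits before re-measuring. The helper names `binned_ece` and `logit_smooth`, the noise scale `sigma = 1.0`, the shift `eps = 0.01`, and the choice to feed the noisy predictions into an ordinary equal-width binned estimate are all assumptions made for this example.

```python
import math
import random


def binned_ece(probs, labels, n_bins=10):
    """Equal-width binned ECE for binary predictions, where probs[i] = P(y=1)."""
    # Per bin: [sum of predicted probabilities, sum of labels, count].
    bins = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b][0] += p
        bins[b][1] += y
        bins[b][2] += 1
    n = len(probs)
    # Sum over non-empty bins of (count / n) * |mean label - mean prediction|,
    # which simplifies to |label sum - prediction sum| / n.
    return sum(abs(sy - sp) / n for sp, sy, c in bins if c > 0)


def logit_smooth(probs, sigma, rng):
    """Add N(0, sigma^2) noise to each logit and map back to a probability.

    Hypothetical helper for illustration; requires 0 < p < 1.
    """
    out = []
    for p in probs:
        z = math.log(p / (1.0 - p)) + rng.gauss(0.0, sigma)
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


# Synthetic data: labels are fair coin flips, so 0.5 is a calibrated prediction.
rng = random.Random(0)
n, eps = 20000, 0.01
labels = [rng.randint(0, 1) for _ in range(n)]
constant = [0.5] * n
perturbed = [0.5 + eps if y == 1 else 0.5 - eps for y in labels]

# Unsmoothed: a tiny change in the predictor moves binned ECE by a large amount.
raw_gap = abs(binned_ece(constant, labels) - binned_ece(perturbed, labels))

# Smoothed: the same noise draws are used for both predictors (same seed).
smooth_a = binned_ece(logit_smooth(constant, 1.0, random.Random(1)), labels)
smooth_b = binned_ece(logit_smooth(perturbed, 1.0, random.Random(1)), labels)
smooth_gap = abs(smooth_a - smooth_b)

print(f"gap without smoothing: {raw_gap:.3f}")
print(f"gap with logit noise:  {smooth_gap:.3f}")
```

In this toy setup the two predictors differ by only `eps` everywhere, yet their unsmoothed binned ECE values are far apart, while the values computed after adding logit noise stay close. An even number of bins is used so that 0.5 sits on a bin edge and the two perturbed values land in different bins.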