Conformal prediction beyond exchangeability

R. Barber, E. Candès, Aaditya Ramdas, R. Tibshirani
Journal: The Annals of Statistics
DOI: 10.1214/23-aos2276 (https://doi.org/10.1214/23-aos2276)
Publication date: 2022-02-27
Citations: 83

Abstract

Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to deal with both aspects: we employ weighted quantiles to introduce robustness against distribution drift, and design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, with substantially less loss of coverage when exchangeability is violated due to distribution drift or other challenging features of real data, while also achieving the same coverage guarantees as existing conformal prediction methods if the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting.
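As a rough illustration of the weighted-quantile idea mentioned in the abstract, the Python sketch below builds a split-conformal style interval from held-out absolute residuals, giving more weight to recent observations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the geometric weights with decay parameter `rho`, and the use of absolute residuals as scores are all hypothetical choices made for this example. The test point is assigned weight 1 and its unknown score is treated as +infinity, so the interval becomes unbounded when the calibration weights cannot cover the requested level.

```python
import numpy as np


def weighted_conformal_quantile(residuals, weights, alpha):
    """Return the level-(1 - alpha) quantile of the weighted residual distribution.

    Each calibration residual carries weights[i]; the test point carries
    weight 1 with its score placed at +infinity (an assumption of this sketch).
    """
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + 1.0  # the +1 is the test point's own weight
    order = np.argsort(residuals)
    cum_mass = np.cumsum(weights[order]) / total
    # First sorted residual whose cumulative normalized mass reaches 1 - alpha.
    idx = np.searchsorted(cum_mass, 1.0 - alpha, side="left")
    if idx >= len(residuals):
        return np.inf  # the remaining mass sits on the +infinity atom
    return residuals[order][idx]


def weighted_split_conformal_interval(prediction, residuals, rho, alpha):
    """Interval prediction +/- q, with geometric weights favoring recent points.

    residuals are ordered oldest to newest; the newest gets weight rho**1
    and the oldest gets rho**n (hypothetical weighting for illustration).
    """
    n = len(residuals)
    weights = rho ** np.arange(n, 0, -1)
    q = weighted_conformal_quantile(residuals, weights, alpha)
    return prediction - q, prediction + q
```

With `rho = 1` all weights are equal and the quantile reduces to the usual split-conformal choice, the ceil((1 - alpha)(n + 1))-th smallest residual. Values of `rho` below 1 discount older residuals, which is one way to act on the abstract's remark that recent observations may be more relevant under drift.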