Cross-validation on extreme regions

Impact Factor: 1.1 | Zone 3 (Mathematics) | JCR Q3 (MATHEMATICS, INTERDISCIPLINARY APPLICATIONS) | Journal: Extremes | Pub Date: 2024-09-03 | DOI: 10.1007/s10687-024-00495-z
Anass Aghbalou, Patrice Bertail, François Portier, Anne Sabourin
Citation count: 0

Abstract

We conduct a non-asymptotic study of the Cross-Validation (CV) estimate of the generalization risk for learning algorithms dedicated to extreme regions of the covariate space. In this context, which has recently been analysed from an Extreme Value Analysis perspective, the risk function measures the algorithm’s error given that the norm of the input exceeds a high quantile. The main challenge within this framework is the negligible size of the extreme training sample relative to the full sample size and the need to re-scale the risk function by a probability tending to zero. We open the road to a finite-sample understanding of CV for extreme values by establishing two new results: an exponential probability bound on the K-fold CV error and a polynomial probability bound on the leave-p-out CV error. Our bounds are sharp in the sense that they match state-of-the-art guarantees for standard CV estimates while extending them to encompass a conditioning event of small probability. We illustrate the significance of our results for high-dimensional classification in extreme regions via a Lasso-type logistic regression algorithm. The tightness of our bounds is investigated in numerical experiments.
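
To make the object of study concrete, here is a minimal sketch of a K-fold CV estimate of a classification risk restricted to the extreme region, i.e. to inputs whose norm exceeds a high empirical quantile. Everything in it is an illustrative assumption rather than the paper's procedure: the function names, the toy heavy-tailed data, the Euclidean norm, the choice to compute the threshold on each training part, the normalization by the number of extreme validation points, and the nearest-centroid learner on the angular part, which stands in for the Lasso-type logistic regression mentioned in the abstract.

```python
import numpy as np


def make_toy_data(n, seed=0):
    """Hypothetical heavy-tailed data: the class decides the direction of X."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    radius = rng.pareto(2.0, size=n) + 1.0  # Pareto-type radius, always >= 1
    base = np.where(y[:, None] == 1, [1.0, 0.2], [0.2, 1.0])
    direction = base + 0.1 * rng.normal(size=(n, 2))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    return radius[:, None] * direction, y


def fit_centroids(X, y):
    """Stand-in learner: one centroid per class on the angular part X / ||X||."""
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = np.unique(y)
    centroids = np.stack([angles[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict_centroids(model, X):
    """Assign each point to the class with the nearest angular centroid."""
    labels, centroids = model
    angles = X / np.linalg.norm(X, axis=1, keepdims=True)
    dists = np.linalg.norm(angles[:, None, :] - centroids[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]


def kfold_extreme_risk(X, y, alpha=0.1, n_folds=5, seed=0):
    """K-fold CV estimate of the 0-1 risk given ||X|| > (1 - alpha) quantile."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_risks = []
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Assumption: the threshold is the empirical quantile of training norms.
        train_norms = np.linalg.norm(X[train_idx], axis=1)
        threshold = np.quantile(train_norms, 1.0 - alpha)
        train_ext = train_idx[train_norms > threshold]
        val_ext = val_idx[np.linalg.norm(X[val_idx], axis=1) > threshold]
        if len(train_ext) == 0 or len(val_ext) == 0:
            continue  # no extreme points on one side of this split
        model = fit_centroids(X[train_ext], y[train_ext])
        errors = predict_centroids(model, X[val_ext]) != y[val_ext]
        # Assumption: normalize by the number of extreme validation points.
        fold_risks.append(errors.mean())
    if not fold_risks:
        raise ValueError("no fold contained extreme points; increase alpha or n")
    return float(np.mean(fold_risks))


if __name__ == "__main__":
    X, y = make_toy_data(2000, seed=1)
    print(kfold_extreme_risk(X, y, alpha=0.1, n_folds=5))
```

With `alpha = 0.1` and five folds on 2000 points, each fit uses only about 160 extreme training points and each validation fold about 40, which illustrates the small effective sample size that the abstract identifies as the main difficulty.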

Source journal

Extremes
Categories: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS; STATISTICS & PROBABILITY
CiteScore: 2.20
Self-citation rate: 7.70%
Articles published: 15
Review time: >12 weeks

About the journal: Extremes publishes original research on all aspects of statistical extreme value theory and its applications in science, engineering, economics and other fields. Authoritative and timely reviews of theoretical advances and of extreme value methods and problems in important applied areas, including detailed case studies, are welcome and will be a regular feature. All papers are refereed. Publication will be swift: in particular, electronic submission and correspondence are encouraged. Statistical extreme value methods encompass a very wide range of problems: extreme waves, rainfall, and floods are of basic importance in oceanography and hydrology, as are high windspeeds and extreme temperatures in meteorology and catastrophic claims in insurance. The waveforms and extremes of random loads determine lifelengths in structural safety, corrosion and metal fatigue.

Latest articles in this journal

Semiparametric approaches for the inference of univariate and multivariate extremes
Modern extreme value statistics for Utopian extremes. EVA (2023) Conference Data Challenge: Team Yalla
A utopic adventure in the modelling of conditional univariate and multivariate extremes
On Gaussian triangular arrays in the case of strong dependence
Cross-validation on extreme regions