Decision theoretic bootstrapping

International Journal for Uncertainty Quantification · Published 2023-12-01 · DOI: 10.1615/int.j.uncertaintyquantification.2023038552 · Impact Factor 1.5 · Q2, Engineering, Multidisciplinary
Peyman Tavallali, Hamed Hamze Bajgiran, Danial Esaid, Houman Owhadi
Citations: 0

Abstract

The design and testing of supervised machine learning models combine two fundamental distributions: (1) the training data distribution and (2) the testing data distribution. Although these two distributions are identical and identifiable when the data set is infinite, they are imperfectly known when the data are finite (and possibly corrupted), and this uncertainty must be taken into account for robust Uncertainty Quantification (UQ). An important case is when the test distribution comes from a modal or localized area of the finite sample distribution. We present a general decision-theoretic bootstrapping solution to this problem: (1) partition the available data into a training subset and a UQ subset; (2) take $m$ subsampled subsets of the training set and train $m$ models; (3) partition the UQ set into $n$ sorted subsets and take a random fraction of them to define $n$ corresponding empirical distributions $\mu_{j}$; (4) consider the adversarial game in which Player I selects a model $i\in\left\{ 1,\ldots,m\right\}$, Player II selects the UQ distribution $\mu_{j}$, and Player I receives a loss defined by evaluating model $i$ against data points sampled from $\mu_{j}$; (5) identify optimal mixed strategies (probability distributions over models and UQ distributions) for both players. These randomized optimal mixed strategies provide optimal model mixtures and UQ estimates given the adversarial uncertainty of the training and testing distributions represented by the game. The proposed approach provides (1) some degree of robustness to in-sample distribution localization/concentration and (2) conditional probability distributions on the output.
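The five steps of the abstract can be sketched end to end. The following is a minimal illustrative sketch, not the authors' implementation: the synthetic linear-regression data, least-squares models, sorting of the UQ set by its targets, and use of all $n$ partitions (rather than a random fraction) are all assumptions made for the example. Player I's optimal mixed strategy $p^{\star}$, solving $\min_{p}\max_{j}\sum_{i}p_{i}L_{ij}$, is computed by linear programming.

```python
# Illustrative sketch of decision-theoretic bootstrapping (steps 1-5).
# Data, models, and sorting criterion are assumptions, not the paper's setup.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Synthetic 1-D regression data (assumption: the method is model-agnostic).
X = rng.normal(size=(300, 1))
y = 2.0 * X[:, 0] + 0.3 * rng.normal(size=300)

# (1) Partition the available data into a training subset and a UQ subset.
X_tr, y_tr = X[:200], y[:200]
X_uq, y_uq = X[200:], y[200:]

# (2) Train m models on m subsampled training sets (here: least squares).
m = 5
models = []
for _ in range(m):
    idx = rng.choice(len(X_tr), size=150, replace=True)
    A = np.c_[X_tr[idx], np.ones(len(idx))]
    coef, *_ = np.linalg.lstsq(A, y_tr[idx], rcond=None)
    models.append(coef)

# (3) Partition the UQ set (sorted here by target value, an assumption)
#     into n subsets, giving n empirical distributions mu_j.
n = 4
order = np.argsort(y_uq)
uq_parts = np.array_split(order, n)

# (4) Loss matrix L[i, j]: mean squared error of model i under mu_j.
L = np.empty((m, n))
for i, coef in enumerate(models):
    for j, part in enumerate(uq_parts):
        pred = np.c_[X_uq[part], np.ones(len(part))] @ coef
        L[i, j] = np.mean((pred - y_uq[part]) ** 2)

# (5) Solve the zero-sum game min_p max_j (L^T p)_j as a linear program:
#     minimize v subject to (L^T p)_j <= v, sum(p) = 1, p >= 0.
c = np.r_[np.zeros(m), 1.0]                    # variables: (p_1..p_m, v)
A_ub = np.c_[L.T, -np.ones(n)]                 # L^T p - v <= 0
b_ub = np.zeros(n)
A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # sum(p) = 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])
p_star, game_value = res.x[:m], res.x[-1]
print("optimal model mixture:", np.round(p_star, 3))
print("game value (worst-case loss):", round(float(game_value), 4))
```

The optimal mixture `p_star` weights the $m$ models so that the worst-case loss over the $n$ adversarially chosen UQ distributions is minimized; Player II's strategy is the dual solution of the same linear program.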
Journal

International Journal for Uncertainty Quantification (Engineering, Multidisciplinary; Mathematics, Interdisciplinary Applications)

CiteScore: 3.60 · Self-citation rate: 5.90% · Annual articles: 28
About the journal: The International Journal for Uncertainty Quantification disseminates information of permanent interest in the areas of analysis, modeling, design and control of complex systems in the presence of uncertainty. The journal seeks to emphasize methods that cross stochastic analysis, statistical modeling and scientific computing. Systems of interest are governed by differential equations, possibly with multiscale features. Topics of particular interest include representation of uncertainty, propagation of uncertainty across scales, resolving the curse of dimensionality, long-time integration for stochastic PDEs, data-driven approaches for constructing stochastic models, validation, verification and uncertainty quantification for predictive computational science, and visualization of uncertainty in high-dimensional spaces. Bayesian computation and machine learning techniques are also of interest, for example in the context of stochastic multiscale systems, for model selection/classification, and for decision making. Reports addressing the dynamic coupling of modern experiments and modeling approaches towards predictive science are particularly encouraged. Applications of uncertainty quantification in all areas of physical and biological sciences are appropriate.