The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Proceedings of the ... International Conference on Machine Learning. Pub Date: 2023-07-25. DOI: 10.48550/arXiv.2307.13332

P. Amortila, Nan Jiang, Csaba Szepesvari



Abstract

Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper, we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as with the weighted $L_2$-norm (where the weighting is the offline state distribution), the $L_\infty$ norm, the presence vs. absence of state aliasing, and full vs. partial coverage of the state space. We establish the optimal asymptotic approximation factors (up to constants) for all of these settings. In particular, our bounds identify two instance-dependent factors for the $L_2(\mu)$ norm and only one for the $L_\infty$ norm, which are shown to dictate the hardness of off-policy evaluation under misspecification.
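
For readers unfamiliar with the terminology, the following is a minimal sketch of what an approximation factor typically means in this setting. The notation is an assumption made for illustration and is not taken from the abstract: $v^\pi$ denotes the value function of the target policy, $\Phi\theta$ a linear value estimate with feature matrix $\Phi$ and parameter $\theta$, $\mu$ the offline state distribution, $\hat v_n$ an estimate built from $n$ offline samples, and $\alpha$ the approximation factor. The paper's exact definitions may differ.

```latex
% Norms over a finite state space S (illustrative notation, assumed)
\|f\|_{L_2(\mu)} = \Big( \sum_{s \in S} \mu(s)\, f(s)^2 \Big)^{1/2},
\qquad
\|f\|_{\infty} = \max_{s \in S} |f(s)|.

% Misspecification error of the linear class in a given norm
\varepsilon = \inf_{\theta} \, \| \Phi\theta - v^\pi \|.

% An asymptotic guarantee with approximation factor \alpha
\limsup_{n \to \infty} \, \| \hat v_n - v^\pi \| \;\le\; \alpha \cdot \varepsilon.
```

Under this reading, $\alpha = 1$ would mean the estimator asymptotically matches the best linear fit, while a larger $\alpha$ quantifies the multiplicative blow-up that the abstract refers to.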
