Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs

IEEE Open Journal of Control Systems. Pub Date: 2022-08-16. DOI: 10.1109/OJCSYS.2022.3198660

Jayanth Reddy Regatti; Abhishek Gupta


Abstract

In this work, we analyze finite sample complexity bounds for offline reinforcement learning with a general state space, a general function space, and state-dependent action sets. Unlike earlier works, the algorithm analyzed does not require knowledge of the data-collection policy. We show that one can compute an $\epsilon$-optimal Q function (state-action value function) using $O(1/\epsilon^{4})$ i.i.d. samples of state-action-reward-next state tuples.
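To make the scaling concrete, here is a short worked calculation. It assumes the bound can be written as $n(\epsilon) \le C/\epsilon^{4}$ for some problem-dependent constant $C$; both the symbol $n(\epsilon)$ and the constant $C$ are illustrative and do not appear in the abstract.

```latex
% Illustrative only: n(\epsilon) and C are assumed names, not taken from the paper.
\[
  n(\epsilon) \le \frac{C}{\epsilon^{4}}
  \quad\Longrightarrow\quad
  \frac{C}{(\epsilon/2)^{4}} = \frac{16\,C}{\epsilon^{4}},
  \qquad
  \frac{C}{(\epsilon/10)^{4}} = \frac{10^{4}\,C}{\epsilon^{4}} .
\]
```

Under this assumed form, halving the target accuracy $\epsilon$ raises the sample bound by a factor of 16, and tightening it tenfold raises the bound by a factor of $10^{4}$.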


