{"title":"一般MDP离线强化学习Minmax变量的有限样本分析","authors":"Jayanth Reddy Regatti;Abhishek Gupta","doi":"10.1109/OJCSYS.2022.3198660","DOIUrl":null,"url":null,"abstract":"In this work, we analyze the finite sample complexity bounds for offline reinforcement learning with general state, general function space and state-dependent action sets. The algorithm analyzed does not require the knowledge of the data-collection policy as compared to earlier works. We show that one can compute an \n<inline-formula><tex-math>$\\epsilon$</tex-math></inline-formula>\n-optimal Q function (state-action value function) using \n<inline-formula><tex-math>$O(1/\\epsilon ^{4})$</tex-math></inline-formula>\n i.i.d. samples of state-action-reward-next state tuples.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"152-163"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09857559.pdf","citationCount":"0","resultStr":"{\"title\":\"Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs\",\"authors\":\"Jayanth Reddy Regatti;Abhishek Gupta\",\"doi\":\"10.1109/OJCSYS.2022.3198660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we analyze the finite sample complexity bounds for offline reinforcement learning with general state, general function space and state-dependent action sets. The algorithm analyzed does not require the knowledge of the data-collection policy as compared to earlier works. We show that one can compute an \\n<inline-formula><tex-math>$\\\\epsilon$</tex-math></inline-formula>\\n-optimal Q function (state-action value function) using \\n<inline-formula><tex-math>$O(1/\\\\epsilon ^{4})$</tex-math></inline-formula>\\n i.i.d. samples of state-action-reward-next state tuples.\",\"PeriodicalId\":73299,\"journal\":{\"name\":\"IEEE open journal of control systems\",\"volume\":\"1 \",\"pages\":\"152-163\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/iel7/9552933/9683993/09857559.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE open journal of control systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/9857559/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9857559/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs
In this work, we analyze finite-sample complexity bounds for offline reinforcement learning with general state spaces, general function spaces, and state-dependent action sets. Unlike earlier works, the algorithm analyzed does not require knowledge of the data-collection policy. We show that one can compute an $\epsilon$-optimal Q function (state-action value function) using $O(1/\epsilon^{4})$ i.i.d. samples of state-action-reward-next-state tuples.
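
As a rough illustration of what an $O(1/\epsilon^{4})$ rate implies (the constant below is hypothetical and for intuition only, not taken from the paper): halving the target accuracy $\epsilon$ multiplies the required number of samples by $2^{4}=16$.

$$
n(\epsilon) \;\le\; \frac{C}{\epsilon^{4}},
\qquad
\frac{n(\epsilon/2)}{n(\epsilon)} \;=\; 2^{4} \;=\; 16,
$$

where $C$ is a problem-dependent constant hidden by the $O(\cdot)$ notation.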