{"title":"一般MDP离线强化学习Minmax变量的有限样本分析","authors":"Jayanth Reddy Regatti;Abhishek Gupta","doi":"10.1109/OJCSYS.2022.3198660","DOIUrl":null,"url":null,"abstract":"In this work, we analyze the finite sample complexity bounds for offline reinforcement learning with general state, general function space and state-dependent action sets. The algorithm analyzed does not require the knowledge of the data-collection policy as compared to earlier works. We show that one can compute an \n<inline-formula><tex-math>$\\epsilon$</tex-math></inline-formula>\n-optimal Q function (state-action value function) using \n<inline-formula><tex-math>$O(1/\\epsilon ^{4})$</tex-math></inline-formula>\n i.i.d. samples of state-action-reward-next state tuples.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"1 ","pages":"152-163"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9683993/09857559.pdf","citationCount":"0","resultStr":"{\"title\":\"Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs\",\"authors\":\"Jayanth Reddy Regatti;Abhishek Gupta\",\"doi\":\"10.1109/OJCSYS.2022.3198660\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we analyze the finite sample complexity bounds for offline reinforcement learning with general state, general function space and state-dependent action sets. The algorithm analyzed does not require the knowledge of the data-collection policy as compared to earlier works. We show that one can compute an \\n<inline-formula><tex-math>$\\\\epsilon$</tex-math></inline-formula>\\n-optimal Q function (state-action value function) using \\n<inline-formula><tex-math>$O(1/\\\\epsilon ^{4})$</tex-math></inline-formula>\\n i.i.d. samples of state-action-reward-next state tuples.\",\"PeriodicalId\":73299,\"journal\":{\"name\":\"IEEE open journal of control systems\",\"volume\":\"1 \",\"pages\":\"152-163\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/iel7/9552933/9683993/09857559.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE open journal of control systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/9857559/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/9857559/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs
In this work, we analyze finite-sample complexity bounds for offline reinforcement learning with general state spaces, general function spaces, and state-dependent action sets. Unlike earlier works, the algorithm analyzed does not require knowledge of the data-collection policy. We show that one can compute an $\epsilon$-optimal Q function (state-action value function) using $O(1/\epsilon^{4})$ i.i.d. samples of state-action-reward-next-state tuples.
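
As a rough illustration of what an $O(1/\epsilon^{4})$ rate implies (the constant below is hypothetical and for intuition only, not taken from the paper): halving the target accuracy $\epsilon$ multiplies the required number of samples by $2^{4}=16$.

$$
n(\epsilon) \;\le\; \frac{C}{\epsilon^{4}},
\qquad
\frac{n(\epsilon/2)}{n(\epsilon)} \;=\; 2^{4} \;=\; 16,
$$

where $C$ is a problem-dependent constant hidden by the $O(\cdot)$ notation.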