Speedy Categorical Distributional Reinforcement Learning and Complexity Analysis

SIAM Journal on Mathematics of Data Science; IF 2.6; JCR Q1 (Mathematics, Applied); Pub Date: 2022-06-01; DOI: 10.1137/20m1364436
Markus Böck, C. Heitzinger
Pages: 675-693
Citations: 2

Abstract

In distributional reinforcement learning, the entire distribution of the return instead of just the expected return is modeled. The approach with categorical distributions as the approximation method is well-known in Q-learning, and convergence results have been established in the tabular case. In this work, speedy Q-learning is extended to categorical distributions, a finite-time analysis is performed, and probably approximately correct bounds in terms of the Cramér distance are established. It is shown that also in the distributional case the new update rule yields faster policy evaluation in comparison to the standard Q-learning one and that the sample complexity is essentially the same as the one of the value-based algorithmic counterpart. Without the need for more state-action-reward samples, one gains significantly more information about the return with categorical distributions. Even though the results do not easily extend to the case of policy control, a slight modification to the update rule yields promising numerical results.
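To make the objects named in the abstract concrete, the following is a minimal Python sketch of a categorical return distribution on fixed atoms, the projection of a sampled Bellman target back onto those atoms, and the Cramér distance between two such distributions. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cramer_distance`, `project_target`, `standard_categorical_update`), the atom grid, and the step size are hypothetical, the Cramér distance is taken here as the square root of the integrated squared CDF difference, and the update shown is the standard mixture rule that the abstract uses as the baseline, not the speedy rule analyzed in the paper.

```python
import numpy as np


def cramer_distance(p, q, atoms):
    """Cramér distance between two categorical distributions on shared atoms.

    Assumed convention: sqrt of the integral of the squared CDF difference.
    On shared atoms the CDF difference is piecewise constant between atoms.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    widths = np.diff(atoms)  # gaps between consecutive atoms
    return float(np.sqrt(np.sum(widths * cdf_diff[:-1] ** 2)))


def project_target(atoms, probs, reward, gamma):
    """Project the distribution of reward + gamma * Z onto the fixed atoms.

    Each shifted atom is clipped to the support and its mass is split
    linearly between the two neighbouring atoms.
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.clip(reward + gamma * atoms, atoms[0], atoms[-1])
    out = np.zeros_like(probs)
    for t, mass in zip(targets, probs):
        hi = int(np.searchsorted(atoms, t))  # first index with atoms[hi] >= t
        if atoms[hi] == t:
            out[hi] += mass
        else:
            lo = hi - 1
            w = (t - atoms[lo]) / (atoms[hi] - atoms[lo])
            out[lo] += (1.0 - w) * mass
            out[hi] += w * mass
    return out


def standard_categorical_update(current, next_probs, atoms, reward, gamma, alpha):
    """Standard (non-speedy) mixture update for tabular policy evaluation."""
    target = project_target(atoms, next_probs, reward, gamma)
    return (1.0 - alpha) * np.asarray(current, dtype=float) + alpha * target


# Hypothetical example: five atoms on [0, 1], point mass at 0.5 in the next state.
atoms = np.linspace(0.0, 1.0, 5)
next_probs = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
uniform = np.full(5, 0.2)
updated = standard_categorical_update(uniform, next_probs, atoms,
                                      reward=0.1, gamma=0.5, alpha=0.5)
```

The projection keeps the mean of the target whenever no clipping occurs, and the mixture update keeps the result a valid probability vector for any step size in [0, 1].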