{"title":"基于模型的基于计数保守的离线强化学习","authors":"Byeongchang Kim, Min-hwan Oh","doi":"10.48550/arXiv.2307.11352","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\\texttt{Count-MORL}$. Our method utilizes the count estimates of state-action pairs to quantify model estimation error, marking the first algorithm of demonstrating the efficacy of count-based conservatism in model-based offline deep RL to the best of our knowledge. For our proposed method, we first show that the estimation error is inversely proportional to the frequency of state-action pairs. Secondly, we demonstrate that the learned policy under the count-based conservative model offers near-optimality performance guarantees. Through extensive numerical experiments, we validate that $\\texttt{Count-MORL}$ with hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at $\\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"38 1","pages":"16728-16746"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Model-based Offline Reinforcement Learning with Count-based Conservatism\",\"authors\":\"Byeongchang Kim, Min-hwan Oh\",\"doi\":\"10.48550/arXiv.2307.11352\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\\\\texttt{Count-MORL}$. Our method utilizes the count estimates of state-action pairs to quantify model estimation error, marking the first algorithm of demonstrating the efficacy of count-based conservatism in model-based offline deep RL to the best of our knowledge. For our proposed method, we first show that the estimation error is inversely proportional to the frequency of state-action pairs. Secondly, we demonstrate that the learned policy under the count-based conservative model offers near-optimality performance guarantees. Through extensive numerical experiments, we validate that $\\\\texttt{Count-MORL}$ with hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is accessible at $\\\\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$.\",\"PeriodicalId\":74529,\"journal\":{\"name\":\"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning\",\"volume\":\"38 1\",\"pages\":\"16728-16746\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... International Conference on Machine Learning. 
International Conference on Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2307.11352\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2307.11352","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Model-based Offline Reinforcement Learning with Count-based Conservatism
In this paper, we propose a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$. Our method uses count estimates of state-action pairs to quantify model estimation error; to the best of our knowledge, this is the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. We first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we demonstrate that the policy learned under the count-based conservative model enjoys near-optimal performance guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with a hash-code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is available at $\href{https://github.com/oh-lab/Count-MORL}{https://github.com/oh-lab/Count-MORL}$.
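To make the core idea concrete, the sketch below illustrates count-based conservatism with hash-based counting: the learned model's reward is penalized by an error bound that shrinks as the visitation count $n(s,a)$ grows, with counts over continuous state-action pairs obtained from hash codes. This is a minimal illustrative sketch, not the paper's implementation; it assumes a SimHash-style random-projection hash and a $\beta/\sqrt{n(s,a)}$ penalty, and names such as `HashCounter`, `conservative_reward`, and `beta` are hypothetical.

```python
import numpy as np

class HashCounter:
    """Counts visits to state-action pairs via random-projection (SimHash-style) codes.

    Illustrative only: continuous (s, a) vectors are discretized into k-bit
    binary codes, and a dictionary tracks how often each code is visited.
    """

    def __init__(self, dim, k=16, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((k, dim))  # random projection matrix
        self.counts = {}

    def _code(self, x):
        # The sign of each projection yields one bit of the hash code.
        return tuple((self.A @ x > 0).astype(np.int8))

    def update(self, x):
        code = self._code(x)
        self.counts[code] = self.counts.get(code, 0) + 1

    def count(self, x):
        return self.counts.get(self._code(x), 0)


def conservative_reward(r_hat, n, beta=1.0):
    """Penalize the learned model's reward by an error bound that decays
    with the visitation count: r_tilde = r_hat - beta / sqrt(n)."""
    return r_hat - beta / np.sqrt(max(n, 1))


# Usage sketch: count every (s, a) pair in the offline dataset, then
# penalize model rollouts that enter rarely visited regions.
counter = HashCounter(dim=4)
dataset = np.random.default_rng(1).standard_normal((1000, 4))  # stand-in for (s, a) data
for sa in dataset:
    counter.update(sa)

query = np.zeros(4)  # a state-action pair encountered during a model rollout
print(conservative_reward(r_hat=1.0, n=counter.count(query)))
```

Under this kind of penalty, rollouts through frequently observed state-action pairs keep rewards close to the model's estimate, while rarely counted pairs are heavily discounted, which is one way to realize the inverse relationship between estimation error and visitation frequency described above.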