Model-based Offline Reinforcement Learning with Count-based Conservatism

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning. Pub Date: 2023-07-21. DOI: 10.48550/arXiv.2307.11352

Byeongchang Kim, Min-hwan Oh

Citations: 1

Abstract

In this paper, we propose $\texttt{Count-MORL}$, a model-based offline reinforcement learning method that integrates count-based conservatism. Our method uses count estimates of state-action pairs to quantify model estimation error and is, to the best of our knowledge, the first algorithm to demonstrate the efficacy of count-based conservatism in model-based offline deep RL. We first show that the estimation error is inversely proportional to the frequency of state-action pairs. Second, we show that the policy learned under the count-based conservative model comes with near-optimal performance guarantees. Through extensive numerical experiments, we validate that $\texttt{Count-MORL}$ with a hash code implementation significantly outperforms existing offline RL algorithms on the D4RL benchmark datasets. The code is available at https://github.com/oh-lab/Count-MORL.
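To make the idea of count-based conservatism with hash codes concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a SimHash-style random projection to discretize continuous state-action pairs, and a reward penalty of the form beta / sqrt(n + 1), where n is the estimated count. The class `HashCounter`, the function `penalized_reward`, the parameters `n_bits` and `beta`, and the penalty form itself are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np
from collections import Counter


class HashCounter:
    """Approximate visit counts for continuous (state, action) pairs.

    Assumption: a SimHash-style code (sign pattern of a random projection)
    stands in for whatever hash code the actual method uses.
    """

    def __init__(self, input_dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; nearby inputs tend to share a sign pattern.
        self.proj = rng.standard_normal((input_dim, n_bits))
        self.counts = Counter()

    def _code(self, state, action):
        x = np.concatenate([np.asarray(state, dtype=float),
                            np.asarray(action, dtype=float)])
        bits = (x @ self.proj) > 0
        return bits.tobytes()  # hashable key for the Counter

    def update(self, state, action):
        self.counts[self._code(state, action)] += 1

    def count(self, state, action):
        return self.counts[self._code(state, action)]


def penalized_reward(reward, count, beta=1.0):
    # Assumed penalty form: large for rarely seen pairs, shrinking as count grows.
    return reward - beta / np.sqrt(count + 1)


if __name__ == "__main__":
    counter = HashCounter(input_dim=3, n_bits=8)
    state, action = [0.1, 0.2], [0.3]
    for _ in range(3):
        counter.update(state, action)
    n = counter.count(state, action)
    print(n, penalized_reward(1.0, n))
```

In this sketch, state-action pairs that are well covered by the offline dataset receive a small penalty, while pairs the dataset rarely or never contains are penalized heavily, which discourages the policy from exploiting model errors in poorly covered regions.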
