Simplified Temporal Consistency Reinforcement Learning

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning. Pub Date: 2023-06-15. DOI: 10.48550/arXiv.2306.09466

Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, J. Pajarinen

Pages: 42227-42246


Abstract

Reinforcement learning can solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train than ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
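The abstract does not spell out the training objective, so the following is only a minimal sketch of what a latent temporal consistency loss could look like. It rests on several assumptions that are not taken from the abstract: a toy linear encoder and dynamics model, a cosine distance between predicted and target latents, a slowly updated (EMA) target encoder, and a per-step discount. All function and parameter names are hypothetical. The sketch only computes the loss value; in practice an autodiff framework would supply the gradients.

```python
import numpy as np

def encode(weights, obs):
    """Toy encoder (assumption): one linear layer followed by tanh."""
    return np.tanh(obs @ weights)

def predict_next(dyn_weights, latent, action):
    """Toy latent dynamics (assumption): next latent from latent and action."""
    return np.tanh(np.concatenate([latent, action], axis=-1) @ dyn_weights)

def cosine_distance(a, b, eps=1e-8):
    """1 - cosine similarity, averaged over the batch."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

def temporal_consistency_loss(enc_w, target_enc_w, dyn_w,
                              obs_seq, act_seq, discount=0.9):
    """Multi-step consistency between rolled-out and encoded latents.

    obs_seq has shape (H + 1, B, obs_dim); act_seq has shape (H, B, act_dim).
    """
    latent = encode(enc_w, obs_seq[0])
    loss = 0.0
    for t in range(act_seq.shape[0]):
        # Roll the model forward purely in latent space.
        latent = predict_next(dyn_w, latent, act_seq[t])
        # Target latent of the real next observation; it would be
        # treated as a constant (no gradient) during training.
        target = encode(target_enc_w, obs_seq[t + 1])
        loss += (discount ** t) * cosine_distance(latent, target)
    return loss

def ema_update(target_w, online_w, tau=0.01):
    """Move the target encoder slowly toward the online encoder."""
    return (1.0 - tau) * target_w + tau * online_w

# Usage on random data (all dimensions are arbitrary illustration values).
rng = np.random.default_rng(0)
obs_dim, act_dim, latent_dim, horizon, batch = 6, 2, 4, 3, 5
enc_w = rng.normal(size=(obs_dim, latent_dim))
target_enc_w = enc_w.copy()
dyn_w = rng.normal(size=(latent_dim + act_dim, latent_dim))
obs_seq = rng.normal(size=(horizon + 1, batch, obs_dim))
act_seq = rng.normal(size=(horizon, batch, act_dim))

loss = temporal_consistency_loss(enc_w, target_enc_w, dyn_w, obs_seq, act_seq)
target_enc_w = ema_update(target_enc_w, enc_w)
```

Under these assumptions, the learned latent can then either feed a planner that rolls out `predict_next`, or serve as input features for a model-free policy and value function, which are the two uses the abstract describes.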
