Hypernetworks in Meta-Reinforcement Learning

Conference on Robot Learning. Publication date: 2022-10-20. DOI: 10.48550/arXiv.2210.11348

Jacob Beck, M. Jackson, Risto Vuorio, Shimon Whiteson


Citations: 12

Abstract

Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: in multi-task RL, state-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and are applicable to meta-RL. However, evidence from supervised learning suggests hypernetwork performance is highly sensitive to initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, while being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
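To make the idea of a hypernetwork "replicating the separate policies" concrete, the following is a minimal sketch and not the architecture or initialization scheme from the paper. It assumes a bias-free linear hypernetwork that maps a task embedding to the weights of a one-layer linear policy. All names (`init_hypernetwork`, `generate_policy_weights`, `policy_logits`), the dimensions, and the initialization scale `INIT_SCALE` are hypothetical placeholders chosen for illustration.

```python
import random

TASK_DIM, IN_DIM, OUT_DIM = 2, 3, 2   # illustrative sizes (assumed)
INIT_SCALE = 0.1                      # arbitrary placeholder, NOT the paper's scheme


def init_hypernetwork(task_dim, in_dim, out_dim, scale, rng):
    """Create a linear hypernetwork: one row per generated policy weight."""
    n_weights = out_dim * in_dim
    return [[rng.gauss(0.0, scale) for _ in range(task_dim)]
            for _ in range(n_weights)]


def generate_policy_weights(hyper, task_embedding, in_dim, out_dim):
    """Map a task embedding to an (out_dim x in_dim) policy weight matrix."""
    flat = [sum(h * z for h, z in zip(row, task_embedding)) for row in hyper]
    return [flat[i * in_dim:(i + 1) * in_dim] for i in range(out_dim)]


def policy_logits(weights, state):
    """Apply the generated linear policy to a state vector."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


rng = random.Random(0)
hyper = init_hypernetwork(TASK_DIM, IN_DIM, OUT_DIM, INIT_SCALE, rng)

# With one-hot task embeddings, each task selects its own column of the
# hypernetwork, so each task gets an independent set of policy weights.
w_task0 = generate_policy_weights(hyper, [1.0, 0.0], IN_DIM, OUT_DIM)
w_task1 = generate_policy_weights(hyper, [0.0, 1.0], IN_DIM, OUT_DIM)

# A non-one-hot embedding blends the two, which is where sharing across
# tasks can enter in this toy setup.
w_mixed = generate_policy_weights(hyper, [0.5, 0.5], IN_DIM, OUT_DIM)

state = [1.0, 0.0, -1.0]
print(policy_logits(w_task0, state))
print(policy_logits(w_task1, state))
print(policy_logits(w_mixed, state))
```

In this toy setup the scale of the generated policy weights is determined entirely by how the hypernetwork itself is initialized, which gives some intuition for why initialization could matter; the sketch does not attempt to reproduce the initialization method the abstract proposes.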


