Residual Skill Policies: Learning an Adaptable Skill-based Action Space for Reinforcement Learning for Robotics

Conference on Robot Learning. Pub Date: 2022-11-04. DOI: 10.48550/arXiv.2211.02231

Krishan Rana, Ming Xu, Brendan Tidd, Michael Milford, N. Sunderhauf


Citations: 6

Abstract

Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and embedded into a latent space, from which a high-level RL agent can sample them as actions. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning tasks that are structurally similar to those used to construct the skill space. We first propose accelerating exploration in the skill space using state-conditioned generative models, which directly bias the high-level agent towards sampling only the skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation, enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration and outperforming prior work. Code and videos are available on our project website: https://krishanrana.github.io/reskill.
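The two ideas in the abstract can be illustrated with a toy sketch: a state-conditioned model that turns the high-level agent's sample into a skill, and a residual policy whose correction is added to the decoded skill action. This is not the authors' implementation. Every function below is a hypothetical stand-in with hand-written arithmetic in place of learned networks, and the names, the `residual_scale` parameter, and the additive composition rule are assumptions made only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def state_conditioned_skill(state: Sequence[float], noise: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a state-conditioned generative model.

    Maps the high-level agent's sample (noise) to a skill latent whose
    centre and spread depend on the current state, so that different
    states favour different regions of the skill space.
    """
    mean = sum(state) / len(state)
    scale = 1.0 / (1.0 + abs(mean))
    return [mean + scale * n for n in noise]


def decode_skill(state: Sequence[float], skill: Sequence[float]) -> Vector:
    """Hypothetical skill decoder: turns a skill latent into a low-level action."""
    skill_mean = sum(skill) / len(skill)
    return [skill_mean - s for s in state]


def residual_policy(state: Sequence[float], weight: float = 0.1) -> Vector:
    """Hypothetical residual policy: a small state-dependent correction."""
    return [-weight * s for s in state]


def compose_action(
    state: Sequence[float],
    noise: Sequence[float],
    residual_scale: float = 1.0,
) -> Vector:
    """Assumed composition: decoded skill action plus a scaled residual."""
    skill = state_conditioned_skill(state, noise)
    base_action = decode_skill(state, skill)
    correction = residual_policy(state)
    return [b + residual_scale * c for b, c in zip(base_action, correction)]


if __name__ == "__main__":
    state = [1.0, -1.0]
    noise = [0.0, 0.0]
    print(compose_action(state, noise, residual_scale=0.0))  # skill action only
    print(compose_action(state, noise, residual_scale=1.0))  # with residual
```

Setting `residual_scale` to zero recovers the unmodified skill action, which makes the residual's role as a fine-grained correction on top of the skill easy to see in isolation.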


