
Latest Publications from the Conference on Robot Learning

Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation
Pub Date : 2022-10-24 DOI: 10.48550/arXiv.2210.13638
Zoey Chen, Karl Van Wyk, Yu-Wei Chao, Wei Yang, Arsalan Mousavian, Abhishek Gupta, D. Fox
Dexterous robotic hands have the capability to interact with a wide variety of household objects to perform tasks like grasping. However, learning robust real-world grasping policies for arbitrary objects has proven challenging due to the difficulty of generating high-quality training data. In this work, we propose a learning system (ISAGrasp) that leverages a small number of human demonstrations to bootstrap the generation of a much larger dataset containing successful grasps on a variety of novel objects. Our key insight is to use a correspondence-aware implicit generative model to deform object meshes and demonstrated human grasps, generating a diverse dataset of novel objects and successful grasps for supervised learning while maintaining semantic realism. We use this dataset to train a robust grasping policy in simulation that can be deployed in the real world. We demonstrate grasping performance with a four-fingered Allegro hand in both simulation and the real world, and show that this method can handle entirely new semantic classes, achieving a 79% success rate on grasping unseen objects in the real world.
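The augmentation loop is easy to picture: perturb the latent code of a seed object, decode a deformed mesh, and carry the demonstrated grasp along through the point-wise correspondences. Below is a minimal numpy sketch of that loop, in which decode_mesh and the index-preserving correspondences are toy stand-ins for the paper's implicit generative model:

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_mesh(z, template):
    # Toy stand-in for a correspondence-aware implicit decoder:
    # each latent code smoothly bends the template's vertices,
    # and vertex indices stay in correspondence across shapes.
    scale = 1.0 + 0.1 * np.tanh(template @ z)   # (V,) per-vertex factor
    return template * scale[:, None]

template = rng.normal(size=(500, 3))            # seed object vertices
z_seed = rng.normal(size=3)                     # latent code of the seed object
grasp_idx = rng.choice(len(template), size=20)  # contact vertices of the demo grasp

dataset = []
for _ in range(100):
    z_new = z_seed + 0.3 * rng.normal(size=3)   # perturb in latent space
    verts = decode_mesh(z_new, template)
    grasp_pts = verts[grasp_idx]                # grasp warps via correspondences
    dataset.append((verts, grasp_pts))          # verify each grasp in simulation
```

In the actual system each deformed grasp would still be checked in simulation before it enters the supervised training set.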
Citations: 9
Motion Policy Networks
Pub Date : 2022-10-21 DOI: 10.48550/arXiv.2210.12209
Adam Fishman, Adithya Murali, Clemens Eppner, Bryan N. Peele, Byron Boots, D. Fox
Collision-free motion generation in unknown environments is a core building block for robot manipulation. Generating such motions is challenging due to multiple objectives: not only should the solutions be optimal, but the motion generator itself must be fast enough for real-time performance and reliable enough for practical deployment. A wide variety of methods have been proposed, ranging from local controllers to global planners, often combined to offset each other's shortcomings. We present an end-to-end neural model called Motion Policy Networks (MπNets) that generates collision-free, smooth motion from just a single depth camera observation. MπNets are trained on over 3 million motion planning problems in over 500,000 environments. Our experiments show that MπNets are significantly faster than global planners while exhibiting the reactivity needed to deal with dynamic scenes. They are 46% better than prior neural planners and more robust than local control policies. Despite being trained only in simulation, MπNets transfer well to the real robot with noisy partial point clouds. Code and data are publicly available at https://mpinets.github.io.
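At deployment the policy is simply rolled out in closed loop: encode the scene point cloud, read the current joint configuration, and predict a joint-space step. A schematic PyTorch version is below; the tiny max-pooled encoder is a stand-in, not the published architecture, which also encodes the robot and target:

```python
import torch
import torch.nn as nn

class MotionPolicy(nn.Module):
    # Toy stand-in for an MπNets-style policy: encode a point cloud,
    # concatenate the current joint state, and predict a joint-space step.
    def __init__(self, n_joints=7):
        super().__init__()
        self.point_enc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128))
        self.head = nn.Sequential(nn.Linear(128 + n_joints, 128), nn.ReLU(),
                                  nn.Linear(128, n_joints))

    def forward(self, points, q):
        feat = self.point_enc(points).max(dim=1).values  # (B, 128) global feature
        return self.head(torch.cat([feat, q], dim=-1))   # (B, n_joints) delta-q

policy = MotionPolicy()
points = torch.randn(1, 2048, 3)   # one depth-camera point cloud
q = torch.zeros(1, 7)              # current joint configuration
for _ in range(50):                # closed-loop rollout toward the goal
    q = q + 0.05 * policy(points, q)
```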
Citations: 16
Hypernetworks in Meta-Reinforcement Learning
Pub Date : 2022-10-20 DOI: 10.48550/arXiv.2210.11348
Jacob Beck, M. Jackson, Risto Vuorio, Shimon Whiteson
Training a reinforcement learning (RL) agent on a real-world robotics task remains generally impractical due to sample inefficiency. Multi-task RL and meta-RL aim to improve sample efficiency by generalizing over a distribution of related tasks. However, doing so is difficult in practice: in multi-task RL, state-of-the-art methods often fail to outperform a degenerate solution that simply learns each task separately. Hypernetworks are a promising path forward since they replicate the separate policies of the degenerate solution while also allowing for generalization across tasks, and are applicable to meta-RL. However, evidence from supervised learning suggests hypernetwork performance is highly sensitive to the initialization. In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, while being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
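The central object is a hypernetwork: a network whose output is the weight vector of another network. The toy PyTorch rendering below illustrates why initialization matters, by zeroing the generator's output weights so that its bias alone sets the initial generated policy at a standard fan-in scale; this is an illustrative scheme, not the paper's exact proposal:

```python
import torch
import torch.nn as nn

class HyperPolicy(nn.Module):
    # Toy hypernetwork: maps a task embedding to the weights and bias
    # of a one-layer linear policy.
    def __init__(self, task_dim=8, obs_dim=16, act_dim=4):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        self.hyper = nn.Sequential(
            nn.Linear(task_dim, 64), nn.ReLU(),
            nn.Linear(64, obs_dim * act_dim + act_dim))
        # Near-zero output weights: the last layer's bias then dictates
        # the initial generated policy, giving it a standard fan-in scale
        # regardless of the task embedding (an assumption-laden sketch).
        nn.init.zeros_(self.hyper[-1].weight)
        nn.init.normal_(self.hyper[-1].bias, std=(1.0 / obs_dim) ** 0.5)

    def forward(self, task_emb, obs):
        params = self.hyper(task_emb)
        W = params[: self.obs_dim * self.act_dim].view(self.act_dim, self.obs_dim)
        b = params[self.obs_dim * self.act_dim:]
        return obs @ W.T + b   # action from the generated policy

policy = HyperPolicy()
action = policy(torch.randn(8), torch.randn(16))
```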
Citations: 12
Learning and Retrieval from Prior Data for Skill-based Imitation Learning
Pub Date : 2022-10-20 DOI: 10.48550/arXiv.2210.11435
Soroush Nasiriany, Tian Gao, Ajay Mandlekar, Yuke Zhu
Imitation learning offers a promising path for robots to learn general-purpose behaviors, but traditionally has exhibited limited scalability due to high data supervision requirements and brittle generalization. Inspired by recent advances in multi-task imitation learning, we investigate the use of prior data from previous tasks to facilitate learning novel tasks in a robust, data-efficient manner. To make effective use of the prior data, the robot must internalize knowledge from past experiences and contextualize this knowledge in novel tasks. To that end, we develop a skill-based imitation learning framework that extracts temporally extended sensorimotor skills from prior data and subsequently learns a policy for the target task that invokes these learned skills. We identify several key design choices that significantly improve performance on novel tasks, namely representation learning objectives to enable more predictable skill representations and a retrieval-based data augmentation mechanism to increase the scope of supervision for policy training. On a collection of simulated and real-world manipulation domains, we demonstrate that our method significantly outperforms existing imitation learning and offline reinforcement learning approaches. Videos and code are available at https://ut-austin-rpl.github.io/sailor
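The retrieval mechanism can be pictured as a nearest-neighbor lookup in the learned state representation: prior-task states that land close to the target-task demonstrations are pulled in as extra supervision. A minimal numpy sketch with random stand-in embeddings (the real system uses a learned encoder):

```python
import numpy as np

rng = np.random.default_rng(0)
prior_emb = rng.normal(size=(10000, 32))   # embeddings of prior-task states
target_emb = rng.normal(size=(200, 32))    # embeddings of target-task demo states

# Cosine similarity between every prior state and every target demo state.
prior_n = prior_emb / np.linalg.norm(prior_emb, axis=1, keepdims=True)
target_n = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
sim = prior_n @ target_n.T                 # (10000, 200)

score = sim.max(axis=1)                    # best match per prior state
retrieved = np.argsort(score)[-1000:]      # top-k prior states to co-train on
```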
Citations: 11
Deep Black-Box Reinforcement Learning with Movement Primitives
Pub Date : 2022-10-18 DOI: 10.48550/arXiv.2210.09622
Fabian Otto, Onur Çelik, Hongyi Zhou, Hanna Ziesche, Ngo Anh Vien, G. Neumann
Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem in which we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited to solving sparse-reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, which underpin a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables policy learning with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks. In doing so, we investigate different reward formulations - dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that sparse and non-Markovian rewards are also often better suited to defining the desired behavior, allowing us to obtain considerably higher-quality policies compared to step-based RL.
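Stripped to its core, ERL maintains a search distribution over whole-episode controller parameters and updates it from episodic returns alone. The sketch below uses a simple cross-entropy-style update as a stand-in for the paper's differentiable trust-region layers; episode_return is a placeholder for rolling out a movement primitive and scoring the whole trajectory:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 30   # number of movement-primitive weights per controller

def episode_return(theta):
    # Placeholder black-box objective: a real system would roll out the
    # primitive and score the whole trajectory (possibly with a sparse
    # or non-Markovian reward).
    return -np.sum((theta - 1.0) ** 2)

mean, std = np.zeros(dim), np.ones(dim)
for _ in range(100):
    thetas = mean + std * rng.normal(size=(64, dim))   # sample controllers
    returns = np.array([episode_return(t) for t in thetas])
    elite = thetas[np.argsort(returns)[-8:]]           # best episodes
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
```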
Citations: 4
Inferring Versatile Behavior from Demonstrations by Matching Geometric Descriptors
Pub Date : 2022-10-17 DOI: 10.48550/arXiv.2210.08121
Niklas Freymuth, Nicolas Schreiber, P. Becker, Aleksander Taranovic, G. Neumann
Humans intuitively solve tasks in versatile ways, varying their behavior both in trajectory-based planning and in individual steps. Thus, they can easily generalize and adapt to new and changing environments. Current imitation learning algorithms often only consider unimodal expert demonstrations and act in a state-action-based setting, making it difficult for them to imitate human behavior when the demonstrations are versatile. Instead, we combine a mixture of movement primitives with a distribution-matching objective to learn versatile behaviors that match the expert's behavior and versatility. To facilitate generalization to novel task configurations, we do not directly match the agent's and expert's trajectory distributions but rather work with concise geometric descriptors which generalize well to unseen task configurations. We empirically validate our method on various robot tasks using versatile human demonstrations and compare to imitation learning algorithms in both a state-action setting and a trajectory-based setting. We find that the geometric descriptors greatly help in generalizing to new task configurations and that combining them with our distribution-matching objective is crucial for representing and reproducing versatile behavior.
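The role of a descriptor is to summarize a trajectory by task-relevant geometry rather than by raw states. Below is a toy numpy version (final end-effector position relative to a few task keypoints), paired with a crude moment-matching loss standing in for the paper's full distribution-matching objective over multimodal behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

def descriptor(traj, keypoints):
    # Toy geometric descriptor: the final trajectory point expressed
    # relative to each task keypoint (the paper's descriptors are richer).
    return (traj[-1] - keypoints).ravel()

keypoints = rng.normal(size=(3, 3))
expert = [rng.normal(size=(50, 3)) for _ in range(20)]   # demo trajectories
agent = [rng.normal(size=(50, 3)) for _ in range(20)]    # policy rollouts

d_exp = np.stack([descriptor(t, keypoints) for t in expert])
d_agt = np.stack([descriptor(t, keypoints) for t in agent])

# Crude distribution-matching signal: difference of the first two moments.
loss = np.sum((d_exp.mean(0) - d_agt.mean(0)) ** 2) + \
       np.sum((d_exp.std(0) - d_agt.std(0)) ** 2)
```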
Citations: 3
Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation
Pub Date : 2022-10-17 DOI: 10.48550/arXiv.2210.09378
Chenning Yu, Hong-Den Yu, Sicun Gao
Deep reinforcement learning in continuous domains focuses on learning control policies that map states to distributions over actions, ideally concentrated on the optimal choices at each step. In multi-agent navigation problems, the optimal actions depend heavily on the agents' density. Their interaction patterns grow exponentially with respect to such density, making it hard for learning-based methods to generalize. We propose to switch the learning objective from predicting the optimal actions to predicting sets of admissible actions, which we call control admissibility models (CAMs), such that they can be easily composed and used for online inference with an arbitrary number of agents. We design CAMs using graph neural networks and develop training methods that optimize the CAMs in the standard model-free setting, with the additional benefit of eliminating the reward engineering typically required to balance collision-avoidance and goal-reaching requirements. We evaluate the proposed approach in multi-agent navigation environments. We show that CAM models can be trained in environments with only a few agents and easily composed for deployment in dense environments with hundreds of agents, achieving better performance than state-of-the-art methods.
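Deployment then reduces to filter-then-choose: sample candidate actions, keep those the CAM accepts, and pick among the admissible set with any task objective. A toy numpy sketch in which a hand-written rule replaces the learned graph neural network:

```python
import numpy as np

rng = np.random.default_rng(0)

def cam(state, actions):
    # Toy admissibility model: in the paper this is a GNN over nearby
    # agents; here, any action not steering toward a fixed obstacle
    # counts as admissible.
    obstacle = np.array([1.0, 0.0])
    to_obstacle = obstacle - state
    return actions @ to_obstacle < 0.0        # boolean mask over candidates

state = np.zeros(2)
candidates = rng.uniform(-1, 1, size=(128, 2))  # sampled control inputs
admissible = candidates[cam(state, candidates)]

goal = np.array([-2.0, 1.0])
# Among admissible actions, greedily pick the most goal-directed one.
action = admissible[np.argmax(admissible @ (goal - state))]
```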
Citations: 5
Eliciting Compatible Demonstrations for Multi-Human Imitation Learning
Pub Date : 2022-10-14 DOI: 10.48550/arXiv.2210.08073
Kanishk Gandhi, Siddharth Karamcheti, Madeline Liao, Dorsa Sadigh
Imitation learning from human-provided demonstrations is a strong approach for learning policies for robot manipulation. While the ideal dataset for imitation learning is homogeneous and low-variance -- reflecting a single, optimal method for performing a task -- natural human behavior has a great deal of heterogeneity, with several optimal ways to demonstrate a task. This multimodality is inconsequential to human users, with task variations manifesting as subconscious choices; for example, reaching down, then across to grasp an object, versus reaching across, then down. Yet, this mismatch presents a problem for interactive imitation learning, where sequences of users improve on a policy by iteratively collecting new, possibly conflicting demonstrations. To combat this problem of demonstrator incompatibility, this work designs an approach for 1) measuring the compatibility of a new demonstration given a base policy, and 2) actively eliciting more compatible demonstrations from new users. Across two simulation tasks requiring long-horizon, dexterous manipulation and a real-world "food plating" task with a Franka Emika Panda arm, we show that we can both identify incompatible demonstrations via post-hoc filtering and apply our compatibility measure to actively elicit compatible demonstrations from new users, leading to improved task success rates across simulated and real environments.
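One way to read "compatibility" is as the likelihood of a new demonstration's actions under the base policy, so that low-likelihood demonstrations are flagged before fine-tuning. A hedged PyTorch sketch with a Gaussian likelihood follows; this is one plausible instantiation, not necessarily the paper's exact estimator:

```python
import torch
import torch.nn as nn

# Stand-in base policy trained on earlier demonstrations.
policy = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 4))

def compatibility(states, actions, sigma=0.1):
    # Average Gaussian log-likelihood (up to a constant) of the demo's
    # actions under the base policy's predictions.
    with torch.no_grad():
        pred = policy(states)
    return -((pred - actions) ** 2).sum(dim=-1).mean() / (2 * sigma ** 2)

states, actions = torch.randn(100, 10), torch.randn(100, 4)
score = compatibility(states, actions)
# Demonstrations scoring below a threshold would be filtered out or
# trigger a request for a more compatible demonstration.
```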
Citations: 9
ROS-PyBullet Interface: A Framework for Reliable Contact Simulation and Human-Robot Interaction
Pub Date : 2022-10-13 DOI: 10.48550/arXiv.2210.06887
Christopher E. Mower, Theodoros Stouraitis, João Moura, C. Rauch, Lei Yan, Nazanin Zamani Behabadi, M. Gienger, Tom Kamiel Magda Vercauteren, C. Bergeles, S. Vijayakumar
Reliable contact simulation plays a key role in the development of (semi-)autonomous robots, especially when dealing with contact-rich manipulation scenarios, an active robotics research topic. Besides simulation, components such as sensing, perception, data collection, robot hardware control, and human interfaces are all key enablers for applying machine learning algorithms or model-based approaches in real-world systems. However, there is a lack of software connecting reliable contact simulation with the larger robotics ecosystem (e.g., ROS, Orocos) for a more seamless application of novel approaches from the literature to existing robotic hardware. In this paper, we present the ROS-PyBullet Interface, a framework that provides a bridge between the reliable contact/impact simulator PyBullet and the Robot Operating System (ROS). Furthermore, we provide additional utilities for facilitating Human-Robot Interaction (HRI) in the simulated environment. We also present several use cases that highlight the capabilities and usefulness of our framework. Please check our video, source code, and examples included in the supplementary material. Our full code base is open source and can be found at https://github.com/cmower/ros_pybullet_interface.
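The contact information at the center of the framework comes from PyBullet itself; the interface's job is to stream such signals over ROS. Below is a plain PyBullet example of the underlying contact query, shown for illustration only (see the linked repository for the actual ROS-side API):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                       # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
plane = p.loadURDF("plane.urdf")
robot = p.loadURDF("r2d2.urdf", [0, 0, 0.5])

for _ in range(240):                      # simulate one second at 240 Hz
    p.stepSimulation()

# Each contact point carries positions, normals, and the normal force -
# the kind of data a bridge like this can publish on ROS topics.
for c in p.getContactPoints(bodyA=robot, bodyB=plane):
    print("position on robot:", c[5], "normal force:", c[9])
```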
Citations: 7
Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks
Pub Date : 2022-10-12 DOI: 10.48550/arXiv.2210.06601
Kuan Fang, Patrick Yin, Ashvin Nair, Homer Walke, Ge Yan, S. Levine
The use of broad datasets has proven crucial for generalization in a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data, in combination with online fine-tuning guided by subgoals in a learned lossy representation space. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals, decomposing the original task into easier problems. Learned from the broad data, the lossy representation emphasizes task-relevant information about states and goals while abstracting away redundant contexts that hinder generalization. It thus enables subgoal planning for unseen tasks, provides a compact input to the policy, and facilitates reward shaping during fine-tuning. We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs and without any manual reward engineering.
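Planning happens entirely in the lossy representation space: the affordance model repeatedly proposes the next reachable subgoal until the encoded task goal is within reach of the goal-conditioned policy. A toy numpy sketch with a stand-in affordance model:

```python
import numpy as np

rng = np.random.default_rng(0)

def affordance_step(z, z_goal):
    # Toy affordance model: propose a reachable next subgoal in the
    # lossy latent space (a stand-in for the learned generative model).
    return z + 0.2 * (z_goal - z) + 0.01 * rng.normal(size=z.shape)

z = rng.normal(size=16)        # lossy encoding of the current observation
z_goal = rng.normal(size=16)   # lossy encoding of the novel task goal

subgoals = []
for _ in range(10):            # decompose the task into easier pieces
    z = affordance_step(z, z_goal)
    subgoals.append(z)
# Each subgoal is then handed to the goal-conditioned policy, and also
# provides a reward-shaping signal during online fine-tuning.
```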
Citations: 9