Composable energy policies for reactive motion generation and reinforcement learning

IF 7.5 · CAS Tier 1 (Computer Science) · Q1 ROBOTICS · International Journal of Robotics Research · Pub Date: 2023-06-26 · DOI: 10.1177/02783649231179499
Julen Urain, Anqi Li, Puze Liu, Carlo D’Eramo, Jan Peters
{"title":"Composable energy policies for reactive motion generation and reinforcement learning","authors":"Julen Urain, Anqi Li, Puze Liu, Carlo D’Eramo, Jan Peters","doi":"10.1177/02783649231179499","DOIUrl":null,"url":null,"abstract":"In this work, we introduce composable energy policies (CEP), a novel framework for multi-objective motion generation. We frame the problem of composing multiple policy components from a probabilistic view. We consider a set of stochastic policies represented in arbitrary task spaces, where each policy represents a distribution of the actions to solve a particular task. Then, we aim to find the action in the configuration space that optimally satisfies all the policy components. The presented framework allows the fusion of motion generators from different sources: optimal control, data-driven policies, motion planning, and handcrafted policies. Classically, the problem of multi-objective motion generation is solved by the composition of a set of deterministic policies, rather than stochastic policies. However, there are common situations where different policy components have conflicting behaviors, leading to oscillations or the robot getting stuck in an undesirable state. While our approach is not directly able to solve the conflicting policies problem, we claim that modeling each policy as a stochastic policy allows more expressive representations for each component in contrast with the classical reactive motion generation approaches. In some tasks, such as reaching a target in a cluttered environment, we show experimentally that CEP additional expressivity allows us to model policies that reduce these conflicting behaviors. A field that benefits from these reactive motion generators is the one of robot reinforcement learning. Integrating these policy architectures with reinforcement learning allows us to include a set of inductive biases in the learning problem. These inductive biases guide the reinforcement learning agent towards informative regions or improve collision safety while exploring. In our work, we show how to integrate our proposed reactive motion generator as a structured policy for reinforcement learning. Combining the reinforcement learning agent exploration with the prior-based CEP, we can improve the learning performance and explore safer.","PeriodicalId":54942,"journal":{"name":"International Journal of Robotics Research","volume":null,"pages":null},"PeriodicalIF":7.5000,"publicationDate":"2023-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Robotics Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/02783649231179499","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 19

Abstract

In this work, we introduce composable energy policies (CEP), a novel framework for multi-objective motion generation. We frame the problem of composing multiple policy components from a probabilistic view. We consider a set of stochastic policies represented in arbitrary task spaces, where each policy represents a distribution over the actions that solve a particular task. We then aim to find the action in the configuration space that optimally satisfies all the policy components. The presented framework allows the fusion of motion generators from different sources: optimal control, data-driven policies, motion planning, and handcrafted policies. Classically, the problem of multi-objective motion generation is solved by composing a set of deterministic policies rather than stochastic ones. However, there are common situations where different policy components exhibit conflicting behaviors, leading to oscillations or to the robot getting stuck in an undesirable state. While our approach does not directly solve the conflicting-policies problem, we claim that modeling each policy as a stochastic policy allows more expressive representations for each component than classical reactive motion generation approaches. In some tasks, such as reaching a target in a cluttered environment, we show experimentally that CEP's additional expressivity allows us to model policies that reduce these conflicting behaviors. One field that benefits from these reactive motion generators is robot reinforcement learning. Integrating these policy architectures with reinforcement learning allows us to include a set of inductive biases in the learning problem. These inductive biases guide the reinforcement learning agent towards informative regions or improve collision safety during exploration. In our work, we show how to integrate our proposed reactive motion generator as a structured policy for reinforcement learning. By combining the reinforcement learning agent's exploration with the prior-based CEP, we can improve learning performance and explore more safely.
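The core composition step described above (selecting the configuration-space action that optimally satisfies all stochastic policy components) admits a closed-form solution when each component is modeled as a Gaussian over accelerations in its own task space. The Python sketch below illustrates that special case only; it is a minimal illustration under stated assumptions, not the paper's implementation, and the names `GaussianComponent` and `compose` are hypothetical.

```python
import numpy as np

# Minimal sketch of composing Gaussian energy-policy components.
# Each component lives in its own task space, connected to the
# configuration space by a differentiable task map with Jacobian J,
# and scores task-space accelerations with a Gaussian density
# N(mu, Lam^-1). Composing components multiplies their densities,
# i.e., sums their energies, over the shared configuration space.

class GaussianComponent:
    def __init__(self, J, Jdot, mu, Lam):
        self.J = J        # task-map Jacobian, shape (m, n)
        self.Jdot = Jdot  # time derivative of J, shape (m, n)
        self.mu = mu      # desired task-space acceleration, shape (m,)
        self.Lam = Lam    # task-space precision matrix, shape (m, m)

def compose(components, qdot):
    """Configuration-space acceleration maximizing the product of all
    component densities. With task accelerations xdd = J qdd + Jdot qdot,
    the optimum solves a weighted least-squares problem:
        qdd* = (sum_i J_i^T Lam_i J_i)^-1 sum_i J_i^T Lam_i (mu_i - Jdot_i qdot)
    """
    n = components[0].J.shape[1]
    A = np.zeros((n, n))
    b = np.zeros(n)
    for c in components:
        A += c.J.T @ c.Lam @ c.J
        b += c.J.T @ c.Lam @ (c.mu - c.Jdot @ qdot)
    # Small ridge term keeps the system well-posed if the components
    # under-constrain some configuration-space directions.
    return np.linalg.solve(A + 1e-8 * np.eye(n), b)

# Usage: a 2-DoF system fusing a 1-D task-space attractor ("reach")
# with joint-space damping ("damp").
qdot = np.array([0.1, -0.2])
reach = GaussianComponent(J=np.array([[1.0, 0.5]]), Jdot=np.zeros((1, 2)),
                          mu=np.array([2.0]), Lam=10.0 * np.eye(1))
damp = GaussianComponent(J=np.eye(2), Jdot=np.zeros((2, 2)),
                         mu=-qdot, Lam=0.5 * np.eye(2))
qdd = compose([reach, damp], qdot)  # action sent to the robot
```

A non-Gaussian component (e.g., a learned energy) loses this closed form and would generally require numerical optimization or sampling of the composed log-density, but the product-of-experts structure of the composition is the same.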
Source journal

International Journal of Robotics Research (Engineering & Technology – Robotics)

CiteScore: 22.20 · Self-citation rate: 0.00% · Articles per year: 34 · Review time: 6-12 weeks

About the journal: The International Journal of Robotics Research (IJRR) has been a leading peer-reviewed publication in the field for over two decades, and was the first scholarly journal dedicated to robotics research. IJRR presents cutting-edge and thought-provoking original research papers, articles, and reviews on groundbreaking trends, technical advancements, and theoretical developments in robotics, with contributions from renowned scholars and practitioners. The journal covers a wide range of topics, going beyond narrow technical advancements to encompass many aspects of robotics. Its primary aim is to publish work of lasting value for the scientific and technological advancement of the field: only original, robust, and practical research that can serve as a foundation for further progress is considered for publication.
Latest articles in this journal:
- Decentralized state estimation: An approach using pseudomeasurements and preintegration
- Linear electrostatic actuators with Moiré-effect optical proprioceptive sensing and electroadhesive braking
- Under-canopy dataset for advancing simultaneous localization and mapping in agricultural robotics
- Multilevel motion planning: A fiber bundle formulation
- TRansPose: Large-scale multispectral dataset for transparent object