Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

IF 3.1 4区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Journal of Intelligent & Robotic Systems Pub Date : 2024-07-09 DOI:10.1007/s10846-024-02118-y

Luka Antonyshyn, Sidney Givigi

{"title":"Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards","authors":"Luka Antonyshyn, Sidney Givigi","doi":"10.1007/s10846-024-02118-y","DOIUrl":null,"url":null,"abstract":"<p>Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.</p>","PeriodicalId":54794,"journal":{"name":"Journal of Intelligent & Robotic Systems","volume":"30 1","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent & Robotic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10846-024-02118-y","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于深度模型的强化学习用于具有密集和稀疏奖励的机器人系统的预测控制

稀疏奖励和样本效率是强化学习领域的开放研究领域。在考虑将强化学习应用于机器人和其他网络物理系统时，这些问题尤为重要。这是因为在这些领域中，许多任务都是基于目标的，并自然地以二进制的成功和失败来表示，行动空间大且连续，而与环境的实际交互是有限的。在这项工作中，我们提出了深度值与预测模型控制（DVPMC），这是一种基于模型的预测强化学习算法，用于连续控制，它使用系统识别、值函数近似和基于采样的优化来选择行动。该算法在密集奖励和稀疏奖励任务中进行了评估。结果表明，在密集奖励问题上，该算法的性能可以与预测控制方法相媲美；在稀疏奖励任务上，该算法在采样效率和性能指标上优于无模型和基于模型的学习算法。我们在一个玩躲避游戏的真实机器人身上验证了使用 DVPMC 模拟训练的代理的性能。实验视频请点击：https://youtu.be/0Q274kcfn4c。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Journal of Intelligent & Robotic Systems 工程技术-机器人学

CiteScore

7.00

自引率

9.10%

发文量

219

审稿时长

6 months

期刊介绍： The Journal of Intelligent and Robotic Systems bridges the gap between theory and practice in all areas of intelligent systems and robotics. It publishes original, peer reviewed contributions from initial concept and theory to prototyping to final product development and commercialization. On the theoretical side, the journal features papers focusing on intelligent systems engineering, distributed intelligence systems, multi-level systems, intelligent control, multi-robot systems, cooperation and coordination of unmanned vehicle systems, etc. On the application side, the journal emphasizes autonomous systems, industrial robotic systems, multi-robot systems, aerial vehicles, mobile robot platforms, underwater robots, sensors, sensor-fusion, and sensor-based control. Readers will also find papers on real applications of intelligent and robotic systems (e.g., mechatronics, manufacturing, biomedical, underwater, humanoid, mobile/legged robot and space applications, etc.).