An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers

Journal of Field Robotics, vol. 41, no. 6, pp. 1814–1828 | IF 4.2 | JCR Q2 (Robotics) | CAS Tier 2 (Computer Science) | Pub Date: 2024-04-28 | DOI: 10.1002/rob.22355
Ali Aflakian, Alireza Rastegarpanah, Jamie Hathaway, Rustam Stolkin
{"title":"An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers","authors":"Ali Aflakian,&nbsp;Alireza Rastegarpanah,&nbsp;Jamie Hathaway,&nbsp;Rustam Stolkin","doi":"10.1002/rob.22355","DOIUrl":null,"url":null,"abstract":"<p>This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) are used to constrain the action space of the agent, enabling faster RL refining of a control policy, by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by a RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training. These methods include using a hypercube and convex hull with modified loss functions, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators, employing one expert demonstrator with the DAgger algorithm, and without using any demonstrators. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also provides the most optimal solution compared with other approaches. 
Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.</p>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"41 6","pages":"1814-1828"},"PeriodicalIF":4.2000,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/rob.22355","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rob.22355","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: using a hypercube with a modified loss function, using a convex hull with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method when using multiple expert demonstrators, when using a single expert demonstrator with the DAgger algorithm, and when using no demonstrators at all. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also yields the best solution among the approaches compared. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.
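As an illustration of the simplest of these bounding schemes, the snippet below sketches how an agent's action could be constrained to the axis-aligned hypercube spanned by the actions proposed by several expert controllers for the current state. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name and the use of per-dimension min/max bounds over the expert proposals are illustrative choices.

```python
# Illustrative sketch (not the paper's implementation): bound an RL agent's
# action inside the axis-aligned hypercube spanned by the experts' actions.
import numpy as np

def bound_to_expert_hypercube(action: np.ndarray,
                              expert_actions: np.ndarray) -> np.ndarray:
    """Clip `action` to the per-dimension [min, max] range of the experts.

    expert_actions: shape (n_experts, action_dim), one row per expert's
    proposed action for the current state.
    """
    lo = expert_actions.min(axis=0)   # per-dimension lower bound
    hi = expert_actions.max(axis=0)   # per-dimension upper bound
    return np.clip(action, lo, hi)

# Example: three experts propose 2-D velocity commands; the agent's
# exploratory action is pulled back inside their bounding box.
experts = np.array([[0.1, -0.2], [0.3, 0.0], [0.2, 0.1]])
raw_action = np.array([0.5, -0.5])
print(bound_to_expert_hypercube(raw_action, experts))  # -> [ 0.3 -0.2]
```

The convex-hull variants discussed in the abstract are tighter: the hull of the expert actions is a subset of this hypercube, so they exclude more of the unproductive exploration region, at the cost of a more expensive membership test or projection step.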


Source journal

Journal of Field Robotics (Engineering Technology – Robotics)

CiteScore: 15.00
Self-citation rate: 3.60%
Annual output: 80 articles
Review time: 6 months

Journal description: The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments. The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.