Ali Aflakian, Alireza Rastegarpanah, Jamie Hathaway, Rustam Stolkin
This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: using a hypercube with a modified loss function, using a convex hull with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method with expert demonstrators against training with a single expert demonstrator using the DAgger algorithm and training without any demonstrators. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also yields the best solution among the compared approaches. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.
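As a rough illustration of the action-bounding idea described above (a minimal sketch, not the authors' implementation), the snippet below shows two of the four strategies in Python: clipping a raw RL action to the axis-aligned hypercube spanned by the experts' proposals, and projecting it onto the experts' convex hull by solving a small constrained least-squares problem. The expert velocity commands and the raw action are made-up placeholder values.

```python
import numpy as np
from scipy.optimize import minimize

def hypercube_clip(action, expert_actions):
    """Clip the agent's action to the axis-aligned bounding box (hypercube)
    spanned by the expert actions, one min/max per action dimension."""
    lo = expert_actions.min(axis=0)
    hi = expert_actions.max(axis=0)
    return np.clip(action, lo, hi)

def project_to_convex_hull(action, expert_actions):
    """Project the agent's action onto the convex hull of the expert actions:
    find convex weights w (w >= 0, sum w = 1) minimising ||w @ E - a||^2,
    then return the resulting convex combination of expert actions."""
    E = np.asarray(expert_actions)          # shape (n_experts, action_dim)
    n = E.shape[0]
    w0 = np.full(n, 1.0 / n)                # start from the uniform mixture
    res = minimize(
        lambda w: np.sum((w @ E - action) ** 2),
        w0,
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        method="SLSQP",
    )
    return res.x @ E

# Toy usage: three hypothetical experts propose 6-DoF end-effector velocity
# commands; the raw RL action is pulled back inside their bounds before execution.
experts = np.array([[0.10,  0.00, 0.02, 0.0, 0.0, 0.01],
                    [0.08,  0.02, 0.00, 0.0, 0.0, 0.00],
                    [0.12, -0.01, 0.03, 0.0, 0.0, 0.02]])
raw_action = np.array([0.30, 0.05, -0.10, 0.0, 0.0, 0.05])
print(hypercube_clip(raw_action, experts))
print(project_to_convex_hull(raw_action, experts))
```

The hypercube clip is cheaper but admits corner actions no expert would propose, whereas the hull projection keeps the executed action a convex mixture of expert proposals, which matches the intuition of the convex-hull bounding variants discussed in the abstract.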
{"title":"An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers","authors":"Ali Aflakian, Alireza Rastegarpanah, Jamie Hathaway, Rustam Stolkin","doi":"10.1002/rob.22355","DOIUrl":"10.1002/rob.22355","url":null,"abstract":"<p>This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) are used to constrain the action space of the agent, enabling faster RL refining of a control policy, by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by a RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training. These methods include using a hypercube and convex hull with modified loss functions, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators, employing one expert demonstrator with the DAgger algorithm, and without using any demonstrators. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also provides the most optimal solution compared with other approaches. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.</p>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"41 6","pages":"1814-1828"},"PeriodicalIF":4.2,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/rob.22355","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140831758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
<p>Station keeping is an essential maneuver for autonomous surface vehicles (ASVs), mainly when used in confined spaces, to carry out surveys that require the ASV to keep its position or in collaboration with other vehicles where the relative position has an impact over the mission. However, this maneuver can become challenging for classic feedback controllers due to the need for an accurate model of the ASV dynamics and the environmental disturbances. This work proposes a model predictive controller using neural network simulation error minimization (NNSEM–MPC) to accurately predict the dynamics of the ASV under wind disturbances. The performance of the proposed scheme under wind disturbances is tested and compared against other controllers in simulation, using the robotics operating system and the multipurpose simulation environment Gazebo. A set of six tests was conducted by combining two varying wind speeds that are modeled as the Harris spectrum and three wind directions (<span></span><math>