An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers

Journal of Field Robotics, vol. 41, no. 6, pp. 1814–1828 | IF 4.2 | JCR Q2 (Robotics) | CAS Tier 2 (Computer Science) | Pub Date: 2024-04-28 | DOI: 10.1002/rob.22355
Ali Aflakian, Alireza Rastegarpanah, Jamie Hathaway, Rustam Stolkin
{"title":"An online hyper-volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers","authors":"Ali Aflakian,&nbsp;Alireza Rastegarpanah,&nbsp;Jamie Hathaway,&nbsp;Rustam Stolkin","doi":"10.1002/rob.22355","DOIUrl":null,"url":null,"abstract":"<p>This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) are used to constrain the action space of the agent, enabling faster RL refining of a control policy, by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by a RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training. These methods include using a hypercube and convex hull with modified loss functions, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method using expert demonstrators, employing one expert demonstrator with the DAgger algorithm, and without using any demonstrators. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also provides the most optimal solution compared with other approaches. 
Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.</p>","PeriodicalId":192,"journal":{"name":"Journal of Field Robotics","volume":"41 6","pages":"1814-1828"},"PeriodicalIF":4.2000,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/rob.22355","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Field Robotics","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/rob.22355","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary explorative actions. Domain-specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7-DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: using a hypercube with a modified loss function, using a convex hull with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method when using multiple expert demonstrators, when using a single expert demonstrator with the DAgger algorithm, and when using no demonstrators at all. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also yields the best solution among the approaches compared. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image-based VS, position-based VS, and hybrid-decoupled VS.
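As an illustration of the simplest of these bounding schemes, the snippet below sketches how an agent's action could be constrained to the axis-aligned hypercube spanned by the actions proposed by several expert controllers for the current state. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name and the use of per-dimension min/max bounds over the expert proposals are illustrative choices.

```python
# Illustrative sketch (not the paper's implementation): bound an RL agent's
# action inside the axis-aligned hypercube spanned by the experts' actions.
import numpy as np

def bound_to_expert_hypercube(action: np.ndarray,
                              expert_actions: np.ndarray) -> np.ndarray:
    """Clip `action` to the per-dimension [min, max] range of the experts.

    expert_actions: shape (n_experts, action_dim), one row per expert's
    proposed action for the current state.
    """
    lo = expert_actions.min(axis=0)   # per-dimension lower bound
    hi = expert_actions.max(axis=0)   # per-dimension upper bound
    return np.clip(action, lo, hi)

# Example: three experts propose 2-D velocity commands; the agent's
# exploratory action is pulled back inside their bounding box.
experts = np.array([[0.1, -0.2], [0.3, 0.0], [0.2, 0.1]])
raw_action = np.array([0.5, -0.5])
print(bound_to_expert_hypercube(raw_action, experts))  # -> [ 0.3 -0.2]
```

The convex-hull variants discussed in the abstract are tighter: the hull of the expert actions is a subset of this hypercube, so they exclude more of the unproductive exploration region, at the cost of a more expensive membership test or projection step.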


Source journal

Journal of Field Robotics (Engineering Technology – Robotics)

CiteScore: 15.00
Self-citation rate: 3.60%
Annual output: 80 articles
Review time: 6 months

Journal description: The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments. The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.