Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning

Wenshuai Zhao, J. P. Queralta, Qingqing Li, Tomi Westerlund
{"title":"Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning","authors":"Wenshuai Zhao, J. P. Queralta, Qingqing Li, Tomi Westerlund","doi":"10.1109/ICRAE50850.2020.9310796","DOIUrl":null,"url":null,"abstract":"Current research directions in deep reinforcement learning include bridging the simulation-reality gap, improving sample efficiency of experiences in distributed multi-agent reinforcement learning, together with the development of robust methods against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can happen due to sensing mismatches, inherent errors in terms of calibration of the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning with proximal policy optimization (PPO). We discuss on how both the different types of perturbances and how the number of agents experiencing those perturbances affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when considering that different robots might be exposed to different environments where their sensors or actuators have induced errors. With the conclusions of this work, we set the initial point for future work on designing and developing methods to achieve robust reinforcement learning on the presence of real-world perturbances that might differ within a multi-robot system.","PeriodicalId":296832,"journal":{"name":"2020 5th International Conference on Robotics and Automation Engineering (ICRAE)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Robotics and Automation Engineering (ICRAE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRAE50850.2020.9310796","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Current research directions in deep reinforcement learning include bridging the simulation-to-reality gap, improving the sample efficiency of experiences in distributed multi-agent reinforcement learning, and developing methods that are robust against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can arise from sensing mismatches, inherent calibration errors in the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effects of sensing, calibration, and accuracy mismatches into distributed reinforcement learning with proximal policy optimization (PPO). We discuss how both the type of perturbation and the number of agents experiencing it affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when different robots may be exposed to different environments in which their sensors or actuators exhibit induced errors. With the conclusions of this work, we set the starting point for future work on designing and developing methods for robust reinforcement learning in the presence of real-world perturbations that may differ within a multi-robot system.
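Although the abstract does not include code, the setup it describes can be made concrete with a small environment wrapper. The sketch below is our illustration, not the authors' implementation: it shows one plausible way to inject the three mismatch types into a single robot's observations and joint commands through the classic gym interface. The class name `PerturbedEnv` and the parameters `sensing_std`, `calib_offset`, and `accuracy_std` are assumptions made for illustration.

```python
import numpy as np
import gym


class PerturbedEnv(gym.Wrapper):
    """Wraps one robot's environment so that its sensing and actuation
    differ from those of its nominally identical peers (illustrative sketch)."""

    def __init__(self, env, sensing_std=0.0, calib_offset=0.0, accuracy_std=0.0):
        super().__init__(env)
        self.sensing_std = sensing_std    # std. dev. of zero-mean observation noise
        self.calib_offset = calib_offset  # constant bias added to every action
        self.accuracy_std = accuracy_std  # std. dev. of zero-mean actuation noise

    def reset(self, **kwargs):
        return self._noisy_obs(self.env.reset(**kwargs))

    def step(self, action):
        action = np.asarray(action, dtype=np.float64)
        # Calibration mismatch: a fixed per-robot actuation bias.
        action = action + self.calib_offset
        # Accuracy mismatch: stochastic actuation error on each command.
        action = action + np.random.normal(0.0, self.accuracy_std, action.shape)
        obs, reward, done, info = self.env.step(action)
        return self._noisy_obs(obs), reward, done, info

    def _noisy_obs(self, obs):
        # Sensing mismatch: noisy readings from this robot's sensors.
        return obs + np.random.normal(0.0, self.sensing_std, np.shape(obs))


# Hypothetical usage: give only a subset of parallel PPO workers a perturbed
# copy of the Kuka task (KukaGymEnv from pybullet_envs, assumed installed):
# from pybullet_envs.bullet.kuka_gym_env import KukaGymEnv
# envs = [PerturbedEnv(KukaGymEnv(), sensing_std=0.01) if i < 2
#         else PerturbedEnv(KukaGymEnv()) for i in range(8)]
```

In a distributed PPO setup along these lines, each parallel worker would wrap its own copy of the simulated Kuka arm with different parameter values, so that the number of agents experiencing a given perturbation, and its magnitude, can be varied independently.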