Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning

Wenshuai Zhao, J. P. Queralta, Qingqing Li, Tomi Westerlund
{"title":"Towards Closing the Sim-to-Real Gap in Collaborative Multi-Robot Deep Reinforcement Learning","authors":"Wenshuai Zhao, J. P. Queralta, Qingqing Li, Tomi Westerlund","doi":"10.1109/ICRAE50850.2020.9310796","DOIUrl":null,"url":null,"abstract":"Current research directions in deep reinforcement learning include bridging the simulation-reality gap, improving sample efficiency of experiences in distributed multi-agent reinforcement learning, together with the development of robust methods against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can happen due to sensing mismatches, inherent errors in terms of calibration of the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effect of sensing, calibration, and accuracy mismatches in distributed reinforcement learning with proximal policy optimization (PPO). We discuss on how both the different types of perturbances and how the number of agents experiencing those perturbances affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when considering that different robots might be exposed to different environments where their sensors or actuators have induced errors. With the conclusions of this work, we set the initial point for future work on designing and developing methods to achieve robust reinforcement learning on the presence of real-world perturbances that might differ within a multi-robot system.","PeriodicalId":296832,"journal":{"name":"2020 5th International Conference on Robotics and Automation Engineering (ICRAE)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Robotics and Automation Engineering (ICRAE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICRAE50850.2020.9310796","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18

Abstract

Current research directions in deep reinforcement learning include bridging the simulation-to-reality gap, improving the sample efficiency of experiences in distributed multi-agent reinforcement learning, and developing methods that are robust against adversarial agents in distributed learning, among many others. In this work, we are particularly interested in analyzing how multi-agent reinforcement learning can bridge the gap to reality in distributed multi-robot systems where the operation of the different robots is not necessarily homogeneous. These variations can arise from sensing mismatches, inherent calibration errors in the mechanical joints, or simple differences in accuracy. While our results are simulation-based, we introduce the effects of sensing, calibration, and accuracy mismatches into distributed reinforcement learning with proximal policy optimization (PPO). We discuss how both the type of perturbation and the number of agents experiencing it affect the collaborative learning effort. The simulations are carried out using a Kuka arm model in the Bullet physics engine. This is, to the best of our knowledge, the first work exploring the limitations of PPO in multi-robot systems when different robots may be exposed to different environments in which their sensors or actuators exhibit induced errors. With the conclusions of this work, we set the starting point for future work on designing and developing methods for robust reinforcement learning in the presence of real-world perturbations that may differ within a multi-robot system.
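Although the abstract does not include code, the setup it describes can be made concrete with a small environment wrapper. The sketch below is our illustration, not the authors' implementation: it shows one plausible way to inject the three mismatch types into a single robot's observations and joint commands through the classic gym interface. The class name `PerturbedEnv` and the parameters `sensing_std`, `calib_offset`, and `accuracy_std` are assumptions made for illustration.

```python
import numpy as np
import gym


class PerturbedEnv(gym.Wrapper):
    """Wraps one robot's environment so that its sensing and actuation
    differ from those of its nominally identical peers (illustrative sketch)."""

    def __init__(self, env, sensing_std=0.0, calib_offset=0.0, accuracy_std=0.0):
        super().__init__(env)
        self.sensing_std = sensing_std    # std. dev. of zero-mean observation noise
        self.calib_offset = calib_offset  # constant bias added to every action
        self.accuracy_std = accuracy_std  # std. dev. of zero-mean actuation noise

    def reset(self, **kwargs):
        return self._noisy_obs(self.env.reset(**kwargs))

    def step(self, action):
        action = np.asarray(action, dtype=np.float64)
        # Calibration mismatch: a fixed per-robot actuation bias.
        action = action + self.calib_offset
        # Accuracy mismatch: stochastic actuation error on each command.
        action = action + np.random.normal(0.0, self.accuracy_std, action.shape)
        obs, reward, done, info = self.env.step(action)
        return self._noisy_obs(obs), reward, done, info

    def _noisy_obs(self, obs):
        # Sensing mismatch: noisy readings from this robot's sensors.
        return obs + np.random.normal(0.0, self.sensing_std, np.shape(obs))


# Hypothetical usage: give only a subset of parallel PPO workers a perturbed
# copy of the Kuka task (KukaGymEnv from pybullet_envs, assumed installed):
# from pybullet_envs.bullet.kuka_gym_env import KukaGymEnv
# envs = [PerturbedEnv(KukaGymEnv(), sensing_std=0.01) if i < 2
#         else PerturbedEnv(KukaGymEnv()) for i in range(8)]
```

In a distributed PPO setup along these lines, each parallel worker would wrap its own copy of the simulated Kuka arm with different parameter values, so that the number of agents experiencing a given perturbation, and its magnitude, can be varied independently.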