PIPO: Policy Optimization with Permutation-Invariant Constraint for Distributed Multi-Robot Navigation

Ruiqi Zhang, Guang Chen, Jing Hou, Zhijun Li, Alois Knoll
Published in: 2022 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI)
Publication date: 2022-09-20
DOI: 10.1109/MFI55806.2022.9913862 (https://doi.org/10.1109/MFI55806.2022.9913862)

Abstract

Ensuring safe and effective navigation in complicated scenarios is a challenging task for large-scale multi-agent systems (MAS). As the number of agents grows, most existing centralized methods break down for lack of scalability, while popular decentralized approaches are hampered by high latency and computing requirements. In this work, we propose PIPO, a novel policy optimization algorithm for decentralized MAS navigation with permutation-invariant constraints. To guide navigation and avoid unnecessary exploration in the early episodes, we first define a guide policy. We then introduce the permutation-invariance property for decentralized multi-agent systems and leverage a graph convolution network to produce the same output under shuffled observations. Because both training and execution are decentralized, our approach scales easily to an arbitrary number of agents and can be used in large-scale systems. Extensive experiments demonstrate that PIPO significantly outperforms multi-agent reinforcement learning baselines and other leading methods in a variety of scenarios.
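
The permutation-invariance property mentioned in the abstract can be illustrated with a minimal sketch. This is not the PIPO architecture: the abstract states that a graph convolution network is used, whereas the example below substitutes a simple shared encoder followed by mean pooling, which is one common way to obtain an order-independent output. The function names (`encode`, `aggregate`), the observation layout (dx, dy, speed), and all numeric values are assumptions made for illustration only.

```python
import math
import random


def encode(obs, weights):
    """Shared per-neighbor encoder: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, obs))) for row in weights]


def aggregate(neighbor_obs, weights):
    """Mean-pool the encoded neighbor observations.

    Every neighbor passes through the same encoder and the results are
    averaged, so the order in which neighbors are listed does not matter.
    """
    encoded = [encode(obs, weights) for obs in neighbor_obs]
    n = len(encoded)
    # math.fsum returns a correctly rounded sum, so the result is
    # independent of the order of the summands.
    return [math.fsum(column) / n for column in zip(*encoded)]


# Hypothetical data: 4 neighbors, each observed as (dx, dy, speed).
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
neighbors = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(4)]

# Present the same neighbors in a different order.
shuffled = neighbors[:]
rng.shuffle(shuffled)

out_a = aggregate(neighbors, weights)
out_b = aggregate(shuffled, weights)
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(out_a, out_b))
print(same)
```

Because the pooling step treats all neighbors symmetrically, a policy built on top of `aggregate` would receive the same features regardless of how the neighboring robots happen to be indexed, and the same function accepts any number of neighbors.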