MACRPO: Multi-agent cooperative recurrent policy optimization.

Frontiers in Robotics and AI (IF 2.9, Q2 Robotics) · Pub Date: 2024-12-20 · eCollection Date: 2024-01-01 · DOI: 10.3389/frobt.2024.1394209
Eshagh Kargar, Ville Kyrki
Volume 11, article 1394209. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695781/pdf/

Abstract

This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO. First, we use a recurrent layer in the critic's network architecture and propose a new framework that uses the proposed meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also to handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions, controlling the level of cooperation between agents through a parameter. This control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, as well as single-agent methods with parameters shared between agents, such as IMPALA and APEX. The results show superior performance compared with the other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
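The abstract does not give MACRPO's exact advantage formulation, but the idea of mixing an agent's own learning signal with the other agents' signals under a tunable cooperation parameter can be sketched as follows. This is a minimal illustration, not the paper's method: the parameter name `beta`, the one-step TD-error form, and the mean over other agents are all assumptions for the sketch (`beta = 0` recovers independent learners; `beta = 1` gives each agent a fully shared signal).

```python
def cooperative_advantages(rewards, values, next_values, beta=0.5, gamma=0.99):
    """Cooperation-weighted per-agent advantages for one timestep.

    rewards, values, next_values: per-agent lists of length N.
    beta in [0, 1] controls how strongly each agent's advantage
    incorporates the other agents' TD errors.
    """
    # Per-agent one-step TD errors: delta_i = r_i + gamma * V'(s) - V(s).
    deltas = [r + gamma * nv - v
              for r, v, nv in zip(rewards, values, next_values)]
    advantages = []
    for i, own in enumerate(deltas):
        # Mean TD error of all other agents.
        others = [d for j, d in enumerate(deltas) if j != i]
        other_mean = sum(others) / len(others)
        # Interpolate between the agent's own signal and the others'.
        advantages.append((1 - beta) * own + beta * other_mean)
    return advantages

# Two agents at a terminal step (next_values = 0):
# deltas are [0.5, -0.5]; with beta=0.25 the mixed
# advantages become [0.25, -0.25].
adv = cooperative_advantages([1.0, 0.0], [0.5, 0.5], [0.0, 0.0], beta=0.25)
```

With `beta > 0`, an agent whose teammate received a high reward is credited as well, which is the kind of tunable credit sharing the abstract describes for settings where full cooperation is not possible.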

Source journal: Frontiers in Robotics and AI
CiteScore: 6.50 · Self-citation rate: 5.90% · Articles published: 355 · Review time: 14 weeks
Journal description: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.