Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation

IF 3.4 3区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Frontiers of Computer Science Pub Date : 2023-12-16 DOI:10.1007/s11704-023-2733-5
Lei Yuan, Feng Chen, Zongzhang Zhang, Yang Yu
{"title":"Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation","authors":"Lei Yuan, Feng Chen, Zongzhang Zhang, Yang Yu","doi":"10.1007/s11704-023-2733-5","DOIUrl":null,"url":null,"abstract":"<p>Communication can promote coordination in cooperative Multi-Agent Reinforcement Learning (MARL). Nowadays, existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging as there may exist noise or potential attackers. Thus the robustness of the communication-based policies becomes an emergent and severe issue that needs more exploration. In this paper, we posit that the ego system<sup>1)</sup> trained with auxiliary adversaries may handle this limitation and propose an adaptable method of <b>M</b>ulti<b>-A</b>gent <b>A</b>uxiliary <b>A</b>dversaries Generation for robust <b>C</b>ommunication, dubbed MA3C, to obtain a robust communication-based policy. In specific, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal to minimize the coordination ability of the ego system, with which every information channel may suffer from distinct message attacks. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker population generation approach based on evolutionary learning. Finally, the ego system is paired with an attacker population and then alternatively trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that our proposed MA3C provides comparable or better robustness and generalization ability than other baselines.</p>","PeriodicalId":12640,"journal":{"name":"Frontiers of Computer Science","volume":"18 1","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2023-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11704-023-2733-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Communication can promote coordination in cooperative Multi-Agent Reinforcement Learning (MARL). Nowadays, existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging as there may exist noise or potential attackers. Thus the robustness of the communication-based policies becomes an emergent and severe issue that needs more exploration. In this paper, we posit that the ego system1) trained with auxiliary adversaries may handle this limitation and propose an adaptable method of Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy. In specific, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal to minimize the coordination ability of the ego system, with which every information channel may suffer from distinct message attacks. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker population generation approach based on evolutionary learning. Finally, the ego system is paired with an attacker population and then alternatively trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that our proposed MA3C provides comparable or better robustness and generalization ability than other baselines.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过适应性辅助多代理对手生成实现通信稳健的多代理学习
在合作式多代理强化学习(MARL)中,通信可以促进协调。目前,现有的研究主要集中在提高代理的通信效率上,而忽略了现实世界中的通信因可能存在噪音或潜在攻击者而更具挑战性。因此,基于通信的策略的鲁棒性成为一个新出现的严峻问题,需要更多的探索。在本文中,我们认为使用辅助对手训练的自我系统1) 可以解决这一局限性,并提出了一种用于鲁棒通信的多代理辅助对手生成的适应性方法(被称为 MA3C),以获得基于通信的鲁棒策略。具体来说,我们引入了一种新颖的信息攻击方法,将辅助攻击者的学习建模为一个共同目标下的合作问题,即最小化自我系统的协调能力,在此目标下,每个信息通道都可能遭受不同的信息攻击。此外,由于天真的对抗训练可能会阻碍自我系统的泛化能力,我们设计了一种基于进化学习的攻击者群体生成方法。最后,自我系统与攻击者群体配对,然后针对不断进化的攻击者进行交替训练,以提高其鲁棒性,这意味着自我系统和攻击者都具有适应性。在多个基准上进行的广泛实验表明,我们提出的 MA3C 具有与其他基准相当甚至更好的鲁棒性和泛化能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Frontiers of Computer Science
Frontiers of Computer Science COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING
CiteScore
8.60
自引率
2.40%
发文量
799
审稿时长
6-12 weeks
期刊介绍: Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles in a wide range of topics, including: architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, interdisciplinary, etc. The journal especially encourages papers from new emerging and multidisciplinary areas, as well as papers reflecting the international trends of research and development and on special topics reporting progress made by Chinese computer scientists.
期刊最新文献
A comprehensive survey of federated transfer learning: challenges, methods and applications DMFVAE: miRNA-disease associations prediction based on deep matrix factorization method with variational autoencoder Graph foundation model SEOE: an option graph based semantically embedding method for prenatal depression detection FedTop: a constraint-loosed federated learning aggregation method against poisoning attack
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1