Communication-robust multi-agent learning by adaptable auxiliary multi-agent adversary generation

Frontiers of Computer Science (IF 3.4; Region 3, Computer Science; JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS) Pub Date: 2023-12-16 DOI: 10.1007/s11704-023-2733-5

Lei Yuan, Feng Chen, Zongzhang Zhang, Yang Yu


Citations: 0

Abstract

Communication can promote coordination in cooperative Multi-Agent Reinforcement Learning (MARL). Existing works mainly focus on improving the communication efficiency of agents, neglecting that real-world communication is much more challenging because noise or potential attackers may be present. Thus, the robustness of communication-based policies becomes an emerging and serious issue that needs more exploration. In this paper, we posit that an ego system trained with auxiliary adversaries may handle this limitation, and we propose an adaptable method of Multi-Agent Auxiliary Adversaries Generation for robust Communication, dubbed MA3C, to obtain a robust communication-based policy. Specifically, we introduce a novel message-attacking approach that models the learning of the auxiliary attacker as a cooperative problem under a shared goal of minimizing the coordination ability of the ego system, so that every information channel may suffer from distinct message attacks. Furthermore, as naive adversarial training may impede the generalization ability of the ego system, we design an attacker population generation approach based on evolutionary learning. Finally, the ego system is paired with an attacker population and then alternately trained against the continuously evolving attackers to improve its robustness, meaning that both the ego system and the attackers are adaptable. Extensive experiments on multiple benchmarks indicate that our proposed MA3C provides comparable or better robustness and generalization ability than other baselines.
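The alternating scheme described in the abstract can be illustrated with a deliberately tiny sketch. It is not the MA3C implementation: the one-parameter "ego" (a trust gate on a single message channel), the scalar attackers (a bounded additive perturbation each), the closed-form loss, and all function names and hyperparameters below are assumptions made for illustration only. The sketch shows just the loop structure: evolve an attacker population against the current ego, then train the ego against that population, and repeat.

```python
import random


def ego_loss(gate, delta, noise_var=1.0):
    """Expected squared error of a toy receiver that mixes an attacked message
    (weight `gate`) with its own noisy observation (weight 1 - gate).
    Hypothetical stand-in for the ego system's coordination loss."""
    return (gate * delta) ** 2 + ((1.0 - gate) ** 2) * noise_var


def train_ego(gate, population, lr=0.05, steps=20, noise_var=1.0):
    """Gradient descent on the ego's mean loss over the attacker population."""
    for _ in range(steps):
        grad = sum(
            2.0 * gate * d * d - 2.0 * (1.0 - gate) * noise_var
            for d in population
        ) / len(population)
        gate = min(1.0, max(0.0, gate - lr * grad))  # keep the gate in [0, 1]
    return gate


def evolve_attackers(population, gate, rng, budget=1.0, mutation_std=0.2):
    """Keep the attackers that hurt the ego most, refill with mutated copies."""
    ranked = sorted(population, key=lambda d: ego_loss(gate, d), reverse=True)
    survivors = ranked[: max(1, len(ranked) // 2)]
    children = []
    while len(survivors) + len(children) < len(population):
        parent = rng.choice(survivors)
        child = parent + rng.gauss(0.0, mutation_std)
        children.append(min(budget, max(-budget, child)))  # attack budget
    return survivors + children


def alternate_training(rounds=30, pop_size=8, seed=0):
    """Alternate between evolving attackers and training the ego."""
    rng = random.Random(seed)
    population = [rng.uniform(-0.1, 0.1) for _ in range(pop_size)]
    gate = 1.0  # start by fully trusting the message
    for _ in range(rounds):
        population = evolve_attackers(population, gate, rng)
        gate = train_ego(gate, population)
    return gate, population


if __name__ == "__main__":
    gate, population = alternate_training()
    worst = max(population, key=abs)
    print(f"learned gate: {gate:.3f}")
    print(f"loss vs. strongest attacker: {ego_loss(gate, worst):.3f}")
    print(f"loss of fully trusting ego:  {ego_loss(1.0, worst):.3f}")
```

In this toy setting the ego learns to trust the message only partially, which lowers its loss against the strongest attacker in the population compared with an ego that trusts the channel completely. The method in the paper instead trains communication-based multi-agent policies and cooperative per-channel attackers, which this sketch does not attempt to reproduce.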




Source journal

Frontiers of Computer Science (COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore

8.60

Self-citation rate

2.40%

Articles published

799

Review time

6-12 weeks

About the journal: Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles on a wide range of topics, including architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, and interdisciplinary topics. The journal especially encourages papers from newly emerging and multidisciplinary areas, papers reflecting international trends in research and development, and papers on special topics reporting progress made by Chinese computer scientists.