A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning

Yinbo Yu, Saihao Yan, Jiajia Liu
{"title":"A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning","authors":"Yinbo Yu, Saihao Yan, Jiajia Liu","doi":"arxiv-2409.07775","DOIUrl":null,"url":null,"abstract":"Recent studies have shown that cooperative multi-agent deep reinforcement\nlearning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor\ntrigger is observed, it will perform abnormal actions leading to failures or\nmalicious goals. However, existing proposed backdoors suffer from several\nissues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is\ntrained or activated by an additional network, or all agents are backdoored. To\nthis end, in this paper, we propose a novel backdoor attack against c-MADRL,\nwhich attacks the entire multi-agent team by embedding the backdoor only in a\nsingle agent. Firstly, we introduce adversary spatiotemporal behavior patterns\nas the backdoor trigger rather than manual-injected fixed visual patterns or\ninstant status and control the attack duration. This method can guarantee the\nstealthiness and practicality of injected backdoors. Secondly, we hack the\noriginal reward function of the backdoored agent via reward reverse and\nunilateral guidance during training to ensure its adverse influence on the\nentire team. We evaluate our backdoor attacks on two classic c-MADRL algorithms\nVDN and QMIX, in a popular c-MADRL environment SMAC. The experimental results\ndemonstrate that our backdoor attacks are able to reach a high attack success\nrate (91.6\\%) while maintaining a low clean performance variance rate (3.7\\%).","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07775","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor trigger is observed, the agent will perform abnormal actions that lead to failures or malicious goals. However, existing backdoors suffer from several issues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is trained or activated by an additional network, or all agents are backdoored. To this end, in this paper we propose a novel backdoor attack against c-MADRL that attacks the entire multi-agent team by embedding the backdoor in only a single agent. First, we introduce adversarial spatiotemporal behavior patterns as the backdoor trigger, rather than manually injected fixed visual patterns or instantaneous status, and we control the attack duration. This method guarantees the stealthiness and practicality of the injected backdoor. Second, we hack the original reward function of the backdoored agent via reward reversal and unilateral guidance during training to ensure its adverse influence on the entire team. We evaluate our backdoor attack on two classic c-MADRL algorithms, VDN and QMIX, in the popular c-MADRL environment SMAC. The experimental results demonstrate that our backdoor attack reaches a high attack success rate (91.6%) while maintaining a low clean-performance variance rate (3.7%).
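The abstract compresses two mechanisms that a short sketch may make more concrete: a trigger defined over a window of adversary behavior (rather than a fixed visual patch), and a hacked training reward for the single backdoored agent. The Python sketch below is a minimal illustration under assumed names and parameters: `SpatiotemporalTrigger`, `hacked_reward`, the `tol` threshold, and the `w` weight are all hypothetical, and the paper's actual trigger pattern, matching rule, and reward shaping are not reproduced here.

```python
import numpy as np
from collections import deque

class SpatiotemporalTrigger:
    """Illustrative trigger: fires when the adversary's behavior over the
    last `window` steps matches a target pattern closely enough.
    The pattern, window length, and tolerance are assumptions."""

    def __init__(self, pattern: np.ndarray, window: int, tol: float = 0.1):
        self.pattern = pattern            # target sequence, shape (window, n_features)
        self.window = window
        self.tol = tol
        self.history = deque(maxlen=window)

    def observe(self, adversary_features: np.ndarray) -> bool:
        """Feed one step of observed adversary features; return True
        while the spatiotemporal pattern is considered matched."""
        self.history.append(adversary_features)
        if len(self.history) < self.window:
            return False                  # not enough steps observed yet
        seq = np.stack(list(self.history))
        # Mean per-step deviation from the target behavior pattern.
        return float(np.abs(seq - self.pattern).mean()) < self.tol


def hacked_reward(env_reward: float,
                  trigger_active: bool,
                  guidance_bonus: float = 0.0,
                  w: float = 0.5) -> float:
    """Illustrative reward hack: off-trigger, train on the clean team
    reward so benign performance is preserved; on-trigger, reverse the
    reward sign (reward reversal) and add a bonus for adversary-preferred
    actions (unilateral guidance). The weighting is an assumption."""
    if not trigger_active:
        return env_reward
    return -env_reward + w * guidance_bonus
```

Because the trigger only fires after a full behavior window matches, its window length also bounds when the attack can start, which is one plausible way to read the abstract's claim of controlling the attack duration.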