{"title":"A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning","authors":"Yinbo Yu, Saihao Yan, Jiajia Liu","doi":"arxiv-2409.07775","DOIUrl":null,"url":null,"abstract":"Recent studies have shown that cooperative multi-agent deep reinforcement\nlearning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor\ntrigger is observed, it will perform abnormal actions leading to failures or\nmalicious goals. However, existing proposed backdoors suffer from several\nissues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is\ntrained or activated by an additional network, or all agents are backdoored. To\nthis end, in this paper, we propose a novel backdoor attack against c-MADRL,\nwhich attacks the entire multi-agent team by embedding the backdoor only in a\nsingle agent. Firstly, we introduce adversary spatiotemporal behavior patterns\nas the backdoor trigger rather than manual-injected fixed visual patterns or\ninstant status and control the attack duration. This method can guarantee the\nstealthiness and practicality of injected backdoors. Secondly, we hack the\noriginal reward function of the backdoored agent via reward reverse and\nunilateral guidance during training to ensure its adverse influence on the\nentire team. We evaluate our backdoor attacks on two classic c-MADRL algorithms\nVDN and QMIX, in a popular c-MADRL environment SMAC. The experimental results\ndemonstrate that our backdoor attacks are able to reach a high attack success\nrate (91.6\\%) while maintaining a low clean performance variance rate (3.7\\%).","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07775","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under threat from backdoor attacks. Once a backdoor trigger is observed, the compromised agent performs abnormal actions that lead to failures or malicious goals. However, existing backdoor attacks suffer from several issues: fixed visual trigger patterns lack stealthiness, the backdoor must be trained or activated by an additional network, or all agents must be backdoored. To address these issues, we propose a novel backdoor attack against c-MADRL that compromises the entire multi-agent team by embedding the backdoor in only a single agent. First, we use the spatiotemporal behavior patterns of an adversary as the backdoor trigger, rather than manually injected fixed visual patterns or instantaneous status, and we control the duration of the attack. This design ensures the stealthiness and practicality of the injected backdoor. Second, we hack the backdoored agent's original reward function via reward reversal and unilateral guidance during training to ensure its adverse influence on the entire team. We evaluate our backdoor attack on two classic c-MADRL algorithms, VDN and QMIX, in the popular SMAC environment. The experimental results demonstrate that our attack achieves a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%).
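To make the two mechanisms named in the abstract concrete, here is a minimal Python sketch of how a spatiotemporal behavior-pattern trigger with a controlled attack duration, and a hacked reward combining reversal with a unilateral guidance term, could fit together. This is purely illustrative and not the authors' implementation: the class names, the distance-based pattern matcher, the tolerance threshold, and the `guidance_weight` parameter are all assumptions for the sketch.

```python
# Illustrative sketch only (not the paper's code). Assumes the trigger is a
# reference sequence of adversary behavior features and that pattern matching
# is a simple mean-absolute-distance test; both choices are hypothetical.
from collections import deque

import numpy as np


class SpatiotemporalTrigger:
    """Fires when an adversary's recent behavior matches a target pattern."""

    def __init__(self, pattern, window=8, tol=0.15, duration=20):
        self.pattern = np.asarray(pattern)   # reference behavior sequence, shape (window, feat_dim)
        self.window = window                 # timesteps of history to compare
        self.tol = tol                       # matching tolerance (hypothetical)
        self.duration = duration             # how long the attack stays active
        self.history = deque(maxlen=window)
        self.steps_left = 0

    def update(self, adversary_feature):
        """Call once per timestep with features of the observed adversary."""
        self.history.append(np.asarray(adversary_feature))
        if self.steps_left > 0:              # attack already in progress
            self.steps_left -= 1
            return True
        if len(self.history) == self.window:
            observed = np.stack(self.history)
            # Trigger when the recent trajectory is close to the pattern.
            if np.mean(np.abs(observed - self.pattern)) < self.tol:
                self.steps_left = self.duration
                return True
        return False


def hacked_reward(team_reward, malicious_progress, triggered, guidance_weight=0.5):
    """Reward used to train the backdoored agent (sketch).

    In clean episodes the agent sees the true team reward. When the trigger
    is active, the reward is reversed (cooperative success is punished) and a
    unilateral guidance term -- here a hypothetical scalar measuring progress
    toward the attacker's goal -- steers the agent's own behavior.
    """
    if not triggered:
        return team_reward
    return -team_reward + guidance_weight * malicious_progress
```

Under this reading, the agent trains on the clean team reward in trigger-free episodes, which is consistent with the low clean performance variance the abstract reports; only when the adversary's behavior matches the spatiotemporal pattern does the reversed, guided reward teach the single backdoored agent to undermine the whole team.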