A Pilot Study of Observation Poisoning on Selective Reincarnation in Multi-Agent Reinforcement Learning

Neural Processing Letters | Pub Date: 2024-05-02 | DOI: 10.1007/s11063-024-11625-w | IF 2.6 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Artificial Intelligence)

Harsha Putla, Chanakya Patibandla, Krishna Pratap Singh, P Nagabhushan

Abstract

This research explores the vulnerability of selective reincarnation, a concept in Multi-Agent Reinforcement Learning (MARL), to observation poisoning attacks. Observation poisoning is an adversarial strategy that subtly manipulates an agent’s observation space, potentially misdirecting its learning process. The primary aim of this paper is to systematically evaluate the robustness of selective reincarnation in MARL systems against the subtle yet potentially debilitating effects of such attacks. By assessing how manipulated observation data influences MARL agents, we seek to highlight potential vulnerabilities and inform the development of more resilient MARL systems. Our experimental testbed was the widely used HalfCheetah environment, using the Independent Deep Deterministic Policy Gradient algorithm in a cooperative MARL setting. We introduced a series of triggers, namely Gaussian noise addition, observation reversal, random shuffling, and scaling, into the teacher dataset provided to the reincarnating agents of HalfCheetah. Here, the “teacher dataset” refers to the stored experiences from previous training sessions used to accelerate the learning of reincarnating agents in MARL. This approach allowed us to observe the significant impact of these triggers on reincarnation decisions. Specifically, the reversal technique showed the most pronounced negative effect for maximum returns, with an average decrease of 38.08% in Kendall’s tau values across all agent combinations. With random shuffling, Kendall’s tau values decreased by 17.66%. On the other hand, noise addition and scaling aligned with the original ranking by only 21.42% and 32.66%, respectively. The results, quantified by Kendall’s tau metric, indicate the fragility of the selective reincarnation process under adversarial observation poisoning. Our findings also reveal that vulnerability to observation poisoning varies significantly among agent combinations, with some markedly more susceptible than others. This investigation advances our understanding of selective reincarnation’s robustness against observation poisoning attacks, which is crucial for developing more secure MARL systems and for making informed decisions about agent reincarnation.
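To make the four triggers and the ranking comparison concrete, the following is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption for illustration: the function names, the noise scale `sigma`, the scaling `factor`, the reading of "reversal" as reversing the feature order within each observation, and the reading of "shuffling" as one shared permutation of feature positions. The abstract does not specify these details. The `kendall_tau` helper computes the tau-a variant for score lists without ties.

```python
import numpy as np

# All names and default parameters below are hypothetical illustrations.

def gaussian_noise(obs, rng, sigma=0.1):
    """Add zero-mean Gaussian noise to every observation entry."""
    return obs + rng.normal(0.0, sigma, size=obs.shape)

def reverse(obs):
    """Reverse the feature order within each observation (assumed meaning)."""
    return obs[:, ::-1].copy()

def shuffle(obs, rng):
    """Apply one random permutation of feature positions to all rows."""
    perm = rng.permutation(obs.shape[1])
    return obs[:, perm]

def scale(obs, factor=2.0):
    """Multiply every observation entry by a constant factor."""
    return obs * factor

def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length score lists without ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(s / (n * (n - 1) / 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_obs = rng.normal(size=(5, 4))  # made-up stored observations
    poisoned_obs = reverse(teacher_obs)    # one of the four triggers

    # Made-up returns for four agent combinations, not values from the paper.
    clean_returns = [10.0, 8.0, 6.0, 4.0]
    flipped_returns = [4.0, 6.0, 8.0, 10.0]
    print(kendall_tau(clean_returns, clean_returns))    # 1.0
    print(kendall_tau(clean_returns, flipped_returns))  # -1.0
```

A tau of 1 means the poisoned run ranks the agent combinations exactly as the clean run does, and a tau of -1 means the ranking is fully inverted. The percentage decreases reported in the abstract describe how far the poisoned rankings drift from the clean ones on this scale.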

Source journal

Neural Processing Letters (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 4.90

Self-citation rate: 12.90%

Articles published: 392

Review time: 2.8 months

Journal description: Neural Processing Letters is an international journal publishing research results and innovative ideas on all aspects of artificial neural networks. Coverage includes theoretical developments, biological models, new formal modes, learning, applications, software and hardware developments, and prospective research. The journal promotes fast exchange of information in the community of neural network researchers and users. The resurgence of interest in the field of artificial neural networks since the beginning of the 1980s is coupled to tremendous research activity in specialized or multidisciplinary groups. Research, however, is not possible without good communication between people and the exchange of information, especially in a field covering such different areas; fast communication is also a key aspect, and this is the reason for Neural Processing Letters