Recent studies have demonstrated that policy manipulation attacks on deep reinforcement learning (DRL) systems can cause victim agents to learn abnormal policies. However, existing work typically assumes that the attacker can manipulate multiple components of the training process, such as reward functions, environment dynamics, or state information. In IoT-enabled smart societies, where AI-driven systems operate in interconnected and data-sensitive environments, such assumptions raise serious security and privacy concerns. This paper investigates a novel policy manipulation attack in competitive multi-agent reinforcement learning under significantly weaker assumptions: the attacker requires access only to the victim's training settings and, in some cases, to the learned policy outputs during training. We propose the honeypot policy attack (HPA), in which an adversarial agent induces the victim to learn an attacker-specified target policy by deliberately taking suboptimal actions. To this end, we introduce a honeypot reward estimation mechanism that quantifies the reward the adversarial agent must sacrifice to influence the victim's learning process and adapts this sacrifice to the degree of policy manipulation. Extensive experiments on three representative competitive games demonstrate that HPA is both effective and stealthy, exposing previously unexplored vulnerabilities in DRL-based systems deployed in IoT-driven smart environments. To the best of our knowledge, this work presents the first policy manipulation attack that does not rely on explicit tampering with the internal components of DRL systems but instead operates solely through admissible adversarial interactions, offering new insights into the security challenges faced by emerging AIoT ecosystems.
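
As a rough illustration of the honeypot mechanism summarized above, the sketch below shows one plausible way an adversarial agent could trade off its own reward against the remaining gap between the victim's current policy and the attacker-specified target policy in a tabular setting. All names and rules here (honeypot_action, adversary_q, alpha, and the total-variation sacrifice budget) are illustrative assumptions, not the algorithm proposed in the paper.

```python
import numpy as np

def honeypot_action(adversary_q, victim_policy, target_policy, state, alpha=0.5):
    """Hypothetical sketch of honeypot action selection: the adversary
    deliberately deviates from its own best action, sacrificing reward in
    proportion to how far the victim's policy still is from the target
    policy. The sacrifice rule is an assumption for illustration only."""
    # Remaining degree of manipulation: total-variation distance between
    # the victim's current policy and the target policy at this state.
    gap = 0.5 * np.abs(victim_policy[state] - target_policy[state]).sum()

    q = adversary_q[state]          # adversary's action values at this state
    best = q.max()
    # Reward the adversary is willing to give up grows with the remaining gap.
    budget = alpha * gap * (best - q.min())
    # Honeypot action: among actions whose value loss stays within the
    # budget, pick the one with the largest admissible sacrifice.
    admissible = np.where(best - q <= budget)[0]
    return admissible[np.argmin(q[admissible])]
```

When the victim's policy already matches the target (gap of zero), the budget collapses and the adversary plays greedily, which is consistent with the stealth property claimed in the abstract: sacrifice is incurred only while manipulation is still needed.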