You Qiaoben, Chengyang Ying, Xinning Zhou, Hang Su, Jun Zhu, Bo Zhang
{"title":"了解深度强化学习中对观察结果的对抗性攻击","authors":"You Qiaoben, Chengyang Ying, Xinning Zhou, Hang Su, Jun Zhu, Bo Zhang","doi":"10.1007/s11432-021-3688-y","DOIUrl":null,"url":null,"abstract":"<p>Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the cumulative expected reward of a victim by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward since they do not generally explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation approach adopted herein generates an optimal adversary in the function space of targeted attacks, repelling them via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories routing to the lowest reward or the worst-case performance. Next, the adversary misleads the victim to imitate the deceptive policy by perturbing the observations. Compared to existing approaches, it is theoretically shown that our adversary is strong under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":"8 1","pages":""},"PeriodicalIF":7.3000,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Understanding adversarial attacks on observations in deep reinforcement learning\",\"authors\":\"You Qiaoben, Chengyang Ying, Xinning Zhou, Hang Su, Jun Zhu, Bo Zhang\",\"doi\":\"10.1007/s11432-021-3688-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the cumulative expected reward of a victim by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward since they do not generally explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation approach adopted herein generates an optimal adversary in the function space of targeted attacks, repelling them via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories routing to the lowest reward or the worst-case performance. Next, the adversary misleads the victim to imitate the deceptive policy by perturbing the observations. Compared to existing approaches, it is theoretically shown that our adversary is strong under an appropriate noise level. 
Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.</p>\",\"PeriodicalId\":21618,\"journal\":{\"name\":\"Science China Information Sciences\",\"volume\":\"8 1\",\"pages\":\"\"},\"PeriodicalIF\":7.3000,\"publicationDate\":\"2024-04-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Science China Information Sciences\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11432-021-3688-y\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-021-3688-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Understanding adversarial attacks on observations in deep reinforcement learning
Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the cumulative expected reward of a victim by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward because they generally do not explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation generates an optimal adversary in the function space of targeted attacks, which is realized via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories that lead to the lowest reward, i.e., the worst-case performance. In the second stage, the adversary misleads the victim into imitating the deceptive policy by perturbing its observations. Compared with existing approaches, the proposed adversary is theoretically shown to be stronger under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
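The abstract describes a two-stage attack: a deceptive policy is first trained to find worst-case behavior, and the victim's observations are then perturbed so that it imitates that policy. Below is a minimal, illustrative sketch of the second stage for a discrete-action victim, using a PGD-style perturbation; it is not the authors' implementation, and names such as victim_policy, deceptive_policy, and epsilon, as well as the toy linear policies, are assumptions. The deceptive policy is assumed to have been trained beforehand, e.g., by running an off-the-shelf RL algorithm on the negated environment reward.

```python
import torch
import torch.nn.functional as F


def pgd_observation_attack(victim_policy, deceptive_policy, obs,
                           epsilon=0.05, step_size=0.01, n_steps=10):
    """Perturb `obs` within an L-infinity ball of radius `epsilon` so that the
    victim's action matches that of the (pre-trained) deceptive policy.
    Illustrative sketch only; hyperparameters are assumptions."""
    # Target action: what the worst-case (deceptive) policy would take here.
    with torch.no_grad():
        target_action = deceptive_policy(obs).argmax(dim=-1)

    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(n_steps):
        logits = victim_policy(obs + delta)
        # Driving this loss down pushes the victim toward the deceptive action.
        loss = F.cross_entropy(logits, target_action)
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()   # signed gradient step
            delta.clamp_(-epsilon, epsilon)          # stay inside the noise budget
        delta.grad.zero_()
    return (obs + delta).detach()


# Toy usage with linear stand-in policies (purely illustrative):
obs_dim, n_actions = 8, 4
victim = torch.nn.Linear(obs_dim, n_actions)
deceptive = torch.nn.Linear(obs_dim, n_actions)
obs = torch.randn(1, obs_dim)
adv_obs = pgd_observation_attack(victim, deceptive, obs)
```

In this sketch the noise budget epsilon plays the role of the "appropriate noise level" mentioned in the abstract: with too small a budget the victim cannot be pushed to the deceptive action, while a sufficient budget lets the adversary reproduce the worst-case trajectories found in the first stage.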
Journal Introduction:
Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.