Understanding adversarial attacks on observations in deep reinforcement learning

IF 7.3 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Science China Information Sciences · Pub Date: 2024-04-26 · DOI: 10.1007/s11432-021-3688-y
You Qiaoben, Chengyang Ying, Xinning Zhou, Hang Su, Jun Zhu, Bo Zhang
{"title":"Understanding adversarial attacks on observations in deep reinforcement learning","authors":"You Qiaoben, Chengyang Ying, Xinning Zhou, Hang Su, Jun Zhu, Bo Zhang","doi":"10.1007/s11432-021-3688-y","DOIUrl":null,"url":null,"abstract":"<p>Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the cumulative expected reward of a victim by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward since they do not generally explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation approach adopted herein generates an optimal adversary in the function space of targeted attacks, repelling them via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories routing to the lowest reward or the worst-case performance. Next, the adversary misleads the victim to imitate the deceptive policy by perturbing the observations. Compared to existing approaches, it is theoretically shown that our adversary is strong under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":"8 1","pages":""},"PeriodicalIF":7.3000,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-021-3688-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease a victim's cumulative expected reward by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward because they generally do not explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation generates an optimal adversary in the function space of targeted attacks and approximates it via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories that lead to the lowest reward or the worst-case performance. In the second stage, the adversary misleads the victim into imitating the deceptive policy by perturbing its observations. It is theoretically shown that, compared with existing approaches, the resulting adversary is stronger under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
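To make the two-stage attack concrete, below is a minimal, hypothetical sketch in PyTorch (not the authors' implementation). It assumes the first stage has already produced a trained deceptive policy, e.g., by running any standard RL algorithm with the victim's reward negated; the names `victim`, `deceiver`, and `pgd_perturb`, the PGD-style inner loop, and all hyperparameters are illustrative assumptions. The sketch shows only the second stage: searching for a bounded observation perturbation that makes the victim's action imitate the deceptive policy's choice.

# Minimal sketch of the second stage (observation perturbation).
# Stage 1 (not shown): train `deceiver` with the victim's reward negated,
# using any standard RL algorithm, so its trajectories route to the
# worst-case return. All names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_perturb(victim, deceiver, obs, epsilon=0.05, alpha=0.01, steps=10):
    """Find delta with ||delta||_inf <= epsilon such that the victim's
    action on obs + delta imitates the deceptive policy's action on obs."""
    with torch.no_grad():
        target_action = deceiver(obs).argmax(dim=-1)  # action the adversary wants
    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        logits = victim(obs + delta)
        # Minimizing cross-entropy pushes the victim toward the target action.
        loss = F.cross_entropy(logits, target_action)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # signed gradient descent step
            delta.clamp_(-epsilon, epsilon)      # project into the L_inf ball
        delta.grad.zero_()
    return (obs + delta).detach()

# Toy usage with random placeholder policies (illustrative only).
if __name__ == "__main__":
    import torch.nn as nn
    torch.manual_seed(0)
    obs_dim, n_actions = 8, 4
    victim = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
    deceiver = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
    obs = torch.randn(1, obs_dim)
    adv_obs = pgd_perturb(victim, deceiver, obs)
    print("victim action (clean):   ", victim(obs).argmax(-1).item())
    print("victim action (attacked):", victim(adv_obs).argmax(-1).item())
    print("deceiver target action:  ", deceiver(obs).argmax(-1).item())

At each environment step the adversary would feed the perturbed observation to the victim in place of the true one; if the imitation succeeds at every step, the victim effectively executes the deceptive policy and collects its worst-case return.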

Source journal
Science China Information Sciences (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 12.60
Self-citation rate: 5.70%
Articles per year: 224
Review time: 8.3 months
Journal introduction: Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.
Latest articles in this journal
Weighted sum power maximization for STAR-RIS-aided SWIPT systems with nonlinear energy harvesting
TSCompiler: efficient compilation framework for dynamic-shape models
NeurDB: an AI-powered autonomous data system
State and parameter identification of linearized water wave equation via adjoint method
An STP look at logical blocking of finite state machines: formulation, detection, and search