通过特征空间深度强化学习实现黑箱对抗攻击

Lyue Li, Amir Rezapour, Wen-Guey Tzeng
{"title":"通过特征空间深度强化学习实现黑箱对抗攻击","authors":"Lyue Li, Amir Rezapour, Wen-Guey Tzeng","doi":"10.1109/DSC49826.2021.9346264","DOIUrl":null,"url":null,"abstract":"In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.","PeriodicalId":184504,"journal":{"name":"2021 IEEE Conference on Dependable and Secure Computing (DSC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space\",\"authors\":\"Lyue Li, Amir Rezapour, Wen-Guey Tzeng\",\"doi\":\"10.1109/DSC49826.2021.9346264\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.\",\"PeriodicalId\":184504,\"journal\":{\"name\":\"2021 IEEE Conference on Dependable and Secure Computing (DSC)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Conference on Dependable and Secure Computing (DSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSC49826.2021.9346264\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Conference on Dependable and Secure Computing (DSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSC49826.2021.9346264","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

在本文中,我们提出了一种新颖的黑盒对抗攻击方法,即利用强化学习来学习目标分类器 C 的特征。相反,我们的方法可以学习最佳的攻击策略,引导攻击者从原始图像中建立对抗图像。我们的工作对象是图像的特征空间,而不是直接图像的像素。我们的方法在许多指标上都取得了更好的结果。在训练有素的数字分类器上,我们的方法取得了 94.5% 的攻击成功率。我们的对抗图像具有更好的不可感知性,即使与原始图像的标准距离比其他方法更大。由于我们的方法基于分类器的特征,因此具有更好的可移植性。我们的方法对目标类别的转移率可达 52.1%,对非目标类别的转移率可达 65.9%。这比以前个位数的转移率有所提高。此外,我们还表明,通过采用防御机制(如使用去噪技术的 MagNet)来防御我们的攻击更加困难。我们的研究表明,即使目标分类器采用 MagNet 进行防御,我们的方法也能达到 65% 的攻击成功率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Black-Box Adversarial Attack via Deep Reinforcement Learning on the Feature Space
In this paper we propose a novel black-box adversarial attack by using the reinforcement learning to learn the characteristics of the target classifier C. Our method does not need to find a substitute classifier that resembles $C$ with respect to its structure and parameters. Instead, our method learns an optimal attacking policy of guiding the attacker to build an adversarial image from the original image. We work on the feature space of images, instead of the pixels of images directly. Our method achieves better results on many measures. Our method achieves 94.5 % attack success rate on a well-trained digit classifier. Our adversarial images have better imperceptibility even though the norm distances to original images are larger than other methods. Since our method works on the characteristics of a classifier, it has better transferability. The transfer rate of our method could reach 52.1 % for a targeted class and 65.9% for a non-targeted class. This improves over previous results of single-digit transfer rates. Also, we show that it is harder to defend our attack by incorporating defense mechanisms, such as MagNet, which uses a denoising technique. We show that our method achieves 65% attack success rate even though the target classifier employs MagNet to defend.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Provable Data Possession Protocol in Cloud Storage Systems with Fault Tolerance Arithmetic Coding for Floating-Point Numbers A Novel Dynamic Group Signature with Membership Privacy ExamChain: A Privacy-Preserving Onscreen Marking System based on Consortium Blockchain Designated Verifier Signature Transformation: A New Framework for One-Time Delegating Verifiability
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1