BlurNet: Defense by Filtering the Feature Maps

Ravi Raju, Mikko H. Lipasti
{"title":"BlurNet: Defense by Filtering the Feature Maps","authors":"Ravi Raju, Mikko H. Lipasti","doi":"10.1109/DSN-W50199.2020.00016","DOIUrl":null,"url":null,"abstract":"Recently, the field of adversarial machine learning has been garnering attention by showing that state-of-the-art deep neural networks are vulnerable to adversarial examples, stemming from small perturbations being added to the input image. Adversarial examples are generated by a malicious adversary by obtaining access to the model parameters, such as gradient information, to alter the input or by attacking a substitute model and transferring those malicious examples over to attack the victim model. Specifically, one of these attack algorithms, Robust Physical Perturbations $(RP_{2})$, generates adversarial images of stop signs with black and white stickers to achieve high targeted misclassification rates against standard-architecture traffic sign classifiers. In this paper, we propose BlurNet, a defense against the RP2 attack. First, we motivate the defense with a frequency analysis of the first layer feature maps of the network on the LISA dataset, which shows that high frequency noise is introduced into the input image by the RP2 algorithm. To remove the high frequency noise, we introduce a depthwise convolution layer of standard blur kernels after the first layer. We perform a blackbox transfer attack to show that low-pass filtering the feature maps is more beneficial than filtering the input. We then present various regularization schemes to incorporate this low-pass filtering behavior into the training regime of the network and perform white-box attacks. We conclude with an adaptive attack evaluation to show that the success rate of the attack drops from 90% to 20% with total variation regularization, one of the proposed defenses.","PeriodicalId":427687,"journal":{"name":"2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN-W50199.2020.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Recently, the field of adversarial machine learning has been garnering attention by showing that state-of-the-art deep neural networks are vulnerable to adversarial examples, which stem from small perturbations added to the input image. Adversarial examples are generated by a malicious adversary either by obtaining access to model parameters, such as gradient information, to alter the input, or by attacking a substitute model and transferring those malicious examples over to attack the victim model. Specifically, one of these attack algorithms, Robust Physical Perturbations ($RP_2$), generates adversarial images of stop signs with black-and-white stickers to achieve high targeted misclassification rates against standard-architecture traffic sign classifiers. In this paper, we propose BlurNet, a defense against the $RP_2$ attack. First, we motivate the defense with a frequency analysis of the first-layer feature maps of the network on the LISA dataset, which shows that the $RP_2$ algorithm introduces high-frequency noise into the input image. To remove this high-frequency noise, we introduce a depthwise convolution layer of standard blur kernels after the first layer. We perform a black-box transfer attack to show that low-pass filtering the feature maps is more beneficial than filtering the input. We then present several regularization schemes that incorporate this low-pass filtering behavior into the training regime of the network, and we evaluate them under white-box attacks. We conclude with an adaptive attack evaluation showing that the attack success rate drops from 90% to 20% with total variation regularization, one of the proposed defenses.
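To make the two mechanisms described in the abstract concrete, the sketch below illustrates in PyTorch (1) a fixed depthwise convolution that low-pass filters the first-layer feature maps with a standard blur kernel, and (2) a total-variation penalty added to the training loss to encourage low-pass feature maps. This is a minimal illustration, not the authors' implementation: the box-blur kernel, the toy `TinyClassifier` architecture, the class count, and the 1e-3 penalty weight are all assumptions made for the example.

```python
# Minimal sketch (assumptions noted above) of the two ideas in the abstract:
# a fixed depthwise blur on the first-layer feature maps, and a total-variation
# penalty that pushes the network toward low-pass feature maps during training.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthwiseBlur(nn.Module):
    """Applies the same fixed low-pass kernel to every feature-map channel."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Normalized box (averaging) kernel as a stand-in for a standard blur kernel.
        kernel = torch.ones(kernel_size, kernel_size) / (kernel_size ** 2)
        kernel = kernel.expand(channels, 1, kernel_size, kernel_size).clone()
        self.register_buffer("kernel", kernel)
        self.groups = channels
        self.padding = kernel_size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # groups == channels makes this a depthwise convolution over feature maps.
        return F.conv2d(x, self.kernel, padding=self.padding, groups=self.groups)


def total_variation(feature_maps: torch.Tensor) -> torch.Tensor:
    """Anisotropic total variation of a batch of feature maps (N, C, H, W)."""
    dh = (feature_maps[:, :, 1:, :] - feature_maps[:, :, :-1, :]).abs().mean()
    dw = (feature_maps[:, :, :, 1:] - feature_maps[:, :, :, :-1]).abs().mean()
    return dh + dw


class TinyClassifier(nn.Module):
    """Toy classifier: first conv -> fixed blur -> rest of the network."""

    def __init__(self, num_classes: int = 16):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.blur = DepthwiseBlur(32)            # filtering the feature maps
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        f1 = F.relu(self.conv1(x))
        f1 = self.blur(f1)                       # low-pass the first-layer features
        f2 = F.relu(self.conv2(f1))
        out = self.head(f2.mean(dim=(2, 3)))     # global average pooling
        return out, f1


if __name__ == "__main__":
    model = TinyClassifier()
    images = torch.randn(4, 3, 32, 32)           # stand-in for traffic-sign inputs
    labels = torch.randint(0, 16, (4,))
    logits, first_layer_maps = model(images)
    # Cross-entropy plus a TV penalty on the first-layer feature maps;
    # the 1e-3 weight is an arbitrary illustrative value.
    loss = F.cross_entropy(logits, labels) + 1e-3 * total_variation(first_layer_maps)
    loss.backward()
    print(float(loss))
```

In the paper's terms, the fixed `DepthwiseBlur` plays the role of the blur layer inserted after the first layer, while the TV term sketches one of the regularization schemes that bake the same low-pass behavior into training.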