Detecting Adversarial Examples - a Lesson from Multimedia Security

Pascal Schöttle, Alexander Schlögl, Cecilia Pasquini, Rainer Böhme
Published in: 2018 26th European Signal Processing Conference (EUSIPCO), 2018-09-01
DOI: 10.23919/EUSIPCO.2018.8553164 (https://doi.org/10.23919/EUSIPCO.2018.8553164)
Citations: 17

Abstract

Adversarial classification is the task of performing robust classification in the presence of a strategic attacker. Originating in information hiding and multimedia forensics, it has recently received considerable attention in a broader security context. In machine learning-based image classification, adversarial classification can be interpreted as detecting so-called adversarial examples: slightly altered versions of benign images that are specifically crafted to be misclassified with very high probability by the classifier under attack. Neural networks, which dominate among modern image classifiers, have been shown to be especially vulnerable to these adversarial examples. However, detecting subtle changes in digital images has always been the goal of multimedia forensics and steganalysis, two major subfields of multimedia security. We highlight the conceptual similarities between these fields and secure machine learning. Furthermore, we adapt a linear filter, similar to early steganalysis methods, to detect adversarial examples generated with the projected gradient descent (PGD) method, the state-of-the-art algorithm for this task. We test our method on the MNIST database and show, for several parameter combinations of PGD, that it can reliably detect adversarial examples. Additionally, combining adversarial re-training with our detection method effectively reduces the attack surface of neural networks. Thus, we conclude that adversarial examples for image classification possibly do not withstand detection methods from steganalysis, and that future work should explore the effectiveness of known techniques from multimedia security in other adversarial settings.
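
As a rough illustration of the linear-filter idea mentioned in the abstract, the following sketch applies a zero-sum 3x3 high-pass kernel to an image and uses the mean absolute residual as a detection statistic. Everything specific here is an assumption made for illustration: the kernel (a high-pass filter of the kind used in early steganalysis), the helper names, the smooth synthetic image standing in for an MNIST digit, and the random sign perturbation standing in for a PGD perturbation with L-infinity bound `eps`. The abstract does not specify the paper's actual filter or decision rule.

```python
import numpy as np

# Hypothetical 3x3 high-pass kernel; its coefficients sum to zero, so flat
# regions produce no residual. Not taken from the paper.
KERNEL = np.array([[-1.0, 2.0, -1.0],
                   [2.0, -4.0, 2.0],
                   [-1.0, 2.0, -1.0]]) / 4.0


def highpass_residual(img):
    """Filter a 2-D image with KERNEL over the valid region (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += KERNEL[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out


def residual_energy(img):
    """Detection statistic: mean absolute value of the filter residual."""
    return float(np.mean(np.abs(highpass_residual(img))))


def toy_benign(size=28, sigma=5.0):
    """Smooth synthetic blob in [0, 1], a stand-in for a benign image."""
    y, x = np.mgrid[0:size, 0:size]
    c = size / 2
    return np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))


def toy_perturb(img, eps=0.1, seed=0):
    """Add +/-eps per pixel and clip to [0, 1]; a stand-in for PGD output."""
    rng = np.random.default_rng(seed)
    noise = eps * rng.choice([-1.0, 1.0], size=img.shape)
    return np.clip(img + noise, 0.0, 1.0)


if __name__ == "__main__":
    benign = toy_benign()
    perturbed = toy_perturb(benign)
    print("benign energy:   ", residual_energy(benign))
    print("perturbed energy:", residual_energy(perturbed))
```

In this toy setting the perturbed image yields a much larger residual energy than the smooth one, because the high-pass filter suppresses slowly varying image content while passing pixel-level noise. A practical detector would need a threshold calibrated on real benign images, and real adversarial perturbations need not resemble random sign noise.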