Detecting Adversarial Examples - a Lesson from Multimedia Security

2018 26th European Signal Processing Conference (EUSIPCO); Pub Date: 2018-09-01; DOI: 10.23919/EUSIPCO.2018.8553164

Pascal Schöttle, Alexander Schlögl, Cecilia Pasquini, Rainer Böhme

Citations: 17

Abstract

Adversarial classification is the task of performing robust classification in the presence of a strategic attacker. Originating from information hiding and multimedia forensics, adversarial classification has recently received a lot of attention in a broader security context. In the domain of machine learning-based image classification, adversarial classification can be interpreted as detecting so-called adversarial examples, which are slightly altered versions of benign images. They are specifically crafted to be misclassified with a very high probability by the classifier under attack. Neural networks, which dominate among modern image classifiers, have been shown to be especially vulnerable to these adversarial examples. However, detecting subtle changes in digital images has always been the goal of multimedia forensics and steganalysis, two major subfields of multimedia security. We highlight the conceptual similarities between these fields and secure machine learning. Furthermore, we adapt a linear filter, similar to early steganalysis methods, to detect adversarial examples that are generated with the projected gradient descent (PGD) method, the state-of-the-art algorithm for this task. We test our method on the MNIST database and show for several parameter combinations of PGD that our method can reliably detect adversarial examples. Additionally, the combination of adversarial re-training and our detection method effectively reduces the attack surface of neural networks. Thus, we conclude that adversarial examples for image classification possibly do not withstand detection methods from steganalysis, and future work should explore the effectiveness of known techniques from multimedia security in other adversarial settings.
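
The abstract does not specify the filter that was used, so the following is only a minimal sketch of the general idea behind linear-filter detection as known from early steganalysis: predict each pixel from its neighbours with a fixed linear filter and use the energy of the prediction residual as a detection statistic. The kernel, the function names (`filter_residual`, `residual_energy`, `is_suspicious`), and the threshold are all assumptions made for illustration, not the authors' method or parameters.

```python
import numpy as np

# Hypothetical 3x3 high-pass kernel: each pixel minus the mean of its four
# direct neighbours. The actual filter from the paper is NOT given in the
# abstract; this kernel is an assumption for illustration only.
KERNEL = np.array([[0.0, -0.25, 0.0],
                   [-0.25, 1.0, -0.25],
                   [0.0, -0.25, 0.0]])


def filter_residual(image: np.ndarray) -> np.ndarray:
    """Apply the linear filter to a 2-D image, keeping only the valid region."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    # Sum the shifted, weighted copies of the image (a plain 2-D correlation).
    for dy in range(3):
        for dx in range(3):
            out += KERNEL[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out


def residual_energy(image: np.ndarray) -> float:
    """Mean squared filter residual, used here as the detection statistic."""
    return float(np.mean(filter_residual(image) ** 2))


def is_suspicious(image: np.ndarray, threshold: float) -> bool:
    """Flag an image whose residual energy exceeds a threshold.

    The threshold is a free parameter; in practice it would have to be
    calibrated on benign images.
    """
    return residual_energy(image) > threshold
```

On a smooth image the residual is close to zero, while pixel-wise perturbations of small amplitude raise the residual energy, which is what makes such a statistic a plausible detector. Whether it separates real PGD examples from benign MNIST digits depends on the filter and threshold, which this sketch does not claim to reproduce.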
