Detecting adversarial example attacks to deep neural networks

F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli
{"title":"Detecting adversarial example attacks to deep neural networks","authors":"F. Carrara, F. Falchi, R. Caldelli, Giuseppe Amato, Roberta Fumarola, Rudy Becarelli","doi":"10.1145/3095713.3095753","DOIUrl":null,"url":null,"abstract":"Deep learning has recently become the state of the art in many computer vision applications and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification, in order to distinguishing between correctly classified authentic images and adversarial examples. The results show that hidden layers activations can be used to detect incorrect classifications caused by adversarial attacks.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3095713.3095753","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 33

Abstract

Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines, containing changes unnoticeable to the human eye. This represents a serious threat to machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network by analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden layer activations can be used to detect incorrect classifications caused by adversarial attacks.
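The detection idea described in the abstract lends itself to a short sketch. The snippet below is a minimal illustration, not the authors' exact pipeline: it assumes a pretrained ResNet-50 as the attacked network, reads activations from its penultimate layer, and scores a test image by the class agreement of its k nearest neighbours among activations of trusted, correctly classified images. The layer choice, Euclidean distance, k, and the decision threshold are all placeholder assumptions to be tuned in practice.

```python
# Hypothetical sketch: kNN class-agreement score over hidden-layer activations.
import numpy as np
import torch
import torchvision.models as models

# Pretrained CNN; activations are taken from the penultimate layer (an assumption --
# the paper analyses the activations of several hidden layers).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])  # drop final fc

def activations(images):
    """Return flattened hidden-layer activations for a batch of images (N, 3, H, W)."""
    with torch.no_grad():
        feats = feature_extractor(images)      # (N, 2048, 1, 1) for ResNet-50
    return feats.flatten(1).cpu().numpy()      # (N, 2048)

def knn_score(test_feat, ref_feats, ref_labels, predicted_label, k=10):
    """Fraction of the k nearest reference activations that share the predicted label.
    Low agreement suggests the prediction may be caused by an adversarial input."""
    dists = np.linalg.norm(ref_feats - test_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(ref_labels[nearest] == predicted_label))

# Usage sketch: ref_feats / ref_labels come from trusted, correctly classified images.
#   ref_feats = activations(reference_images); ref_labels = reference_labels
#   score = knn_score(activations(test_image)[0], ref_feats, ref_labels, predicted_class)
#   if score < 0.5:   # threshold is an assumption; tune it on validation data
#       print("possible adversarial example")
```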