Toward Model Resistant to Transferable Adversarial Examples via Trigger Activation

IF 8.0 · CAS Zone 1 (Computer Science) · Q1 (COMPUTER SCIENCE, THEORY & METHODS) · IEEE Transactions on Information Forensics and Security, vol. 20, pp. 3745-3757 · Pub Date: 2025-03-19 · DOI: 10.1109/TIFS.2025.3553043
Yi Yu;Song Xia;Xun Lin;Chenqi Kong;Wenhan Yang;Shijian Lu;Yap-Peng Tan;Alex C. Kot
{"title":"Toward Model Resistant to Transferable Adversarial Examples via Trigger Activation","authors":"Yi Yu;Song Xia;Xun Lin;Chenqi Kong;Wenhan Yang;Shijian Lu;Yap-Peng Tan;Alex C. Kot","doi":"10.1109/TIFS.2025.3553043","DOIUrl":null,"url":null,"abstract":"Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions. A critical aspect of these examples is their transferability, allowing them to deceive unseen models in closed-box scenarios. Despite the widespread exploration of defense methods, including those on transferability, they show limitations: inefficient deployment, ineffective defense, and degraded performance on clean images. In this work, we introduce a novel training paradigm aimed at enhancing robustness against transferable adversarial examples (TAEs) in a more efficient and effective way. We propose a model that exhibits random guessing behavior when presented with clean data <inline-formula> <tex-math>$\\boldsymbol {x}$ </tex-math></inline-formula> as input, and generates accurate predictions when with triggered data <inline-formula> <tex-math>$\\boldsymbol {x}+\\boldsymbol {\\tau }$ </tex-math></inline-formula>. Importantly, the trigger <inline-formula> <tex-math>$\\boldsymbol {\\tau }$ </tex-math></inline-formula> remains constant for all data instances. We refer to these models as models with trigger activation. We are surprised to find that these models exhibit certain robustness against TAEs. Through the consideration of first-order gradients, we provide a theoretical analysis of this robustness. Moreover, through the joint optimization of the learnable trigger and the model, we achieve improved robustness to transferable attacks. Extensive experiments conducted across diverse datasets, evaluating a variety of attacking methods, underscore the effectiveness and superiority of our approach.","PeriodicalId":13492,"journal":{"name":"IEEE Transactions on Information Forensics and Security","volume":"20 ","pages":"3745-3757"},"PeriodicalIF":8.0000,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Information Forensics and Security","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10934010/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions. A critical aspect of these examples is their transferability, which allows them to deceive unseen models in closed-box scenarios. Despite wide exploration of defense methods, including those targeting transferability, existing defenses show limitations: inefficient deployment, ineffective defense, and degraded performance on clean images. In this work, we introduce a novel training paradigm that enhances robustness against transferable adversarial examples (TAEs) in a more efficient and effective way. We propose a model that exhibits random-guessing behavior when presented with clean data $\boldsymbol{x}$ as input, yet generates accurate predictions when presented with triggered data $\boldsymbol{x}+\boldsymbol{\tau}$. Importantly, the trigger $\boldsymbol{\tau}$ remains constant for all data instances. We refer to such models as models with trigger activation, and we find, surprisingly, that they exhibit a degree of robustness against TAEs. By considering first-order gradients, we provide a theoretical analysis of this robustness. Moreover, by jointly optimizing the learnable trigger and the model, we achieve improved robustness to transferable attacks. Extensive experiments across diverse datasets and a variety of attack methods underscore the effectiveness and superiority of our approach.
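The training objective described in the abstract can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch illustration of the idea as stated above: a learnable, input-independent trigger $\boldsymbol{\tau}$ is optimized jointly with the network so that triggered inputs $\boldsymbol{x}+\boldsymbol{\tau}$ are classified correctly while clean inputs $\boldsymbol{x}$ yield near-uniform (random-guess) predictions. This is not the authors' implementation; all names (`TriggerActivatedClassifier`, `trigger_activation_loss`), the uniform-KL penalty, and the weight `lam` are assumptions made for illustration.

```python
# Minimal sketch (NOT the authors' code) of a "model with trigger
# activation": a learnable trigger tau, shared by all data instances,
# is trained jointly with the network so that x + tau is classified
# correctly while clean x yields near-uniform predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriggerActivatedClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, input_shape=(3, 224, 224)):
        super().__init__()
        self.backbone = backbone
        # One trigger for all inputs; broadcast over the batch dimension.
        self.tau = nn.Parameter(torch.zeros(1, *input_shape))

    def forward(self, x: torch.Tensor, triggered: bool = True) -> torch.Tensor:
        return self.backbone(x + self.tau) if triggered else self.backbone(x)

def trigger_activation_loss(model: TriggerActivatedClassifier,
                            x: torch.Tensor, y: torch.Tensor,
                            lam: float = 1.0) -> torch.Tensor:
    # Term 1: accurate predictions on triggered data x + tau.
    ce = F.cross_entropy(model(x, triggered=True), y)
    # Term 2 (one plausible choice): push clean-input predictions toward
    # the uniform distribution, i.e. random guessing, via KL divergence.
    log_p = F.log_softmax(model(x, triggered=False), dim=1)
    uniform = torch.full_like(log_p, 1.0 / log_p.size(1))
    kl = F.kl_div(log_p, uniform, reduction="batchmean")
    return ce + lam * kl
```

Under this sketch, benign inputs would be evaluated with the trigger applied (`model(x, triggered=True)`), while an attacker probing the clean-input pathway sees near-uniform outputs whose first-order gradients carry little class-discriminative signal, which appears to be the intuition behind the gradient-based analysis the abstract mentions.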
Source Journal
IEEE Transactions on Information Forensics and Security (Engineering Technology – Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 7.40%
Annual articles: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.
Latest articles in this journal:
GCI-GANomaly: A Novel GPS Spoofing Detection Scheme based on Grayscale Constellation Image
Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach
Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition
Exploratory Detection of Unknown Cyber-Attacks via Evolutionary Strategy and Machine Learning
Comments on "APFed: Anti-Poisoning Attacks in Privacy-Preserving Heterogeneous Federated Learning"