Diffusion Models for Imperceptible and Transferable Adversarial Attack

Jianqi Chen, Hao Chen, Keyan Chen, Yilan Zhang, Zhengxia Zou, Zhenwei Shi
{"title":"Diffusion Models for Imperceptible and Transferable Adversarial Attack","authors":"Jianqi Chen;Hao Chen;Keyan Chen;Yilan Zhang;Zhengxia Zou;Zhenwei Shi","doi":"10.1109/TPAMI.2024.3480519","DOIUrl":null,"url":null,"abstract":"Many existing adversarial attacks generate \n<inline-formula><tex-math>$L_{p}$</tex-math></inline-formula>\n-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without \n<inline-formula><tex-math>$L_{p}$</tex-math></inline-formula>\n-norm constraints, yet lacking transferability of attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further “deceive” the diffusion model which can be viewed as an implicit recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, \n<i>DiffAttack</i>\n, is the first that introduces diffusion models into the adversarial attack field. Extensive experiments conducted across diverse model architectures (CNNs, Transformers, and MLPs), datasets (ImageNet, CUB-200, and Standford Cars), and defense mechanisms underscore the superiority of our attack over existing methods such as iterative attacks, GAN-based attacks, and ensemble attacks. Furthermore, we provide a comprehensive discussion on future research avenues in diffusion-based adversarial attacks, aiming to chart a course for this burgeoning field.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 2","pages":"961-977"},"PeriodicalIF":18.6000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10716799/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Many existing adversarial attacks generate $L_{p}$-norm perturbations in image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Toward visual imperceptibility, some recent works explore unrestricted attacks without $L_{p}$-norm constraints, yet they lack transferability when attacking black-box models. In this work, we propose a novel imperceptible and transferable attack that leverages both the generative and discriminative power of diffusion models. Specifically, instead of manipulating pixel space directly, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, this generates human-insensitive perturbations embedded with semantic cues. For better transferability, we further "deceive" the diffusion model, which can be viewed as an implicit recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first to introduce diffusion models into the adversarial attack field. Extensive experiments conducted across diverse model architectures (CNNs, Transformers, and MLPs), datasets (ImageNet, CUB-200, and Stanford Cars), and defense mechanisms underscore the superiority of our attack over existing methods such as iterative attacks, GAN-based attacks, and ensemble attacks. Furthermore, we provide a comprehensive discussion of future research avenues in diffusion-based adversarial attacks, aiming to chart a course for this burgeoning field.
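The abstract outlines an optimization loop: invert an image into a diffusion model's latent space, then optimize that latent so the denoised output fools a classifier while a content term keeps it visually faithful. Below is a minimal, hedged sketch of this idea in PyTorch. It is not the paper's implementation: `ddim_invert` and `ddim_denoise` are hypothetical callables standing in for a real diffusion model's inversion and sampling routines, and the loss weights and step counts are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def diff_attack(image, label, classifier, ddim_invert, ddim_denoise,
                steps=100, lr=0.01, w_adv=1.0, w_content=10.0):
    """Sketch of a latent-space adversarial attack via a diffusion model.

    `ddim_invert` and `ddim_denoise` are hypothetical stand-ins for the
    chosen diffusion model's inversion and sampling routines; weights and
    hyperparameters are illustrative, not the paper's settings.
    """
    # Invert the clean image into the diffusion latent space, then treat
    # the latent (rather than the pixels) as the variable being optimized.
    latent = ddim_invert(image).detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        # Decode the perturbed latent back to an image.
        adv_image = ddim_denoise(latent)

        # Adversarial term: maximize the classifier's loss on the true
        # label by minimizing the negated cross-entropy.
        adv_loss = -F.cross_entropy(classifier(adv_image), label)

        # Content-preservation term: keep the result visually close to the
        # input; a crude stand-in for the paper's content-preserving design.
        content_loss = F.mse_loss(adv_image, image)

        # The paper additionally "deceives" the diffusion model itself by
        # distracting its cross-attention from target regions; that term
        # is omitted here for brevity.
        loss = w_adv * adv_loss + w_content * content_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return ddim_denoise(latent).detach()
```

Optimizing in the latent space rather than in pixel space is what lets the perturbation carry semantic structure instead of high-frequency noise, which is the intuition behind the abstract's imperceptibility claim.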