Diffusion Models for Imperceptible and Transferable Adversarial Attack

Jianqi Chen, Hao Chen, Keyan Chen, Yilan Zhang, Zhengxia Zou, Zhenwei Shi
{"title":"Diffusion Models for Imperceptible and Transferable Adversarial Attack","authors":"Jianqi Chen;Hao Chen;Keyan Chen;Yilan Zhang;Zhengxia Zou;Zhenwei Shi","doi":"10.1109/TPAMI.2024.3480519","DOIUrl":null,"url":null,"abstract":"Many existing adversarial attacks generate \n<inline-formula><tex-math>$L_{p}$</tex-math></inline-formula>\n-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without \n<inline-formula><tex-math>$L_{p}$</tex-math></inline-formula>\n-norm constraints, yet lacking transferability of attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further “deceive” the diffusion model which can be viewed as an implicit recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, \n<i>DiffAttack</i>\n, is the first that introduces diffusion models into the adversarial attack field. Extensive experiments conducted across diverse model architectures (CNNs, Transformers, and MLPs), datasets (ImageNet, CUB-200, and Standford Cars), and defense mechanisms underscore the superiority of our attack over existing methods such as iterative attacks, GAN-based attacks, and ensemble attacks. Furthermore, we provide a comprehensive discussion on future research avenues in diffusion-based adversarial attacks, aiming to chart a course for this burgeoning field.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 2","pages":"961-977"},"PeriodicalIF":18.6000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10716799/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Many existing adversarial attacks generate $L_{p}$-norm perturbations in image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Toward visual imperceptibility, some recent works explore unrestricted attacks without $L_{p}$-norm constraints, yet they lack transferability when attacking black-box models. In this work, we propose a novel imperceptible and transferable attack that leverages both the generative and discriminative power of diffusion models. Specifically, instead of manipulating pixel space directly, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, this generates human-insensitive perturbations embedded with semantic cues. For better transferability, we further "deceive" the diffusion model, which can be viewed as an implicit recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first to introduce diffusion models into the adversarial attack field. Extensive experiments conducted across diverse model architectures (CNNs, Transformers, and MLPs), datasets (ImageNet, CUB-200, and Stanford Cars), and defense mechanisms underscore the superiority of our attack over existing methods such as iterative attacks, GAN-based attacks, and ensemble attacks. Furthermore, we provide a comprehensive discussion of future research avenues in diffusion-based adversarial attacks, aiming to chart a course for this burgeoning field.
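The abstract outlines an optimization loop: invert an image into a diffusion model's latent space, then optimize that latent so the denoised output fools a classifier while a content term keeps it visually faithful. Below is a minimal, hedged sketch of this idea in PyTorch. It is not the paper's implementation: `ddim_invert` and `ddim_denoise` are hypothetical callables standing in for a real diffusion model's inversion and sampling routines, and the loss weights and step counts are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def diff_attack(image, label, classifier, ddim_invert, ddim_denoise,
                steps=100, lr=0.01, w_adv=1.0, w_content=10.0):
    """Sketch of a latent-space adversarial attack via a diffusion model.

    `ddim_invert` and `ddim_denoise` are hypothetical stand-ins for the
    chosen diffusion model's inversion and sampling routines; weights and
    hyperparameters are illustrative, not the paper's settings.
    """
    # Invert the clean image into the diffusion latent space, then treat
    # the latent (rather than the pixels) as the variable being optimized.
    latent = ddim_invert(image).detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        # Decode the perturbed latent back to an image.
        adv_image = ddim_denoise(latent)

        # Adversarial term: maximize the classifier's loss on the true
        # label by minimizing the negated cross-entropy.
        adv_loss = -F.cross_entropy(classifier(adv_image), label)

        # Content-preservation term: keep the result visually close to the
        # input; a crude stand-in for the paper's content-preserving design.
        content_loss = F.mse_loss(adv_image, image)

        # The paper additionally "deceives" the diffusion model itself by
        # distracting its cross-attention from target regions; that term
        # is omitted here for brevity.
        loss = w_adv * adv_loss + w_content * content_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return ddim_denoise(latent).detach()
```

Optimizing in the latent space rather than in pixel space is what lets the perturbation carry semantic structure instead of high-frequency noise, which is the intuition behind the abstract's imperceptibility claim.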