Efficient Generation of Targeted and Transferable Adversarial Examples for Vision-Language Models via Diffusion Models

Impact Factor: 8.0 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Theory & Methods) · IEEE Transactions on Information Forensics and Security · Publication date: 2024-12-23 · DOI: 10.1109/TIFS.2024.3518072
Qi Guo;Shanmin Pang;Xiaojun Jia;Yang Liu;Qing Guo
IEEE Transactions on Information Forensics and Security, vol. 20, pp. 1333-1348. IEEE Xplore: https://ieeexplore.ieee.org/document/10812818/
Citations: 0

Abstract

Adversarial attacks, particularly targeted transfer-based attacks, can be used to assess the adversarial robustness of large vision-language models (VLMs), allowing for a more thorough examination of potential security flaws before deployment. However, previous transfer-based adversarial attacks incur high costs due to high iteration counts and complex method structures. Furthermore, because the adversarial semantics they inject are unnatural, the generated adversarial examples have low transferability. These issues limit the utility of existing methods for assessing robustness. To address them, we propose AdvDiffVLM, which uses diffusion models to generate natural, unrestricted, and targeted adversarial examples via score matching. Specifically, AdvDiffVLM uses Adaptive Ensemble Gradient Estimation (AEGE) to modify the score during the diffusion model's reverse generation process, ensuring that the produced adversarial examples carry natural, targeted adversarial semantics, which improves their transferability. Simultaneously, to improve the quality of adversarial examples, we use GradCAM-guided Mask Generation (GCMG) to disperse adversarial semantics throughout the image rather than concentrating them in a single area. Finally, AdvDiffVLM embeds more target semantics into adversarial examples over multiple iterations. Experimental results show that our method generates adversarial examples 5 to 10 times faster than state-of-the-art (SOTA) transfer-based adversarial attacks while producing higher-quality adversarial examples. Furthermore, compared to previous transfer-based adversarial attacks, the adversarial examples generated by our method have better transferability. Notably, AdvDiffVLM can successfully attack a variety of commercial VLMs in a black-box setting, including GPT-4V. The code is available at https://github.com/gq-max/AdvDiffVLM
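The guidance mechanism the abstract describes can be illustrated with a deterministic toy sketch: an unconditional score is modified by a target-semantics gradient, and a spatial mask restricts where that gradient is injected. Everything here is a stand-in, not the paper's implementation: the quadratic pull replaces the AEGE surrogate-encoder ensemble, the hand-placed mask replaces the GradCAM-derived one, and the Gaussian-prior score replaces a learned diffusion model.

```python
import numpy as np

def target_gradient(x, target):
    # Hypothetical stand-in for AEGE: the paper ensembles gradients from
    # several surrogate image encoders; a simple quadratic pull toward the
    # target is enough to show how the score gets modified.
    return target - x

def guided_reverse_step(x, mask, target, guidance_scale=0.5, step=0.1):
    """One deterministic (probability-flow-style) reverse step with an
    adversarially modified score. The GradCAM-style mask limits where the
    target-semantics gradient is injected, dispersing the perturbation
    instead of concentrating it in one spot."""
    score = -x  # toy unconditional score for a standard-Gaussian prior
    score = score + guidance_scale * mask * target_gradient(x, target)
    return x + step * score

rng = np.random.default_rng(42)
x = rng.standard_normal((8, 8))   # toy "image" latent
target = np.ones((8, 8))          # toy target-semantics direction
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0              # pretend a GradCAM map selected this region
for _ in range(10):
    x = guided_reverse_step(x, mask, target)
```

After a few steps the masked region drifts toward the target while the unmasked region relaxes toward the prior, which is the qualitative behavior AEGE plus GCMG aims for at image scale.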
Source Journal
IEEE Transactions on Information Forensics and Security (Engineering & Technology: Electrical & Electronic Engineering)
CiteScore: 14.40
Self-citation rate: 7.40%
Articles per year: 234
Review time: 6.5 months
Journal description: The IEEE Transactions on Information Forensics and Security covers the sciences, technologies, and applications relating to information forensics, information security, biometrics, surveillance, and systems applications that incorporate these features.
Latest articles from this journal
- Mitigating Delivery Fraud and Path Manipulation in UAV-Based E-Commerce: A Fair Exchange Protocol
- Dishonest Majority Passive-to-Active Compiler over Rings for MPC with Constant Online Communication
- GCI-GANomaly: A Novel GPS Spoofing Detection Scheme based on Grayscale Constellation Image
- Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach
- Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition