字符作为像素:黑箱文本引导图像生成模型的可控提示对抗攻击框架

International Joint Conference on Artificial Intelligence Pub Date : 2023-08-01 DOI:10.24963/ijcai.2023/109

Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang

{"title":"字符作为像素:黑箱文本引导图像生成模型的可控提示对抗攻击框架","authors":"Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang","doi":"10.24963/ijcai.2023/109","DOIUrl":null,"url":null,"abstract":"In this paper, we study a controllable prompt adversarial attacking problem for text guided image generation (Text2Image) models in the black-box scenario, where the goal is to attack specific visual subjects (e.g., changing a brown dog to white) in a generated image by slightly, if not imperceptibly, perturbing the characters of the driven prompt (e.g., ``brown'' to ``br0wn''). Our study is motivated by the limitations of current Text2Image attacking approaches that still rely on manual trials to create adversarial prompts. To address such limitations, we develop CharGrad, a character-level gradient based attacking framework that replaces specific characters of a prompt with pixel-level similar ones by interactively learning the perturbation direction for the prompt and updating the attacking examiner for the generated image based on a novel proxy perturbation representation for characters. We evaluate CharGrad using the texts from two public image captioning datasets. Results demonstrate that CharGrad outperforms existing text adversarial attacking approaches on attacking various subjects of generated images by black-box Text2Image models in a more effective and efficient way with less perturbation on the characters of the prompts.","PeriodicalId":394530,"journal":{"name":"International Joint Conference on Artificial Intelligence","volume":"317 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models\",\"authors\":\"Ziyi Kou, Shichao Pei, Yijun Tian, Xiangliang Zhang\",\"doi\":\"10.24963/ijcai.2023/109\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study a controllable prompt adversarial attacking problem for text guided image generation (Text2Image) models in the black-box scenario, where the goal is to attack specific visual subjects (e.g., changing a brown dog to white) in a generated image by slightly, if not imperceptibly, perturbing the characters of the driven prompt (e.g., ``brown'' to ``br0wn''). Our study is motivated by the limitations of current Text2Image attacking approaches that still rely on manual trials to create adversarial prompts. To address such limitations, we develop CharGrad, a character-level gradient based attacking framework that replaces specific characters of a prompt with pixel-level similar ones by interactively learning the perturbation direction for the prompt and updating the attacking examiner for the generated image based on a novel proxy perturbation representation for characters. We evaluate CharGrad using the texts from two public image captioning datasets. Results demonstrate that CharGrad outperforms existing text adversarial attacking approaches on attacking various subjects of generated images by black-box Text2Image models in a more effective and efficient way with less perturbation on the characters of the prompts.\",\"PeriodicalId\":394530,\"journal\":{\"name\":\"International Joint Conference on Artificial Intelligence\",\"volume\":\"317 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Joint Conference on Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.24963/ijcai.2023/109\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Joint Conference on Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24963/ijcai.2023/109","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在本文中，我们研究了黑盒场景下文本引导图像生成(Text2Image)模型的可控提示对抗性攻击问题，其目标是通过轻微(如果不是难以察觉的话)干扰驱动提示的字符(例如，“棕色”到“棕色”)来攻击生成图像中的特定视觉对象(例如，将棕色狗变为白色)。我们的研究是由当前Text2Image攻击方法的局限性所激发的，这些方法仍然依赖于手动试验来创建对抗性提示。为了解决这些限制，我们开发了CharGrad，这是一个基于字符级梯度的攻击框架，通过交互式地学习提示符的扰动方向，并用像素级相似的字符替换提示符的特定字符，并基于字符的新型代理扰动表示更新生成图像的攻击检查器。我们使用来自两个公共图像字幕数据集的文本来评估CharGrad。结果表明，CharGrad算法在对黑盒Text2Image模型生成的各种主题图像进行攻击时，对提示符字符的扰动较小，且攻击效果更佳，优于现有的文本对抗性攻击方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Character As Pixels: A Controllable Prompt Adversarial Attacking Framework for Black-Box Text Guided Image Generation Models

In this paper, we study a controllable prompt adversarial attacking problem for text guided image generation (Text2Image) models in the black-box scenario, where the goal is to attack specific visual subjects (e.g., changing a brown dog to white) in a generated image by slightly, if not imperceptibly, perturbing the characters of the driven prompt (e.g., ``brown'' to ``br0wn''). Our study is motivated by the limitations of current Text2Image attacking approaches that still rely on manual trials to create adversarial prompts. To address such limitations, we develop CharGrad, a character-level gradient based attacking framework that replaces specific characters of a prompt with pixel-level similar ones by interactively learning the perturbation direction for the prompt and updating the attacking examiner for the generated image based on a novel proxy perturbation representation for characters. We evaluate CharGrad using the texts from two public image captioning datasets. Results demonstrate that CharGrad outperforms existing text adversarial attacking approaches on attacking various subjects of generated images by black-box Text2Image models in a more effective and efficient way with less perturbation on the characters of the prompts.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Joint Conference on Artificial Intelligence

自引率

0.00%

发文量

期刊最新文献

Towards Formal Verification of Neuro-symbolic Multi-agent Systems RuleMatch: Matching Abstract Rules for Semi-supervised Learning of Human Standard Intelligence Tests Computing (1+epsilon)-Approximate Degeneracy in Sublinear Time AI and Decision Support for Sustainable Socio-Ecosystems Contrastive Learning and Reward Smoothing for Deep Portfolio Management