Reversible jump attack to textual classifiers with modification reduction

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine Learning Pub Date : 2024-04-22 DOI:10.1007/s10994-024-06539-6

Mingze Ni, Zhensu Sun, Wei Liu

{"title":"Reversible jump attack to textual classifiers with modification reduction","authors":"Mingze Ni, Zhensu Sun, Wei Liu","doi":"10.1007/s10994-024-06539-6","DOIUrl":null,"url":null,"abstract":"<p>Recent studies on adversarial examples expose vulnerabilities of natural language processing models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between magnitudes of changes and attack successes. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis–Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of the examples, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts to a number of perturbed words for adversarial examples. With these generated adversarial examples, MMR applies the Metropolis–Hasting sampler to enhance the imperceptibility of adversarial examples. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammar correctness.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"279 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-024-06539-6","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Recent studies on adversarial examples expose vulnerabilities of natural language processing models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between magnitudes of changes and attack successes. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis–Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of the examples, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts to a number of perturbed words for adversarial examples. With these generated adversarial examples, MMR applies the Metropolis–Hasting sampler to enhance the imperceptibility of adversarial examples. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammar correctness.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

针对文本分类器的可逆跳转攻击与修改减少

最近关于对抗示例的研究暴露了自然语言处理模型的漏洞。生成对抗示例的现有技术通常由确定性分层规则驱动，而这些规则与最优对抗示例无关，这种策略通常会导致对抗样本在变化幅度和攻击成功率之间达不到最佳平衡。为此，我们在本研究中提出了两种算法--可逆跳跃攻击（RJA）和大都会-空速修改还原（MMR），分别用于生成高效的对抗示例和提高示例的不可感知性。RJA 利用一种新颖的随机化机制来扩大搜索空间，并能有效地适应大量扰动词的对抗示例。利用这些生成的对抗示例，MMR 应用 Metropolis-Hasting 采样器来增强对抗示例的不可感知性。大量实验证明，RJA-MMR 在攻击性能、不可感知性、流畅性和语法正确性方面都优于目前最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Machine Learning 工程技术-计算机：人工智能

CiteScore

11.00

自引率

2.70%

发文量

162

审稿时长

3 months

期刊介绍： Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.