Improving the Quality of Textual Adversarial Examples with Dynamic N-gram Based Attack
Authors: Xiaojiao Xie, Pengwei Zhan
DOI: 10.1109/CSCWD57460.2023.10152569 (https://doi.org/10.1109/CSCWD57460.2023.10152569)
Journal: Computer Supported Cooperative Work-The Journal of Collaborative Computing, vol. 53(1), pp. 594-599
Publication date: 2023-05-24
Publication type: Journal Article
Impact factor: 2.0 (JCR Q3, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Natural language models are widely used for their impressive performance on various tasks, but their poor robustness puts critical applications at high risk. These models are vulnerable to adversarial examples, which contain imperceptible noise that leads the model to wrong predictions. To keep such malicious examples imperceptible to humans, various word-level attack methods have been proposed. Previous work on word-level attacks generates adversarial examples by substituting words in sentences, using different candidate selection methods and substitution strategies to improve attack effectiveness and the quality of the generated examples. However, these methods are all unigram-based and ignore the connections between words. Their unigram nature degrades fluency, increases grammatical errors, and distorts the semantics of adversarial examples, making the examples easier for humans to detect. In this paper, to improve the quality of textual adversarial examples and make them less perceptible to humans, we propose a black-box word-level attack method called Dynamic N-Gram Based Attack (DyGram). DyGram tokenizes the entire sentence into multiple n-gram units, rather than into individual words as in previous work, and substitutes words in the sentence in descending order of n-gram unit importance. Extensive experiments demonstrate that DyGram achieves higher attack success rates than previous attack methods and improves the quality of the generated adversarial examples in terms of the number of perturbed words, perplexity, grammatical correctness, and semantic similarity.
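The idea of ranking n-gram units by importance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how DyGram forms its units or measures importance, so the sketch assumes fixed-size, non-overlapping n-grams and a deletion-based importance score (the drop in a black-box model score when a unit is removed). The names `ngram_units`, `rank_units`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for querying a real victim model.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open word-index range [start, end)


def ngram_units(words: List[str], n: int) -> List[Span]:
    """Split a sentence into consecutive, non-overlapping units of up to n words.

    Assumption: fixed-size units; the paper's dynamic segmentation is not
    described in the abstract.
    """
    return [(i, min(i + n, len(words))) for i in range(0, len(words), n)]


def rank_units(
    words: List[str], n: int, score: Callable[[List[str]], float]
) -> List[Tuple[float, Span]]:
    """Rank units by the score drop observed when each unit is deleted.

    Only the model's output score is used, which matches a black-box setting.
    """
    base = score(words)
    ranked = []
    for start, end in ngram_units(words, n):
        reduced = words[:start] + words[end:]
        ranked.append((base - score(reduced), (start, end)))
    # Most important unit first, i.e. descending order of importance.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def toy_score(words: List[str]) -> float:
    """Hypothetical stand-in for a model's confidence in the true label."""
    positive = {"great", "wonderful"}
    return sum(w in positive for w in words) / max(len(words), 1)


sentence = "the movie was really great and the cast was fine".split()
ranking = rank_units(sentence, 2, toy_score)
top_start, top_end = ranking[0][1]
print(sentence[top_start:top_end])  # unit an attack would try to perturb first
```

An attack built on this ranking would then visit units from most to least important and try word substitutions inside each one until the model's prediction changes; the substitution step itself is omitted here because the abstract gives no details about it.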
About the journal:
Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW.
The CSCW Journal arose in response to the growing interest in the design, implementation, and use of technical systems (including computing, information, and communications technologies) that support people working cooperatively, and its scope continues to encompass the many aspects of research within CSCW and related areas.
The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.