CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Impact Factor 3.0 · JCR Q2 (Computer Science, Information Systems) · CAS Tier 4 (Computer Science) · ACM Transactions on Privacy and Security · Published: 2024-07-12 · DOI: 10.1145/3678007
Xinyu He, Fengrui Hao, Tianlong Gu, Liang Chang

Abstract

Pre-trained language models (PLMs) aim to provide computers in various domains with natural and efficient language interaction and text processing capabilities. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers injected into a model guide it to exhibit behavior chosen by the attacker. Unfortunately, existing research on backdoor attacks has focused mainly on English PLMs and paid little attention to Chinese PLMs; moreover, these existing attacks do not work well against Chinese PLMs. In this paper, we disclose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies that ensure the backdoor is effectively triggered while improving the effectiveness of the attack. Then, depending on the attacker's ability to access the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position for the trigger and insert it there to maximize the stealth of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs exhibit strong resistance against three state-of-the-art backdoor defense methods.
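The masked-language-model injection mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `lm_score` function is a hypothetical stand-in for a real MLM fluency scorer (e.g. a BERT fill-mask head computing a pseudo-log-likelihood), stubbed here with a toy heuristic so the example is self-contained.

```python
# Hedged sketch of MLM-guided trigger injection: try every character
# position, keep the insertion that the language model finds most natural.

def lm_score(text: str) -> float:
    """Hypothetical fluency score. A real implementation would use a
    masked language model to score the poisoned sentence; this toy
    stand-in only exists so the sketch runs."""
    return -abs(len(text) - 20) / 20.0

def inject_trigger(sentence: str, trigger: str) -> str:
    """Insert `trigger` at the character position that maximizes the
    language-model score of the resulting sentence, i.e. the stealthiest
    insertion point under the scorer."""
    candidates = [
        sentence[:i] + trigger + sentence[i:]
        for i in range(len(sentence) + 1)
    ]
    return max(candidates, key=lm_score)

poisoned = inject_trigger("这部电影的剧情非常精彩", "呃")
```

With a real MLM scorer, character-level triggers such as a filler character would land where they read most like natural disfluencies, which is what makes the poisoned samples hard to spot.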
Source journal: ACM Transactions on Privacy and Security

CiteScore: 5.20 · Self-citation rate: 0.00% · Articles per year: 52

About the journal: ACM Transactions on Privacy and Security (TOPS) (formerly known as TISSEC) publishes high-quality research results in the fields of information and system security and privacy. Studies addressing all aspects of these fields are welcomed, ranging from technologies, to systems and applications, to the crafting of policies.