Kalt：生成可解释的对抗性中文法律文本

IF 4.3 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine Learning Pub Date : 2024-06-21 DOI:10.1007/s10994-024-06572-5

Yunting Zhang, Shang Li, Lin Ye, Hongli Zhang, Zhe Chen, Binxing Fang

{"title":"Kalt：生成可解释的对抗性中文法律文本","authors":"Yunting Zhang, Shang Li, Lin Ye, Hongli Zhang, Zhe Chen, Binxing Fang","doi":"10.1007/s10994-024-06572-5","DOIUrl":null,"url":null,"abstract":"<p>Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are well-designed input samples with imperceptible perturbations. Existing methods generate AEs to evaluate the robustness of DNN-based natural language processing models. However, the AE attack performance significantly degrades in some verticals, such as law, due to overlooking essential domain knowledge. To generate explainable Chinese legal adversarial texts, we introduce legal knowledge and propose a novel black-box approach, knowledge-aware law tricker (KALT), in the framework of adversarial text generation based on word importance. Firstly, we invent a legal knowledge extraction method based on KeyBERT. The knowledge contains unique features from each category and shared features among different categories. Additionally, we design two perturbation strategies, Strengthen Similar Label and Weaken Original Label, to selectively perturb the two types of features, which can significantly reduce the classification accuracy of the target model. These two perturbation strategies can be regarded as components, which can be conveniently integrated into any perturbation method to enhance attack performance. Furthermore, we propose a strong hybrid perturbation method to introduce perturbation into the original texts. The perturbation method combines seven representative perturbation methods for Chinese. Finally, we design a formula to calculate interpretability scores, quantifying the interpretability of adversarial text generation methods. Experimental results demonstrate that KALT can effectively generate explainable Chinese legal adversarial texts that can be misclassified with high confidence and achieve excellent attack performance against the powerful Chinese BERT.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"53 32 1","pages":""},"PeriodicalIF":4.3000,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Kalt: generating adversarial explainable chinese legal texts\",\"authors\":\"Yunting Zhang, Shang Li, Lin Ye, Hongli Zhang, Zhe Chen, Binxing Fang\",\"doi\":\"10.1007/s10994-024-06572-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are well-designed input samples with imperceptible perturbations. Existing methods generate AEs to evaluate the robustness of DNN-based natural language processing models. However, the AE attack performance significantly degrades in some verticals, such as law, due to overlooking essential domain knowledge. To generate explainable Chinese legal adversarial texts, we introduce legal knowledge and propose a novel black-box approach, knowledge-aware law tricker (KALT), in the framework of adversarial text generation based on word importance. Firstly, we invent a legal knowledge extraction method based on KeyBERT. The knowledge contains unique features from each category and shared features among different categories. Additionally, we design two perturbation strategies, Strengthen Similar Label and Weaken Original Label, to selectively perturb the two types of features, which can significantly reduce the classification accuracy of the target model. These two perturbation strategies can be regarded as components, which can be conveniently integrated into any perturbation method to enhance attack performance. Furthermore, we propose a strong hybrid perturbation method to introduce perturbation into the original texts. The perturbation method combines seven representative perturbation methods for Chinese. Finally, we design a formula to calculate interpretability scores, quantifying the interpretability of adversarial text generation methods. Experimental results demonstrate that KALT can effectively generate explainable Chinese legal adversarial texts that can be misclassified with high confidence and achieve excellent attack performance against the powerful Chinese BERT.</p>\",\"PeriodicalId\":49900,\"journal\":{\"name\":\"Machine Learning\",\"volume\":\"53 32 1\",\"pages\":\"\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine Learning\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10994-024-06572-5\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10994-024-06572-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

深度神经网络（DNN）很容易受到对抗示例（AE）的影响，对抗示例是精心设计的输入样本，具有难以察觉的扰动。现有方法通过生成 AE 来评估基于 DNN 的自然语言处理模型的鲁棒性。然而，在某些垂直领域（如法律），由于忽略了基本的领域知识，AE 攻击性能明显下降。为了生成可解释的中文法律对抗文本，我们引入了法律知识，并在基于词重要性的对抗文本生成框架下提出了一种新颖的黑盒方法--知识感知法律诱导器（KALT）。首先，我们发明了一种基于 KeyBERT 的法律知识提取方法。该知识包含每个类别的独特特征和不同类别之间的共享特征。此外，我们还设计了两种扰动策略，即强化相似标签和弱化原始标签，以选择性地扰动这两类特征，从而显著降低目标模型的分类准确率。这两种扰动策略可以被视为组件，可以方便地集成到任何扰动方法中以提高攻击性能。此外，我们还提出了一种强混合扰动方法，将扰动引入原始文本。该扰动方法结合了七种具有代表性的中文扰动方法。最后，我们设计了一个计算可解释性分数的公式，量化了对抗文本生成方法的可解释性。实验结果表明，KALT 可以有效生成可解释的中文法律对抗文本，这些文本可以被高置信度地错误分类，并在面对强大的中文 BERT 时取得优异的攻击性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Kalt: generating adversarial explainable chinese legal texts

Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are well-designed input samples with imperceptible perturbations. Existing methods generate AEs to evaluate the robustness of DNN-based natural language processing models. However, the AE attack performance significantly degrades in some verticals, such as law, due to overlooking essential domain knowledge. To generate explainable Chinese legal adversarial texts, we introduce legal knowledge and propose a novel black-box approach, knowledge-aware law tricker (KALT), in the framework of adversarial text generation based on word importance. Firstly, we invent a legal knowledge extraction method based on KeyBERT. The knowledge contains unique features from each category and shared features among different categories. Additionally, we design two perturbation strategies, Strengthen Similar Label and Weaken Original Label, to selectively perturb the two types of features, which can significantly reduce the classification accuracy of the target model. These two perturbation strategies can be regarded as components, which can be conveniently integrated into any perturbation method to enhance attack performance. Furthermore, we propose a strong hybrid perturbation method to introduce perturbation into the original texts. The perturbation method combines seven representative perturbation methods for Chinese. Finally, we design a formula to calculate interpretability scores, quantifying the interpretability of adversarial text generation methods. Experimental results demonstrate that KALT can effectively generate explainable Chinese legal adversarial texts that can be misclassified with high confidence and achieve excellent attack performance against the powerful Chinese BERT.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Machine Learning 工程技术-计算机：人工智能

CiteScore

11.00

自引率

2.70%

发文量

162

审稿时长

3 months

期刊介绍： Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.