Method for testing NLP models with text adversarial examples

A.B. Menisov, A.G. Lomako, T.R. Sabirov
{"title":"用文本对抗性示例测试NLP模型的方法","authors":"A.B. Menisov, A.G. Lomako, T.R. Sabirov","doi":"10.17586/2226-1494-2023-23-5-946-954","DOIUrl":null,"url":null,"abstract":"At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory due to the imperfection of the scientific and methodological apparatus for describing the functioning of both individual elements and models as a whole. One of the problems associated with poor interpretability is the low reliability of the functioning of neural networks that process natural language texts. Small perturbations in text data are known to affect the stability of neural networks. The paper presents a method for testing NLP models for the threat of evasion attacks. The method includes the following text adversarial examples generations: random text modification and modification generation network. Random text modification is made using homoglyphs, rearranging text, adding invisible characters and removing characters randomly. The modification generation network is based on a generative adversarial architecture of neural networks. The conducted experiments demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The advantage of the developed method is, firstly, in the possibility of generating more natural and diverse adversarial examples, which have less restrictions, and, secondly, that multiple requests to the model under test are not required. This may be applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method allowed achieving a relatively better balance of effectiveness and stealth of textual adversarial examples (e.g. GigaChat and YaGPT models tested). The results of the work showed the need to test for defects and vulnerabilities that can be exploited by attackers in order to reduce the quality of the functioning of NLP models. This indicates a lot of potential in terms of ensuring the reliability of machine learning models. A promising direction is the problem of restoring the level of security (confidentiality, availability and integrity) of NLP models.","PeriodicalId":21700,"journal":{"name":"Scientific and Technical Journal of Information Technologies, Mechanics and Optics","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Method for testing NLP models with text adversarial examples\",\"authors\":\"A.B. Menisov, A.G. Lomako, T.R. Sabirov\",\"doi\":\"10.17586/2226-1494-2023-23-5-946-954\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory due to the imperfection of the scientific and methodological apparatus for describing the functioning of both individual elements and models as a whole. One of the problems associated with poor interpretability is the low reliability of the functioning of neural networks that process natural language texts. Small perturbations in text data are known to affect the stability of neural networks. The paper presents a method for testing NLP models for the threat of evasion attacks. The method includes the following text adversarial examples generations: random text modification and modification generation network. 
Random text modification is made using homoglyphs, rearranging text, adding invisible characters and removing characters randomly. The modification generation network is based on a generative adversarial architecture of neural networks. The conducted experiments demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The advantage of the developed method is, firstly, in the possibility of generating more natural and diverse adversarial examples, which have less restrictions, and, secondly, that multiple requests to the model under test are not required. This may be applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method allowed achieving a relatively better balance of effectiveness and stealth of textual adversarial examples (e.g. GigaChat and YaGPT models tested). The results of the work showed the need to test for defects and vulnerabilities that can be exploited by attackers in order to reduce the quality of the functioning of NLP models. This indicates a lot of potential in terms of ensuring the reliability of machine learning models. A promising direction is the problem of restoring the level of security (confidentiality, availability and integrity) of NLP models.\",\"PeriodicalId\":21700,\"journal\":{\"name\":\"Scientific and Technical Journal of Information Technologies, Mechanics and Optics\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific and Technical Journal of Information Technologies, Mechanics and Optics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17586/2226-1494-2023-23-5-946-954\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and Technical Journal of Information Technologies, Mechanics and Optics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17586/2226-1494-2023-23-5-946-954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Engineering","Score":null,"Total":0}

Abstract

At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory because the scientific and methodological apparatus for describing the behavior of both individual components and models as a whole remains imperfect. One problem associated with poor interpretability is the low reliability of neural networks that process natural-language text: small perturbations in the input data are known to affect their stability. The paper presents a method for testing NLP models against the threat of evasion attacks. The method generates text adversarial examples in two ways: by random text modification and by a modification generation network. Random text modification uses homoglyphs, rearranges text fragments, inserts invisible characters, and removes characters at random. The modification generation network is built on a generative adversarial network architecture. The experiments demonstrated the effectiveness of the testing method based on the generation network. The advantages of the developed method are, first, that it can produce more natural and diverse adversarial examples with fewer restrictions and, second, that it does not require multiple queries to the model under test, which makes it applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the method achieves a comparatively better balance between the effectiveness and the stealth of textual adversarial examples (the GigaChat and YaGPT models were tested, among others). The results show the need to test for defects and vulnerabilities that attackers can exploit to degrade the quality of NLP models, and they point to considerable potential for ensuring the reliability of machine learning models. A promising direction for future work is restoring the level of security (confidentiality, availability, and integrity) of NLP models.
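
To make the random text modification step concrete, below is a minimal illustrative sketch in Python. It is not the authors' implementation: the homoglyph table, the choice of zero-width characters, and the perturbation rate are assumptions made for illustration only. The sketch applies the four modifications named in the abstract (homoglyph substitution, insertion of invisible characters, random character removal, and local reordering) to a fraction of character positions.

import random

# Illustrative Latin-to-Cyrillic homoglyph pairs (visually near-identical glyphs).
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic а
    "c": "\u0441",  # Cyrillic с
    "e": "\u0435",  # Cyrillic е
    "o": "\u043e",  # Cyrillic о
    "p": "\u0440",  # Cyrillic р
    "x": "\u0445",  # Cyrillic х
}

# Zero-width ("invisible") characters: ZWSP, ZWNJ, ZWJ.
INVISIBLE = ["\u200b", "\u200c", "\u200d"]


def perturb(text, rate=0.15, seed=None):
    """Randomly modify roughly `rate` of the character positions in `text`."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    for i, ch in enumerate(chars):
        if rng.random() >= rate:
            out.append(ch)
            continue
        op = rng.choice(["homoglyph", "invisible", "delete", "swap"])
        if op == "homoglyph" and ch.lower() in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch.lower()])      # swap in a look-alike glyph
        elif op == "invisible":
            out.append(ch + rng.choice(INVISIBLE))  # append a zero-width character
        elif op == "delete":
            continue                                # drop the character
        elif op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # local reordering
            out.append(chars[i])
        else:
            out.append(ch)                          # no applicable modification
    return "".join(out)


if __name__ == "__main__":
    original = "The bank approved the loan application."
    print(repr(perturb(original, rate=0.2, seed=42)))

In a testing workflow of the kind the abstract describes, such perturbed inputs would be submitted to the model under test and its outputs compared with those for the unperturbed text; a change in the prediction signals susceptibility to this class of evasion attack.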