Method for testing NLP models with text adversarial examples

A.B. Menisov, A.G. Lomako, T.R. Sabirov
{"title":"用文本对抗性示例测试NLP模型的方法","authors":"A.B. Menisov, A.G. Lomako, T.R. Sabirov","doi":"10.17586/2226-1494-2023-23-5-946-954","DOIUrl":null,"url":null,"abstract":"At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory due to the imperfection of the scientific and methodological apparatus for describing the functioning of both individual elements and models as a whole. One of the problems associated with poor interpretability is the low reliability of the functioning of neural networks that process natural language texts. Small perturbations in text data are known to affect the stability of neural networks. The paper presents a method for testing NLP models for the threat of evasion attacks. The method includes the following text adversarial examples generations: random text modification and modification generation network. Random text modification is made using homoglyphs, rearranging text, adding invisible characters and removing characters randomly. The modification generation network is based on a generative adversarial architecture of neural networks. The conducted experiments demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The advantage of the developed method is, firstly, in the possibility of generating more natural and diverse adversarial examples, which have less restrictions, and, secondly, that multiple requests to the model under test are not required. This may be applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method allowed achieving a relatively better balance of effectiveness and stealth of textual adversarial examples (e.g. GigaChat and YaGPT models tested). The results of the work showed the need to test for defects and vulnerabilities that can be exploited by attackers in order to reduce the quality of the functioning of NLP models. This indicates a lot of potential in terms of ensuring the reliability of machine learning models. A promising direction is the problem of restoring the level of security (confidentiality, availability and integrity) of NLP models.","PeriodicalId":21700,"journal":{"name":"Scientific and Technical Journal of Information Technologies, Mechanics and Optics","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Method for testing NLP models with text adversarial examples\",\"authors\":\"A.B. Menisov, A.G. Lomako, T.R. Sabirov\",\"doi\":\"10.17586/2226-1494-2023-23-5-946-954\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory due to the imperfection of the scientific and methodological apparatus for describing the functioning of both individual elements and models as a whole. One of the problems associated with poor interpretability is the low reliability of the functioning of neural networks that process natural language texts. Small perturbations in text data are known to affect the stability of neural networks. The paper presents a method for testing NLP models for the threat of evasion attacks. The method includes the following text adversarial examples generations: random text modification and modification generation network. 
Random text modification is made using homoglyphs, rearranging text, adding invisible characters and removing characters randomly. The modification generation network is based on a generative adversarial architecture of neural networks. The conducted experiments demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The advantage of the developed method is, firstly, in the possibility of generating more natural and diverse adversarial examples, which have less restrictions, and, secondly, that multiple requests to the model under test are not required. This may be applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method allowed achieving a relatively better balance of effectiveness and stealth of textual adversarial examples (e.g. GigaChat and YaGPT models tested). The results of the work showed the need to test for defects and vulnerabilities that can be exploited by attackers in order to reduce the quality of the functioning of NLP models. This indicates a lot of potential in terms of ensuring the reliability of machine learning models. A promising direction is the problem of restoring the level of security (confidentiality, availability and integrity) of NLP models.\",\"PeriodicalId\":21700,\"journal\":{\"name\":\"Scientific and Technical Journal of Information Technologies, Mechanics and Optics\",\"volume\":\"57 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific and Technical Journal of Information Technologies, Mechanics and Optics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17586/2226-1494-2023-23-5-946-954\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and Technical Journal of Information Technologies, Mechanics and Optics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17586/2226-1494-2023-23-5-946-954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Engineering","Score":null,"Total":0}

Abstract

At present, the interpretability of Natural Language Processing (NLP) models is unsatisfactory because the scientific and methodological apparatus for describing the behavior of both individual components and models as a whole remains imperfect. One problem associated with poor interpretability is the low reliability of neural networks that process natural-language text: small perturbations in the input data are known to affect their stability. The paper presents a method for testing NLP models against the threat of evasion attacks. The method generates text adversarial examples in two ways: by random text modification and by a modification generation network. Random text modification uses homoglyphs, rearranges text fragments, inserts invisible characters, and removes characters at random. The modification generation network is built on a generative adversarial network architecture. The experiments demonstrated the effectiveness of the testing method based on the generation network. The advantages of the developed method are, first, that it can produce more natural and diverse adversarial examples with fewer restrictions and, second, that it does not require multiple queries to the model under test, which makes it applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the method achieves a comparatively better balance between the effectiveness and the stealth of textual adversarial examples (the GigaChat and YaGPT models were tested, among others). The results show the need to test for defects and vulnerabilities that attackers can exploit to degrade the quality of NLP models, and they point to considerable potential for ensuring the reliability of machine learning models. A promising direction for future work is restoring the level of security (confidentiality, availability, and integrity) of NLP models.
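
To make the random text modification step concrete, below is a minimal illustrative sketch in Python. It is not the authors' implementation: the homoglyph table, the choice of zero-width characters, and the perturbation rate are assumptions made for illustration only. The sketch applies the four modifications named in the abstract (homoglyph substitution, insertion of invisible characters, random character removal, and local reordering) to a fraction of character positions.

import random

# Illustrative Latin-to-Cyrillic homoglyph pairs (visually near-identical glyphs).
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic а
    "c": "\u0441",  # Cyrillic с
    "e": "\u0435",  # Cyrillic е
    "o": "\u043e",  # Cyrillic о
    "p": "\u0440",  # Cyrillic р
    "x": "\u0445",  # Cyrillic х
}

# Zero-width ("invisible") characters: ZWSP, ZWNJ, ZWJ.
INVISIBLE = ["\u200b", "\u200c", "\u200d"]


def perturb(text, rate=0.15, seed=None):
    """Randomly modify roughly `rate` of the character positions in `text`."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    for i, ch in enumerate(chars):
        if rng.random() >= rate:
            out.append(ch)
            continue
        op = rng.choice(["homoglyph", "invisible", "delete", "swap"])
        if op == "homoglyph" and ch.lower() in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch.lower()])      # swap in a look-alike glyph
        elif op == "invisible":
            out.append(ch + rng.choice(INVISIBLE))  # append a zero-width character
        elif op == "delete":
            continue                                # drop the character
        elif op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # local reordering
            out.append(chars[i])
        else:
            out.append(ch)                          # no applicable modification
    return "".join(out)


if __name__ == "__main__":
    original = "The bank approved the loan application."
    print(repr(perturb(original, rate=0.2, seed=42)))

In a testing workflow of the kind the abstract describes, such perturbed inputs would be submitted to the model under test and its outputs compared with those for the unperturbed text; a change in the prediction signals susceptibility to this class of evasion attack.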