Fortifying NLP models against poisoning attacks: The power of personalized prediction architectures

Impact Factor: 14.7 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence) | Information Fusion | Pub Date: 2024-09-19 | DOI: 10.1016/j.inffus.2024.102692

Teddy Ferdinan, Jan Kocoń

{"title":"Fortifying NLP models against poisoning attacks: The power of personalized prediction architectures","authors":"Teddy Ferdinan, Jan Kocoń","doi":"10.1016/j.inffus.2024.102692","DOIUrl":null,"url":null,"abstract":"<div><p>In Natural Language Processing (NLP), state-of-the-art machine learning models heavily depend on vast amounts of training data. Often, this data is sourced from third parties, such as crowdsourcing platforms, to enable swift and efficient annotation collection for supervised learning. Yet, such an approach is susceptible to poisoning attacks where malicious agents deliberately insert harmful data to skew the resulting model behavior. Current countermeasures to these attacks either come at a significant cost, lack full efficacy, or are simply non-applicable. This study introduces and evaluates the potential of personalized model architectures as a defense against these threats. By comparing two top-performing personalized model architectures, User-ID and HuBi-Medium, against a standard non-personalized baseline across two NLP tasks and various simulated attack scenarios, we found that the personalized model architectures significantly outperformed the baseline. The robustness advantage increased with the rise in malicious annotations. Notably, the User-ID model excelled in safeguarding predictions for legitimate users from the influence of malicious annotations. 
Our findings emphasize the benefit of adopting personalized model architectures to bolster NLP system defenses against poisoning attacks.</p></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"114 ","pages":"Article 102692"},"PeriodicalIF":14.7000,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1566253524004706/pdfft?md5=3a6019ed5699d3ea16b3237461a74599&pid=1-s2.0-S1566253524004706-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253524004706","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

Citations: 0

Abstract

In Natural Language Processing (NLP), state-of-the-art machine learning models heavily depend on vast amounts of training data. Often, this data is sourced from third parties, such as crowdsourcing platforms, to enable swift and efficient annotation collection for supervised learning. Yet, such an approach is susceptible to poisoning attacks where malicious agents deliberately insert harmful data to skew the resulting model behavior. Current countermeasures to these attacks either come at a significant cost, lack full efficacy, or are simply non-applicable. This study introduces and evaluates the potential of personalized model architectures as a defense against these threats. By comparing two top-performing personalized model architectures, User-ID and HuBi-Medium, against a standard non-personalized baseline across two NLP tasks and various simulated attack scenarios, we found that the personalized model architectures significantly outperformed the baseline. The robustness advantage increased with the rise in malicious annotations. Notably, the User-ID model excelled in safeguarding predictions for legitimate users from the influence of malicious annotations. Our findings emphasize the benefit of adopting personalized model architectures to bolster NLP system defenses against poisoning attacks.
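The threat described in the abstract can be illustrated with a toy sketch. It assumes a simple label-flipping attack on binary crowdsourced annotations, which is only one possible attack scenario and is not taken from the paper. It does not implement User-ID or HuBi-Medium; it only shows how pooling all annotators into one training target lets malicious labels skew that target, while keeping annotators separate leaves the legitimate users' signal intact. All names (`flip_labels`, `mean_label_per_text`, the user and text IDs) are hypothetical.

```python
from collections import defaultdict


def flip_labels(annotations, malicious_ids):
    """Return a copy in which malicious annotators' binary labels are inverted."""
    return [
        (text_id, user_id, 1 - label if user_id in malicious_ids else label)
        for text_id, user_id, label in annotations
    ]


def mean_label_per_text(annotations, exclude=frozenset()):
    """Average the binary labels for each text, optionally skipping some users."""
    sums, counts = defaultdict(int), defaultdict(int)
    for text_id, user_id, label in annotations:
        if user_id in exclude:
            continue
        sums[text_id] += label
        counts[text_id] += 1
    return {text_id: sums[text_id] / counts[text_id] for text_id in sums}


# Toy data (assumption): four annotators honestly label t1 as 1 and t2 as 0.
clean = [
    (text_id, user_id, label)
    for text_id, label in (("t1", 1), ("t2", 0))
    for user_id in ("u1", "u2", "u3", "u4")
]

# Two of the four annotators are simulated attackers who flip every label.
malicious = {"u3", "u4"}
poisoned = flip_labels(clean, malicious)

# Pooled target, as a non-personalized model would see it.
pooled = mean_label_per_text(poisoned)
# Target restricted to legitimate annotators, for comparison.
legit_only = mean_label_per_text(poisoned, exclude=malicious)

print("pooled:", pooled)
print("legitimate only:", legit_only)
```

In this toy setup the pooled targets for both texts collapse to 0.5, while the legitimate-only targets stay at 1.0 and 0.0. In practice the set of malicious annotators is not known in advance, so this comparison is only a diagnostic, not a defense.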

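Source journal

Information Fusion (Engineering and Technology; Computer Science: Theory and Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months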

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.