Unintended Memorization and Timing Attacks in Named Entity Recognition Models

Rana Salal Ali, Benjamin Zi Hao Zhao, Hassan Jameel Asghar, Tham Nguyen, Ian David Wood, Mohamed Ali Kaafar

Proceedings on Privacy Enhancing Technologies (PoPETs), 2023. DOI: https://doi.org/10.56553/popets-2023-0056
Abstract
Named entity recognition (NER) models are widely used to identify named entities (e.g., individuals, locations, and other information) in text documents. Machine learning based NER models are increasingly applied in privacy-sensitive applications that need automatic and scalable identification of sensitive information in order to redact text for data sharing. In this paper, we study the setting in which NER models are available as a black-box service for identifying sensitive information in user documents, and show that these models are vulnerable to membership inference on their training datasets. Using updated pre-trained NER models from spaCy, we demonstrate two distinct membership attacks on these models. Our first attack exploits unintended memorization in the NER model's underlying neural network, a phenomenon neural networks are known to be vulnerable to. Our second attack leverages a timing side-channel to target NER models that maintain vocabularies constructed from the training data: words from the training dataset follow a different functional path through the model than previously unseen words, and this difference produces measurable differences in execution time. Revealing the membership status of training samples has clear privacy implications. For example, in text redaction, the sensitive words or phrases that are to be found and removed are at risk of being detected as members of the training dataset. Our experimental evaluation covers the redaction of both password and health data, presenting both security risks and privacy/regulatory issues. These risks are exacerbated by results indicating memorization after only a single phrase. We achieve a 70% AUC with our first attack on a text redaction use case, and overwhelming success with the second, timing attack at a 99.23% AUC. Finally, we discuss potential mitigation approaches to enable the safe use of NER models in light of the presented privacy and security implications of membership inference attacks.
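To make the timing side-channel concrete, the sketch below shows one way an attacker with only black-box, query-level access could measure the execution-time gap between candidate words. It is a minimal illustration, not the paper's exact methodology: the model name (en_core_web_sm), the candidate words, the repetition count, and the median-comparison heuristic are all assumptions made for the example.

```python
# Hypothetical sketch of a timing-based membership probe against a pre-trained
# spaCy NER pipeline. The idea: words seen during training/vocabulary
# construction may take a different code path than unseen words, producing a
# measurable latency difference. All concrete choices below are illustrative.
import statistics
import time

import spacy

# Assumes the small English pipeline is installed (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")


def median_latency(word: str, repeats: int = 200) -> float:
    """Median wall-clock time (seconds) to run the pipeline on `word`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        nlp(word)  # black-box query: process the candidate word
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Illustrative candidates: one word presumed to be in the model's vocabulary,
# one presumed unseen (e.g., a random string).
seen_candidate = "London"
unseen_candidate = "xqzvtrpl"

print(f"presumed seen:   {median_latency(seen_candidate):.6f}s")
print(f"presumed unseen: {median_latency(unseen_candidate):.6f}s")
# A consistent gap between the two medians is the kind of signal the timing
# attack exploits; in practice a decision threshold would be calibrated
# empirically over many candidate words.
```

A real attack would also need to control for noise (warm-up queries, repeated trials, network jitter if the model is behind an API), but the core measurement loop is as simple as shown here.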