Understanding Unintended Memorization in Language Models Under Federated Learning

Om Thakkar, Swaroop Indra Ramaswamy, Rajiv Mathews, F. Beaufays
{"title":"Understanding Unintended Memorization in Language Models Under Federated Learning","authors":"Om Thakkar, Swaroop Indra Ramaswamy, Rajiv Mathews, F. Beaufays","doi":"10.18653/V1/2021.PRIVATENLP-1.1","DOIUrl":null,"url":null,"abstract":"Recent works have shown that language models (LMs), e.g., for next word prediction (NWP), have a tendency to memorize rare or unique sequences in the training data. Since useful LMs are often trained on sensitive data, it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. It differs in many aspects from the well-studied central learning setting where all the data is stored at the central server, and minibatch stochastic gradient descent is used to conduct training. This work is motivated by our observation that NWP models trained under FL exhibited remarkably less propensity to such memorization compared to the central learning setting. Thus, we initiate a formal study to understand the effect of different components of FL on unintended memorization in trained NWP models. Our results show that several differing components of FL play an important role in reducing unintended memorization. First, we discover that the clustering of data according to users—which happens by design in FL—has the most significant effect in reducing such memorization. Using the Federated Averaging optimizer with larger effective minibatch sizes for training causes a further reduction. We also demonstrate that training in FL with a user-level differential privacy guarantee results in models that can provide high utility while being resilient to memorizing out-of-distribution phrases with thousands of insertions across over a hundred users in the training set.","PeriodicalId":270632,"journal":{"name":"Proceedings of the Third Workshop on Privacy in Natural Language Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Third Workshop on Privacy in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/V1/2021.PRIVATENLP-1.1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Recent works have shown that language models (LMs), e.g., for next word prediction (NWP), have a tendency to memorize rare or unique sequences in the training data. Since useful LMs are often trained on sensitive data, it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. It differs in many aspects from the well-studied central learning setting, where all the data is stored at a central server and minibatch stochastic gradient descent is used for training. This work is motivated by our observation that NWP models trained under FL exhibit a remarkably lower propensity for such memorization than those trained in the central learning setting. We therefore initiate a formal study of the effect of different components of FL on unintended memorization in trained NWP models. Our results show that several distinct components of FL play an important role in reducing unintended memorization. First, we find that the clustering of data by user, which happens by design in FL, has the most significant effect in reducing such memorization. Training with the Federated Averaging optimizer and larger effective minibatch sizes causes a further reduction. We also demonstrate that FL training with a user-level differential privacy guarantee yields models that provide high utility while resisting the memorization of out-of-distribution phrases, even with thousands of insertions across over a hundred users in the training set.
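The "unintended memorization" measured here follows the canary-insertion methodology of Carlini et al. (2019), which this line of work builds on: out-of-distribution "canary" phrases are inserted into the training data, and memorization is quantified by how highly the trained model ranks each canary against random alternatives. A minimal sketch of that exposure metric (the function name and signature are illustrative, not from the paper):

```python
import math

def exposure(canary_rank, candidate_space_size):
    """Secret-sharer exposure metric (Carlini et al., 2019).

    canary_rank: 1-based rank of the inserted canary's perplexity under the
        trained model, among `candidate_space_size` random candidate phrases.
    Exposure is small for an unmemorized canary (rank near the middle of the
    candidates) and reaches log2(candidate_space_size) when the canary ranks
    first, i.e., is fully memorized.
    """
    return math.log2(candidate_space_size) - math.log2(canary_rank)
```

Two of the components the abstract credits with reducing memorization are visible directly in the structure of Federated Averaging (McMahan et al., 2017): data is clustered by user, since each client trains only on its own examples, and the round's effective minibatch is the union of all sampled clients' data. Below is a minimal PyTorch-style sketch under those assumptions; all names are illustrative, and the user-level DP variant (DP-FedAvg) would additionally clip each client's update and add Gaussian noise before averaging, which is omitted here.

```python
import copy
import random

import torch


def client_update(global_model, user_examples, epochs=1, lr=0.1):
    """Local training on one user's data only -- FL's per-user clustering."""
    model = copy.deepcopy(global_model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in user_examples:  # batches of this user's examples
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model.state_dict()


def federated_averaging(global_model, all_users, rounds=100, clients_per_round=100):
    """Server loop: each round's effective minibatch is the union of the
    sampled clients' data, so more clients per round means larger batches."""
    for _ in range(rounds):
        sampled = random.sample(all_users, clients_per_round)
        states = [client_update(global_model, user) for user in sampled]
        # Uniform average of client weights (assumes all parameters are floats).
        # DP-FedAvg would clip each client's delta and add Gaussian noise here.
        averaged = {
            key: torch.stack([s[key] for s in states]).mean(dim=0)
            for key in states[0]
        }
        global_model.load_state_dict(averaged)
    return global_model
```

Raising `clients_per_round` enlarges the effective minibatch, which is the mechanism the abstract identifies as causing a further reduction in memorization beyond the per-user clustering itself.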