How Private is Machine Learning?

Nicolas Carlini
{"title":"How Private is Machine Learning?","authors":"Nicolas Carlini","doi":"10.1145/3437880.3458440","DOIUrl":null,"url":null,"abstract":"A machine learning model is private if it doesn't reveal (too much) about its training data. This three-part talk examines to what extent current networks are private. Standard models are not private. We develop an attack that extracts rare training examples (for example, individual people's names, phone numbers, or addresses) out of GPT-2, a language model trained on gigabytes of text from the Internet [2]. As a result there is a clear need for training models with privacy-preserving techniques. We show that InstaHide, a recent candidate, is not private. We develop a complete break of this scheme and can again recover verbatim inputs [1]. Fortunately, there exists provably-correct \"differentiallyprivate\" training that guarantees no adversary could ever succeed at the above attacks. We develop techniques to that allow us to empirically evaluate the privacy offered by such schemes, and find they may be more private than can be proven formally [3].","PeriodicalId":120300,"journal":{"name":"Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3437880.3458440","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

A machine learning model is private if it doesn't reveal (too much) about its training data. This three-part talk examines to what extent current networks are private. Standard models are not private. We develop an attack that extracts rare training examples (for example, individual people's names, phone numbers, or addresses) out of GPT-2, a language model trained on gigabytes of text from the Internet [2]. As a result there is a clear need for training models with privacy-preserving techniques. We show that InstaHide, a recent candidate, is not private. We develop a complete break of this scheme and can again recover verbatim inputs [1]. Fortunately, there exists provably-correct "differentiallyprivate" training that guarantees no adversary could ever succeed at the above attacks. We develop techniques to that allow us to empirically evaluate the privacy offered by such schemes, and find they may be more private than can be proven formally [3].
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
机器学习有多私密?
如果机器学习模型不透露(太多)其训练数据,那么它就是私有的。这个由三部分组成的演讲考察了当前网络在多大程度上是私有的。标准模型不是私有的。我们开发了一种攻击,从GPT-2中提取罕见的训练示例(例如,个人的姓名,电话号码或地址),GPT-2是一种基于来自互联网的千兆字节文本训练的语言模型[2]。因此,显然需要使用隐私保护技术来训练模型。我们表明InstaHide,最近的候选人,并不是私有的。我们开发了一种完全打破该方案的方法,可以再次恢复逐字输入[1]。幸运的是,存在着可证明正确的“差异化私有”训练,可以保证没有对手能够在上述攻击中成功。我们开发了一些技术,使我们能够经验性地评估这些方案所提供的隐私性,并发现它们可能比正式证明的更隐私[3]。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
General Requirements on Synthetic Fingerprint Images for Biometric Authentication and Forensic Investigations Information Hiding in Cyber Physical Systems: Challenges for Embedding, Retrieval and Detection using Sensor Data of the SWAT Dataset On the Robustness of Backdoor-based Watermarking in Deep Neural Networks Banners: Binarized Neural Networks with Replicated Secret Sharing Meta and Media Data Stream Forensics in the Encrypted Domain of Video Conferences
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1