PrivBayes

Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, D. Srivastava, Xiaokui Xiao
{"title":"PrivBayes","authors":"Jun Zhang, Graham Cormode, Cecilia M. Procopiuc, D. Srivastava, Xiaokui Xiao","doi":"10.1145/3134428","DOIUrl":null,"url":null,"abstract":"Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.","PeriodicalId":6983,"journal":{"name":"ACM Transactions on Database Systems (TODS)","volume":"2 1","pages":"1 - 41"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"103","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Database Systems (TODS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3134428","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 103

Abstract

Privacy-preserving data publishing is an important problem that has been the focus of extensive study. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection without making restrictive assumptions about the adversary. Existing techniques using differential privacy, however, cannot effectively handle the publication of high-dimensional data. In particular, when the input dataset contains a large number of attributes, existing methods require injecting a prohibitive amount of noise compared to the signal in the data, which renders the published data next to useless. To address the deficiency of the existing methods, this paper presents PrivBayes, a differentially private method for releasing high-dimensional data. Given a dataset D, PrivBayes first constructs a Bayesian network N, which (i) provides a succinct model of the correlations among the attributes in D and (ii) allows us to approximate the distribution of data in D using a set P of low-dimensional marginals of D. After that, PrivBayes injects noise into each marginal in P to ensure differential privacy and then uses the noisy marginals and the Bayesian network to construct an approximation of the data distribution in D. Finally, PrivBayes samples tuples from the approximate distribution to construct a synthetic dataset, and then releases the synthetic data. Intuitively, PrivBayes circumvents the curse of dimensionality, as it injects noise into the low-dimensional marginals in P instead of the high-dimensional dataset D. Private construction of Bayesian networks turns out to be significantly challenging, and we introduce a novel approach that uses a surrogate function for mutual information to build the model more accurately. We experimentally evaluate PrivBayes on real data and demonstrate that it significantly outperforms existing solutions in terms of accuracy.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
保护隐私的数据发布是一个重要的问题,一直是广泛研究的焦点。这个问题的最先进的解决方案是差分隐私,它提供了很强的隐私保护程度,而不需要对对手进行限制性假设。然而,使用差分隐私的现有技术不能有效地处理高维数据的发布。特别是,当输入数据集包含大量属性时,与数据中的信号相比,现有方法需要注入大量的噪声,这使得发布的数据几乎无用。针对现有方法的不足,本文提出了一种用于高维数据发布的差分私有方法PrivBayes。给定一个数据集D, PrivBayes首先构造贝叶斯网络N,(我)提供了一个简洁的模型的属性之间的相关性在D和(2)允许我们近似分布的数据在低维D使用一组P D的人之后,PrivBayes注入噪声在P,确保每个边际微分隐私,然后使用嘈杂的不着边际和贝叶斯网络构造一个近似的数据分布在D .最后,PrivBayes从近似分布中抽取元组来构建合成数据集,然后释放合成数据。直观地说,PrivBayes规避了维数的诅咒,因为它将噪声注入到P中的低维边缘而不是高维数据集d中。贝叶斯网络的私有构建被证明是非常具有挑战性的,我们引入了一种新的方法,使用替代函数来获取相互信息,以更准确地构建模型。我们在真实数据上对PrivBayes进行了实验评估,并证明它在准确性方面明显优于现有的解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
On Finding Rank Regret Representatives Answering (Unions of) Conjunctive Queries using Random Access and Random-Order Enumeration Persistent Summaries Influence Maximization Revisited: Efficient Sampling with Bound Tightened The Space-Efficient Core of Vadalog
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1