{"title":"Boosting and Differential Privacy","authors":"C. Dwork, G. Rothblum, S. Vadhan","doi":"10.1109/FOCS.2010.12","DOIUrl":null,"url":null,"abstract":"Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\\Q$ of queries over an input database, reasonably accurate estimates of the responses to every query in~$\\Q$, even when the number of queries is much larger than the number of rows in the database. Given a {\\em base synopsis generator} that takes a distribution on $\\Q$ and produces a ``weak'' synopsis that yields ``good'' answers for a majority of the weight in $\\Q$, our {\\em Boosting for Queries} algorithm obtains a synopsis that is good for all of~$\\Q$. We ensure privacy for the rows of the database, but the boosting is performed on the {\\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\\eps^2)$ bound on the {\\em expected} privacy loss from a single $\\eps$-\\dfp{} mechanism. 
Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\\eps$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"866","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2010.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 866
Abstract
Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\mathcal{Q}$ of queries over an input database, reasonably accurate estimates of the responses to every query in $\mathcal{Q}$, even when the number of queries is much larger than the number of rows in the database. Given a {\em base synopsis generator} that takes a distribution on $\mathcal{Q}$ and produces a ``weak'' synopsis that yields ``good'' answers for a majority of the weight in $\mathcal{Q}$, our {\em Boosting for Queries} algorithm obtains a synopsis that is good for all of $\mathcal{Q}$. We ensure privacy for the rows of the database, but the boosting is performed on the {\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm, certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\varepsilon^2)$ bound on the {\em expected} privacy loss from a single $\varepsilon$-differentially private mechanism. Combining this with evolution-of-confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\varepsilon$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.
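To make the quantitative claims concrete, the following is a minimal, self-contained Python sketch — not the paper's boosting algorithm. It shows (a) the standard Laplace mechanism for releasing a sensitivity-1 counting query with $\varepsilon$-differential privacy, and (b) the contrast between basic composition ($k\varepsilon$ total loss for $k$ mechanisms) and the $\sqrt{k}$-type bound that expected-loss arguments of the kind developed in this paper make possible. The function names and the specific composition formula used here (`sqrt(2k ln(1/delta)) * eps + k * eps * (e^eps - 1)`, the form commonly stated in the differential-privacy literature) are illustrative assumptions, not quoted from the paper.

```python
import math
import random

def laplace_noise(scale):
    # Sample from the Laplace(0, scale) distribution via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(rows, predicate, eps):
    """Release a counting query with eps-differential privacy.

    A counting query has sensitivity 1: adding or deleting a single row
    changes the true answer by at most 1, so Laplace noise of scale 1/eps
    suffices for eps-DP (standard Laplace mechanism, not specific to
    this paper).
    """
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / eps)

def basic_composition(k, eps):
    # Worst-case privacy loss of k adaptively composed eps-DP mechanisms.
    return k * eps

def advanced_composition(k, eps, delta):
    # An (eps', k*delta + delta)-style cumulative bound growing like
    # sqrt(k)*eps rather than k*eps; the O(eps^2) expected per-mechanism
    # loss is what drives the sqrt(2k ln(1/delta)) * eps leading term.
    return math.sqrt(2.0 * k * math.log(1.0 / delta)) * eps \
        + k * eps * math.expm1(eps)
```

For many small-$\varepsilon$, large-$k$ regimes the advanced bound is far smaller: with $k = 10{,}000$ queries at $\varepsilon = 0.01$ each, basic composition gives a total loss of $100$, while the square-root-style bound is in the single digits.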