{"title":"Boosting and Differential Privacy","authors":"C. Dwork, G. Rothblum, S. Vadhan","doi":"10.1109/FOCS.2010.12","DOIUrl":null,"url":null,"abstract":"Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\\Q$ of queries over an input database, reasonably accurate estimates of the responses to every query in~$\\Q$, even when the number of queries is much larger than the number of rows in the database. Given a {\\em base synopsis generator} that takes a distribution on $\\Q$ and produces a ``weak'' synopsis that yields ``good'' answers for a majority of the weight in $\\Q$, our {\\em Boosting for Queries} algorithm obtains a synopsis that is good for all of~$\\Q$. We ensure privacy for the rows of the database, but the boosting is performed on the {\\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\\eps^2)$ bound on the {\\em expected} privacy loss from a single $\\eps$-\\dfp{} mechanism. 
Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\\eps$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"866","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2010.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 866
Abstract
Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\mathcal{Q}$ of queries over an input database, reasonably accurate estimates of the responses to every query in $\mathcal{Q}$, even when the number of queries is much larger than the number of rows in the database. Given a {\em base synopsis generator} that takes a distribution on $\mathcal{Q}$ and produces a ``weak'' synopsis that yields ``good'' answers for a majority of the weight in $\mathcal{Q}$, our {\em Boosting for Queries} algorithm obtains a synopsis that is good for all of $\mathcal{Q}$. We ensure privacy for the rows of the database, but the boosting is performed on the {\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm, certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\varepsilon^2)$ bound on the {\em expected} privacy loss from a single $\varepsilon$-differentially private mechanism. Combining this with evolution-of-confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\varepsilon$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.
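To make the quantitative claims concrete, the following is a minimal, self-contained Python sketch — not the paper's boosting algorithm. It shows (a) the standard Laplace mechanism for releasing a sensitivity-1 counting query with $\varepsilon$-differential privacy, and (b) the contrast between basic composition ($k\varepsilon$ total loss for $k$ mechanisms) and the $\sqrt{k}$-type bound that expected-loss arguments of the kind developed in this paper make possible. The function names and the specific composition formula used here (`sqrt(2k ln(1/delta)) * eps + k * eps * (e^eps - 1)`, the form commonly stated in the differential-privacy literature) are illustrative assumptions, not quoted from the paper.

```python
import math
import random

def laplace_noise(scale):
    # Sample from the Laplace(0, scale) distribution via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(rows, predicate, eps):
    """Release a counting query with eps-differential privacy.

    A counting query has sensitivity 1: adding or deleting a single row
    changes the true answer by at most 1, so Laplace noise of scale 1/eps
    suffices for eps-DP (standard Laplace mechanism, not specific to
    this paper).
    """
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / eps)

def basic_composition(k, eps):
    # Worst-case privacy loss of k adaptively composed eps-DP mechanisms.
    return k * eps

def advanced_composition(k, eps, delta):
    # An (eps', k*delta + delta)-style cumulative bound growing like
    # sqrt(k)*eps rather than k*eps; the O(eps^2) expected per-mechanism
    # loss is what drives the sqrt(2k ln(1/delta)) * eps leading term.
    return math.sqrt(2.0 * k * math.log(1.0 / delta)) * eps \
        + k * eps * math.expm1(eps)
```

For many small-$\varepsilon$, large-$k$ regimes the advanced bound is far smaller: with $k = 10{,}000$ queries at $\varepsilon = 0.01$ each, basic composition gives a total loss of $100$, while the square-root-style bound is in the single digits.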