{"title":"增强和区分隐私","authors":"C. Dwork, G. Rothblum, S. Vadhan","doi":"10.1109/FOCS.2010.12","DOIUrl":null,"url":null,"abstract":"Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\\Q$ of queries over an input database, reasonably accurate estimates of the responses to every query in~$\\Q$, even when the number of queries is much larger than the number of rows in the database. Given a {\\em base synopsis generator} that takes a distribution on $\\Q$ and produces a ``weak'' synopsis that yields ``good'' answers for a majority of the weight in $\\Q$, our {\\em Boosting for Queries} algorithm obtains a synopsis that is good for all of~$\\Q$. We ensure privacy for the rows of the database, but the boosting is performed on the {\\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\\eps^2)$ bound on the {\\em expected} privacy loss from a single $\\eps$-\\dfp{} mechanism. Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\\eps$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.","PeriodicalId":228365,"journal":{"name":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"866","resultStr":"{\"title\":\"Boosting and Differential Privacy\",\"authors\":\"C. Dwork, G. Rothblum, S. Vadhan\",\"doi\":\"10.1109/FOCS.2010.12\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\\\\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\\\\Q$ of queries over an input database, reasonably accurate estimates of the responses to every query in~$\\\\Q$, even when the number of queries is much larger than the number of rows in the database. Given a {\\\\em base synopsis generator} that takes a distribution on $\\\\Q$ and produces a ``weak'' synopsis that yields ``good'' answers for a majority of the weight in $\\\\Q$, our {\\\\em Boosting for Queries} algorithm obtains a synopsis that is good for all of~$\\\\Q$. We ensure privacy for the rows of the database, but the boosting is performed on the {\\\\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\\\\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\\\\eps^2)$ bound on the {\\\\em expected} privacy loss from a single $\\\\eps$-\\\\dfp{} mechanism. 
Combining this with evolution of confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\\\\eps$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.\",\"PeriodicalId\":228365,\"journal\":{\"name\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"866\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE 51st Annual Symposium on Foundations of Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FOCS.2010.12\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 51st Annual Symposium on Foundations of Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FOCS.2010.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 866
Abstract
Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved privacy-preserving synopses of an input database. These are data structures that yield, for a given set $Q$ of queries over an input database, reasonably accurate estimates of the responses to every query in $Q$, even when the number of queries is much larger than the number of rows in the database. Given a base synopsis generator that takes a distribution on $Q$ and produces a "weak" synopsis that yields "good" answers for a majority of the weight in $Q$, our Boosting for Queries algorithm obtains a synopsis that is good for all of $Q$. We ensure privacy for the rows of the database, but the boosting is performed on the queries. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, i.e., queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm, certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\varepsilon^2)$ bound on the expected privacy loss from a single $\varepsilon$-differentially private mechanism. Combining this with evolution-of-confidence arguments from the literature, we get stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\varepsilon$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.
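For context, the cumulative-loss guarantee mentioned at the end of the abstract is usually cited as the paper's advanced composition theorem. A sketch of the statement as it is commonly quoted (the symbols $k$, $\delta$, and $\delta'$ are not defined in the abstract and are introduced here only for illustration): composing $k$ adaptively chosen $(\varepsilon, \delta)$-differentially private mechanisms yields an $(\varepsilon', k\delta + \delta')$-differentially private mechanism for any $\delta' > 0$, where

$$ \varepsilon' = \sqrt{2k \ln(1/\delta')}\,\varepsilon + k\,\varepsilon\,(e^{\varepsilon} - 1). $$

For small $\varepsilon$ the first term dominates, so the cumulative loss grows roughly as $\sqrt{k}\,\varepsilon$ rather than the $k\varepsilon$ of naive composition. A minimal numerical sketch of that comparison, using standard-library Python and arbitrarily chosen parameter values:

    import math

    def advanced_composition_epsilon(k: int, eps: float, delta_prime: float) -> float:
        """Cumulative privacy-loss bound epsilon' under k-fold adaptive composition
        of eps-differentially private mechanisms, per the formula sketched above."""
        return math.sqrt(2 * k * math.log(1 / delta_prime)) * eps + k * eps * (math.exp(eps) - 1)

    # Hypothetical example: 10,000 mechanisms, each 0.01-DP, with slack delta' = 1e-6.
    k, eps, delta_prime = 10_000, 0.01, 1e-6
    print("naive composition bound:   ", k * eps)                                             # 100.0
    print("advanced composition bound:", advanced_composition_epsilon(k, eps, delta_prime))   # roughly 6.3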