Making the Most of Parallel Composition in Differential Privacy

Proceedings on Privacy Enhancing Technologies (Privacy Enhancing Technologies Symposium). Pub Date: 2021-09-19. DOI: 10.2478/popets-2022-0013

Joshua Smith, H. Asghar, Gianpaolo Gioiosa, Sirine Mrabet, Serge Gaspers, P. Tyler

Journal Article. Volume 2022, issue 1, pages 253-273.

Citations: 8

Abstract

We show that the ‘optimal’ use of the parallel composition theorem corresponds to finding the size of the largest subset of queries that ‘overlap’ on the data domain, a quantity we call the maximum overlap of the queries. It has previously been shown that a certain instance of this problem, formulated in terms of determining the sensitivity of the queries, is NP-hard, but also that graph-theoretic algorithms, such as finding the maximum clique, can be used to approximate query sensitivity. In this paper, we consider a significant generalization of that instance, one that encompasses both a wider range of differentially private mechanisms and a broader class of queries. We show that, for a particular class of predicate queries, determining whether they are disjoint can be done in time polynomial in the number of attributes. For this class, we show that the maximum overlap problem remains NP-hard as a function of the number of queries. However, we show that efficient approximate solutions exist by relating maximum overlap to the clique and chromatic numbers of a certain graph determined by the queries. The link to the chromatic number allows us to use more efficient approximation algorithms; this cannot be done for the clique number, because approximating it may underestimate the privacy budget. Our approach is defined in the general setting of f-differential privacy, which subsumes standard pure differential privacy and Gaussian differential privacy. We prove the parallel composition theorem for f-differential privacy. We evaluate our approach on synthetic and real-world data sets of queries, and show that it scales to large domain sizes (up to 10^20000) and that its application can reduce the noise added to query answers by up to 60%.
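
To make the link between overlap and graph colouring concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a hypothetical query class (conjunctions of closed interval predicates, written as a dict from attribute name to `(lo, hi)`), and it uses a plain greedy colouring as a stand-in for whatever approximation algorithm the paper uses. All function names, the example queries, and the per-query budget are illustrative. The idea is that queries sharing a colour are pairwise disjoint, so under pure ε-differential privacy each colour class costs ε by parallel composition, and the classes combine by sequential composition.

```python
from itertools import combinations


def disjoint(q1, q2):
    """True if no record can satisfy both queries.

    Assumption: a query is a conjunction of closed interval predicates,
    given as {attribute: (lo, hi)}; a missing attribute is unconstrained.
    The check is linear in the number of attributes.
    """
    for attr in q1.keys() & q2.keys():
        lo1, hi1 = q1[attr]
        lo2, hi2 = q2[attr]
        if hi1 < lo2 or hi2 < lo1:
            return True
    return False


def overlap_graph(queries):
    """Vertices are query indices; an edge joins two queries that overlap."""
    adj = {i: set() for i in range(len(queries))}
    for i, j in combinations(range(len(queries)), 2):
        if not disjoint(queries[i], queries[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_coloring(adj):
    """Greedy colouring, highest degree first (a simple heuristic).

    The number of colours used is an upper bound on the chromatic number,
    and hence on the maximum overlap, so it never underestimates the budget.
    """
    colors = {}
    for v in sorted(adj, key=lambda v: len(adj[v]), reverse=True):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


# Hypothetical workload of five counting queries.
queries = [
    {"age": (0, 17)},
    {"age": (18, 64)},
    {"age": (65, 120)},
    {"age": (18, 64), "zip": (2000, 2999)},
    {"zip": (3000, 3999)},
]
adj = overlap_graph(queries)
colors = greedy_coloring(adj)
num_classes = max(colors.values()) + 1

epsilon_per_query = 0.1  # illustrative per-query budget
total_budget = num_classes * epsilon_per_query       # colouring-based bound
naive_budget = len(queries) * epsilon_per_query      # sequential composition only
print(num_classes, total_budget, naive_budget)
```

For this workload the greedy colouring uses two colours, so the bound on the total budget is 2ε rather than the 5ε given by sequential composition alone. A heuristic clique search could instead return a clique smaller than the true maximum and so report too small a budget, which is the risk the abstract attributes to the clique number.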


