Synthesizing Differentially Private Datasets using Random Mixing

2019 IEEE International Symposium on Information Theory (ISIT), Pub Date: 2019-07-07, DOI: 10.1109/ISIT.2019.8849381

Kangwook Lee, Hoon Kim, Kyungmin Lee, Changho Suh, K. Ramchandran

Pages: 542-546

Citations: 13

Abstract

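The goal of differentially private data publishing is to release a modified dataset so that its privacy can be ensured while allowing for efficient learning. We propose a new data publishing algorithm in which a released dataset is formed by mixing ℓ randomly chosen data points and then perturbing them with additive noise. Our privacy analysis shows that as ℓ increases, noise with smaller variance is sufficient to achieve a target privacy level. In order to quantify the usefulness of our algorithm, we adopt the accuracy of a predictive model trained with our synthetic dataset, which we call the utility of the dataset. By characterizing the utility of our dataset as a function of ℓ, we show that one can learn both linear and nonlinear predictive models so that they yield reasonably good prediction accuracies. In particular, we show that there exists a sweet spot on ℓ that maximizes the prediction accuracy given a required privacy level, or vice versa. We also demonstrate that, given a target privacy level, our datasets can achieve higher utility than other datasets generated with the existing data publishing algorithms.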
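The mix-then-perturb step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the mixing rule or the noise distribution, so the code below assumes that "mixing" means averaging ℓ rows drawn without replacement and that the additive noise is Gaussian; the function name `random_mix` and its parameters are hypothetical. The noise scale here is an arbitrary input, not calibrated by any privacy analysis, so this sketch by itself carries no differential privacy guarantee.

```python
import numpy as np

def random_mix(X, y, ell, num_samples, noise_std, rng):
    """Hypothetical sketch: average `ell` random rows, then add Gaussian noise.

    Assumptions (not taken from the paper): mixing is a plain average,
    noise is i.i.d. Gaussian, and `noise_std` is supplied by the caller.
    """
    n, d = X.shape
    k = y.shape[1]
    X_out = np.empty((num_samples, d))
    y_out = np.empty((num_samples, k))
    for i in range(num_samples):
        # Pick ell distinct data points uniformly at random.
        idx = rng.choice(n, size=ell, replace=False)
        # Mix features and labels, then perturb both with additive noise.
        X_out[i] = X[idx].mean(axis=0) + rng.normal(0.0, noise_std, size=d)
        y_out[i] = y[idx].mean(axis=0) + rng.normal(0.0, noise_std, size=k)
    return X_out, y_out

# Toy usage: 100 points, 5 features, 3 one-hot classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.eye(3)[rng.integers(0, 3, size=100)]
X_mix, y_mix = random_mix(X, y, ell=4, num_samples=20, noise_std=0.1, rng=rng)
```

A model would then be trained on `(X_mix, y_mix)` instead of the raw data. Sweeping `ell` while holding the privacy target fixed is how one would look for the sweet spot mentioned in the abstract, provided `noise_std` is set by an actual privacy analysis rather than by hand.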




