Multi-Dimensional Data Publishing With Local Differential Privacy

Advances in database technology : proceedings. International Conference on Extending Database Technology. Pub Date: 2023-01-01. DOI: 10.48786/edbt.2023.15

Gaoyuan Liu, Peng Tang, Chengyu Hu, Chongshi Jin, Shanqing Guo

Volume: 18 1; Pages: 183-194; Publication type: Journal Article

Citations: 6

Abstract

This paper studies the publication of multi-dimensional data under local differential privacy (LDP), a problem that poses substantial challenges for both computational efficiency and data utility. The state-of-the-art solution first constructs a junction tree (a kind of probabilistic graphical model, PGM) to generate a set of noisy low-dimensional marginals of the input data, and then uses these marginals to approximate the distribution of the input dataset for synthetic data generation. This solution has two severe limitations that degrade the quality of the synthetic data: it must calculate the marginals of a large number of attribute pairs to construct the PGM, and it handles the marginal distributions of large cliques in the PGM poorly. To address these deficiencies, we first propose an incremental learning-based PGM construction method that builds on the sparseness of the constructed PGM and the divisibility of LDP. The method gradually prunes edges (attribute pairs) with weak correlation and allocates more data and privacy budget to the useful edges, thereby improving the model’s accuracy; for this purpose we introduce a high-precision data accumulation technique and a low-error edge pruning technique. Second, based on joint distribution decomposition and redundancy elimination, we propose a novel marginal calculation method for large cliques in the context of LDP. Extensive experiments on real datasets demonstrate that our solution offers desirable data utility.
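The abstract does not say which LDP protocol, correlation measure, or pruning rule the paper uses, so the following is only a minimal sketch of the general idea of scoring attribute pairs from noisy two-way marginals. It assumes generalized randomized response (GRR) as the perturbation mechanism, mutual information as the correlation score, and a split of users into disjoint groups with one attribute pair per group. All function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain_size, epsilon, rng):
    """GRR: keep the true value with probability p, otherwise report
    one of the other values uniformly at random."""
    p = math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(domain_size - 1)
    return other if other < value else other + 1


def grr_estimate(reports, domain_size, epsilon):
    """Unbiased frequency estimates computed from GRR reports."""
    e = math.exp(epsilon)
    p = e / (e + domain_size - 1)
    q = 1.0 / (e + domain_size - 1)
    counts = Counter(reports)
    n = len(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(domain_size)]


def normalize(freqs):
    """Clip negative estimates and rescale so they form a distribution."""
    clipped = [max(f, 0.0) for f in freqs]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(freqs)] * len(freqs)
    return [f / total for f in clipped]


def noisy_pair_marginal(records, i, j, domains, epsilon, rng):
    """Noisy joint distribution of attributes i and j (index = a * db + b)."""
    da, db = domains[i], domains[j]
    reports = [grr_perturb(r[i] * db + r[j], da * db, epsilon, rng)
               for r in records]
    return normalize(grr_estimate(reports, da * db, epsilon))


def mutual_information(joint, da, db):
    """Mutual information (in nats) of a joint distribution over da x db cells."""
    pa = [sum(joint[a * db + b] for b in range(db)) for a in range(da)]
    pb = [sum(joint[a * db + b] for a in range(da)) for b in range(db)]
    mi = 0.0
    for a in range(da):
        for b in range(db):
            pab = joint[a * db + b]
            if pab > 0 and pa[a] > 0 and pb[b] > 0:
                mi += pab * math.log(pab / (pa[a] * pb[b]))
    return mi


def demo(seed=0, n_users=30000, epsilon=2.0):
    """Score every attribute pair on synthetic data (assumed, not from the paper):
    attribute 1 copies attribute 0 with probability 0.9, attribute 2 is independent."""
    rng = random.Random(seed)
    domains = [2, 2, 2]
    records = []
    for _ in range(n_users):
        a0 = rng.randrange(2)
        a1 = a0 if rng.random() < 0.9 else 1 - a0
        records.append((a0, a1, rng.randrange(2)))
    pairs = [(0, 1), (0, 2), (1, 2)]
    scores = {}
    for k, (i, j) in enumerate(pairs):
        group = records[k::len(pairs)]  # disjoint user group for this pair
        joint = noisy_pair_marginal(group, i, j, domains, epsilon, rng)
        scores[(i, j)] = mutual_information(joint, domains[i], domains[j])
    return scores


if __name__ == "__main__":
    for pair, score in sorted(demo().items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 4))
```

In a pruning loop of the kind the abstract describes, the lowest-scoring pairs would be dropped and the users or budget they would have consumed reassigned to the surviving pairs in the next round. The number of rounds and the threshold are design choices this sketch leaves open.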
