Generating Private Synthetic Data with Genetic Algorithms

Terrance Liu, Jingwu Tang, Giuseppe Vietri, Zhiwei Steven Wu
arXiv - CS - Other Computer Science, published 2023-06-05. DOI: arxiv-2306.03257 (https://doi.org/arxiv-2306.03257)

Abstract

We study the problem of efficiently generating differentially private synthetic data that approximate the statistical properties of an underlying sensitive dataset. In recent years, a growing line of work has approached this problem using first-order optimization techniques. However, such techniques are restricted to optimizing differentiable objectives, severely limiting the types of analyses that can be conducted. For example, first-order mechanisms have been primarily successful in approximating statistical queries only in the form of marginals for discrete data domains. In some cases, one can circumvent such issues by relaxing the task's objective to maintain differentiability. However, even when possible, these approaches impose a fundamental limitation: modifications to the minimization problem become additional sources of error. Therefore, we propose Private-GSD, a private genetic algorithm based on zeroth-order optimization heuristics that do not require modifying the original objective. As a result, it avoids the aforementioned limitations of first-order optimization. We empirically evaluate Private-GSD against baseline algorithms on data derived from the American Community Survey across a variety of statistics (otherwise known as statistical queries) for both discrete and real-valued attributes. We show that Private-GSD outperforms state-of-the-art methods on non-differentiable queries while matching their accuracy on differentiable ones.
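To make the zeroth-order idea concrete, here is a minimal sketch in plain Python. It is not the authors' Private-GSD implementation. It only illustrates the general recipe the abstract describes: answer statistical queries on the sensitive data once with added noise, then run a genetic search that needs nothing from the objective except the ability to evaluate it. Every name below (`threshold_query`, `noisy_answers`, `genetic_search`, and so on), the mutation and crossover rules, the population settings, and the toy data are assumptions made for illustration. The noise scale `sigma` is an arbitrary placeholder and is not calibrated to any privacy budget.

```python
import random


def threshold_query(column, threshold):
    """Fraction of rows whose `column` value is <= threshold.

    The answer is piecewise constant in the data, so it offers no useful
    gradient to a first-order optimizer.
    """
    def q(data):
        return sum(1 for row in data if row[column] <= threshold) / len(data)
    return q


def noisy_answers(data, queries, sigma, rng):
    """Answer each query on the sensitive data and add Gaussian noise.

    Assumption: sigma is a placeholder; calibrating it to a target
    (epsilon, delta) guarantee is deliberately left out of this sketch.
    """
    return [q(data) + rng.gauss(0.0, sigma) for q in queries]


def query_error(candidate, queries, targets):
    """Mean absolute gap between candidate answers and the noisy targets."""
    gaps = [abs(q(candidate) - t) for q, t in zip(queries, targets)]
    return sum(gaps) / len(gaps)


def mutate(parent, rng):
    """Copy the parent and resample one cell uniformly from [0, 1)."""
    child = [row[:] for row in parent]
    i = rng.randrange(len(child))
    j = rng.randrange(len(child[i]))
    child[i][j] = rng.random()
    return child


def crossover(a, b, rng):
    """Copy `a` and overwrite one of its rows with a random row of `b`."""
    child = [row[:] for row in a]
    child[rng.randrange(len(child))] = b[rng.randrange(len(b))][:]
    return child


def genetic_search(queries, targets, n_rows, n_cols, generations, pop_size, rng):
    """Elitist genetic search over candidate synthetic datasets.

    Only the noisy targets are used here, never the sensitive data.
    Requires pop_size >= 4 so that at least one elite survives.
    """
    def fitness(d):
        return query_error(d, queries, targets)

    population = [
        [[rng.random() for _ in range(n_cols)] for _ in range(n_rows)]
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        elites = population[: pop_size // 4]
        children = []
        while len(elites) + len(children) < pop_size:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(elites), rng))
            else:
                children.append(crossover(rng.choice(elites), rng.choice(elites), rng))
        population = elites + children
    return min(population, key=fitness), history


# Toy demo: two real-valued attributes in [0, 1], 18 threshold queries.
rng = random.Random(0)
sensitive = [[rng.betavariate(2, 5), rng.betavariate(5, 2)] for _ in range(200)]
queries = [threshold_query(c, t / 10) for c in range(2) for t in range(1, 10)]
targets = noisy_answers(sensitive, queries, sigma=0.01, rng=rng)
synthetic, history = genetic_search(
    queries, targets, n_rows=50, n_cols=2, generations=100, pop_size=20, rng=rng
)
print(f"best error: first generation {history[0]:.3f}, last generation {history[-1]:.3f}")
```

Because the elites are carried over unchanged, the best error recorded in `history` can never increase from one generation to the next. The search loop reads only the noisy targets, so in a properly calibrated version it would count as post-processing of the noisy measurements and would not consume additional privacy budget.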