March Boedihardjo, Thomas Strohmer, Roman Vershynin
Probability Theory and Related Fields. Published 2024-04-20. DOI: 10.1007/s00440-024-01279-z (https://doi.org/10.1007/s00440-024-01279-z)
Private measures, random walks, and synthetic data
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex—but very common—machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget \(\varepsilon \) bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
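For orientation, the two privacy notions named in the abstract can be written out as below. This is a sketch based on the standard textbook definitions, not on notation taken from the paper itself; the symbols \(\mathcal{M}\) (a randomized mechanism), \(X, X'\) (input data sets), \(S\) (a measurable set of outputs), and \(d\) (a metric on data sets) are assumptions introduced here for illustration.

```latex
% epsilon-differential privacy (standard definition; notation assumed here):
\[
  \Pr[\mathcal{M}(X) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(X') \in S]
  \quad \text{for all } X, X' \text{ differing in one record and all measurable } S .
\]
% Metric privacy replaces the "differ in one record" condition by a metric d:
\[
  \Pr[\mathcal{M}(X) \in S] \;\le\; e^{\varepsilon\, d(X, X')}\,\Pr[\mathcal{M}(X') \in S]
  \quad \text{for all } X, X' \text{ and all measurable } S .
\]
```

Under these definitions, taking \(d\) to be the Hamming distance between data sets recovers ordinary \(\varepsilon\)-differential privacy, which is the sense in which metric privacy generalizes it.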
About the journal:
Probability Theory and Related Fields publishes research papers in modern probability theory and its various fields of application. Thus, subjects of interest include: mathematical statistical physics, mathematical statistics, mathematical biology, theoretical computer science, and applications of probability theory to other areas of mathematics such as combinatorics, analysis, ergodic theory and geometry. Survey papers on emerging areas of importance may be considered for publication. The main languages of publication are English, French and German.