Comparing the Utility and Disclosure Risk of Synthetic Data with Samples of Microdata

Privacy in statistical databases. PSD (Conference : 2004- ) Pub Date : 2022-07-02 DOI:10.48550/arXiv.2207.03339

C. Little, M. Elliot, R. Allmendinger

Volume: 117 1; Pages: 234-249; Publication type: Journal Article

Citations: 5

Abstract


Most statistical agencies release randomly selected samples of Census microdata, usually with sample fractions under 10% and with other forms of statistical disclosure control (SDC) applied. An alternative to SDC is data synthesis, which has been attracting growing interest, yet there is no clear consensus on how to measure the associated utility and disclosure risk of the data. The ability to produce synthetic Census microdata, where the utility and associated risks are clearly understood, could mean that more timely and wider-ranging access to microdata would be possible. This paper follows on from previous work by the authors which mapped synthetic Census data on a risk-utility (R-U) map. The paper presents a framework to measure the utility and disclosure risk of synthetic data by comparing it to samples of the original data of varying sample fractions, thereby identifying the sample fraction which has equivalent utility and risk to the synthetic data. Three commonly used data synthesis packages are compared with some interesting results. Further work is needed in several directions but the methodology looks very promising.
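The comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the utility score (one minus the total variation distance between joint category frequencies), the `naive_synthesizer` stand-in (it is not one of the three packages the authors compare), the toy two-variable data, and the grid of sample fractions. Only utility is shown; a disclosure-risk score could be compared against the same grid in the same way.

```python
import random
from collections import Counter


def joint_distribution(records):
    """Relative frequency of each distinct record (joint category combination)."""
    counts = Counter(records)
    n = len(records)
    return {key: c / n for key, c in counts.items()}


def utility(original, released):
    """Hypothetical utility score: 1 minus the total variation distance
    between the joint distributions of the original and released data."""
    p = joint_distribution(original)
    q = joint_distribution(released)
    keys = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    return 1.0 - tvd


def naive_synthesizer(original, rng):
    """Stand-in synthesizer (an assumption): resample every column
    independently from its marginal, which discards correlations."""
    columns = list(zip(*original))
    synthetic_columns = [rng.choices(col, k=len(original)) for col in columns]
    return list(zip(*synthetic_columns))


def mean_sample_utility(original, fraction, rng, replicates=20):
    """Average utility of simple random samples drawn at the given fraction."""
    size = max(1, round(fraction * len(original)))
    scores = [utility(original, rng.sample(original, size)) for _ in range(replicates)]
    return sum(scores) / replicates


def equivalent_fraction(original, synthetic, fractions, rng):
    """Smallest fraction in the grid whose mean sample utility reaches
    the utility of the synthetic data, or None if no fraction does."""
    target = utility(original, synthetic)
    for f in sorted(fractions):
        if mean_sample_utility(original, f, rng) >= target:
            return f
    return None


rng = random.Random(0)

# Toy "census": tenure depends on age band, so the two variables are correlated.
original = []
for _ in range(2000):
    age = rng.choice(["young", "middle", "old"])
    if age == "young":
        tenure = "rent"
    elif age == "old":
        tenure = "own"
    else:
        tenure = rng.choice(["rent", "own"])
    original.append((age, tenure))

synthetic = naive_synthesizer(original, rng)
fractions = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
f_eq = equivalent_fraction(original, synthetic, fractions, rng)

print("utility of synthetic data:", round(utility(original, synthetic), 3))
print("equivalent sample fraction:", f_eq)
```

Because the grid includes a fraction of 1.0, which reproduces the original data exactly, the search always returns a value here. A finer grid or interpolation between neighbouring fractions would give a more precise equivalent fraction.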

