Comparing the Utility and Disclosure Risk of Synthetic Data with Samples of Microdata

C. Little, M. Elliot, R. Allmendinger
Venue: Privacy in statistical databases. PSD (Conference : 2004- ), volume 117 1, pages 234-249
Publication date: 2022-07-02
Publication type: Journal Article
DOI: https://doi.org/10.48550/arXiv.2207.03339
Citations: 5

Abstract

Most statistical agencies release randomly selected samples of Census microdata, usually with sample fractions under 10% and with other forms of statistical disclosure control (SDC) applied. An alternative to SDC is data synthesis, which has been attracting growing interest, yet there is no clear consensus on how to measure the associated utility and disclosure risk of the data. The ability to produce synthetic Census microdata, where the utility and associated risks are clearly understood, could mean that more timely and wider-ranging access to microdata would be possible. This paper follows on from previous work by the authors which mapped synthetic Census data on a risk-utility (R-U) map. The paper presents a framework to measure the utility and disclosure risk of synthetic data by comparing it to samples of the original data of varying sample fractions, thereby identifying the sample fraction which has equivalent utility and risk to the synthetic data. Three commonly used data synthesis packages are compared with some interesting results. Further work is needed in several directions but the methodology looks very promising.
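
The comparison described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which utility or risk measures are used, so `marginal_utility` (one minus the mean total variation distance between per-column frequencies) is a hypothetical stand-in, and `equivalent_fraction` is a hypothetical helper name. The sketch scores the synthetic data once, scores random samples of the original data at several sample fractions, and reports the fraction whose average score is closest to that of the synthetic data. A disclosure-risk measure could be plugged in the same way.

```python
import random
from collections import Counter


def marginal_utility(original, candidate):
    """Hypothetical utility score in [0, 1] (an assumption, not the
    paper's metric): one minus the mean total variation distance
    between the per-column frequency distributions."""
    n_cols = len(original[0])
    distances = []
    for col in range(n_cols):
        p = Counter(row[col] for row in original)
        q = Counter(row[col] for row in candidate)
        # Counter returns 0 for values absent from one data set.
        tv = 0.5 * sum(
            abs(p[v] / len(original) - q[v] / len(candidate))
            for v in set(p) | set(q)
        )
        distances.append(tv)
    return 1.0 - sum(distances) / n_cols


def equivalent_fraction(original, synthetic, fractions, repeats=20, seed=0):
    """Return (best_fraction, synthetic_utility, utility_by_fraction).

    best_fraction is the sample fraction whose mean utility over
    `repeats` random samples is closest to the synthetic data's utility.
    """
    rng = random.Random(seed)
    target = marginal_utility(original, synthetic)
    curve = {}
    for f in fractions:
        size = max(1, round(f * len(original)))
        # Simple random sampling without replacement from the original.
        scores = [
            marginal_utility(original, rng.sample(original, size))
            for _ in range(repeats)
        ]
        curve[f] = sum(scores) / repeats
    best = min(fractions, key=lambda f: abs(curve[f] - target))
    return best, target, curve
```

Here `original` and `synthetic` are lists of equal-width tuples of categorical values, for example `equivalent_fraction(original, synthetic, [0.01, 0.05, 0.1, 0.5])`. Averaging over several samples per fraction is a design choice made here to reduce sampling noise in the comparison.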