Expanding the Role of Synthetic Data at the U.S. Census Bureau

Ron S. Jarmin, T. Louis, Javier Miranda
{"title":"Expanding the Role of Synthetic Data at the U.S. Census Bureau","authors":"Ron S. Jarmin, T. Louis, Javier Miranda","doi":"10.2139/ssrn.2408030","DOIUrl":null,"url":null,"abstract":"National Statistical offices (NSOs) create official statistics from data collected from survey respondents, government administrative records and other sources. The raw source data is usually considered to be confidential. In the case of the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate to protect confidentiality is often at odds with the needs of users to extract as much information from the data as possible. Traditional disclosure protection techniques result in official data products that do not fully utilize the information content of the underlying microdata. Typically, these products take the form of simple aggregate tabulations. In a few cases anonymized public- use micro samples are made available, but these face a growing risk of re-identification by the increasing amounts of information about individuals and firms available in the public domain. One approach for overcoming these risks is to release products based on synthetic data where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We discuss re- cent Census Bureau work to develop and deploy such products. We discuss the benefits and challenges involved with extending the scope of synthetic data products in official statistics.","PeriodicalId":92154,"journal":{"name":"U.S. Census Bureau Center for Economic Studies research paper series","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"U.S. 
Census Bureau Center for Economic Studies research paper series","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.2408030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 18

Abstract

National statistical offices (NSOs) create official statistics from data collected from survey respondents, government administrative records, and other sources. The raw source data is usually considered confidential. At the U.S. Census Bureau, confidentiality of survey and administrative records microdata is mandated by statute, and this mandate is often at odds with users' need to extract as much information from the data as possible. Traditional disclosure protection techniques result in official data products that do not fully utilize the information content of the underlying microdata. Typically, these products take the form of simple aggregate tabulations. In a few cases, anonymized public-use microsamples are made available, but these face a growing risk of re-identification as more information about individuals and firms becomes available in the public domain. One approach for overcoming these risks is to release products based on synthetic data, where values are simulated from statistical models designed to mimic the (joint) distributions of the underlying microdata. We discuss recent Census Bureau work to develop and deploy such products, and the benefits and challenges involved in extending the scope of synthetic data products in official statistics.
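As a rough illustration of the idea of simulating values from a model that mimics a joint distribution, the sketch below fits a bivariate normal to a toy two-variable dataset and draws synthetic records from the fitted model. Everything in it is an assumption made for illustration: the data are invented, the bivariate normal is a deliberately simple stand-in, and the function names (`fit_bivariate_normal`, `synthesize`, `pearson`) are hypothetical. It is not the Census Bureau's method and includes no disclosure-risk assessment.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_bivariate_normal(records):
    """Estimate the parameters of a bivariate normal from (x, y) records."""
    xs, ys = zip(*records)
    return {
        "mean_x": statistics.fmean(xs),
        "mean_y": statistics.fmean(ys),
        "sd_x": statistics.stdev(xs),
        "sd_y": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def synthesize(model, n, seed):
    """Draw n synthetic (x, y) records from the fitted model."""
    rng = random.Random(seed)
    rho = model["rho"]
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mean_x"] + model["sd_x"] * z1
        # Mixing z1 and z2 this way reproduces the fitted correlation rho.
        y = model["mean_y"] + model["sd_y"] * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        out.append((x, y))
    return out


# Invented stand-in for confidential microdata (two correlated variables).
rng = random.Random(42)
confidential = []
for _ in range(5000):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    confidential.append((10 + a, 5 + 0.6 * a + 0.8 * b))

model = fit_bivariate_normal(confidential)
synthetic = synthesize(model, n=5000, seed=1)  # no original record is copied
```

The synthetic records share the fitted means, spreads, and correlation of the toy data without reproducing any individual record. A production synthesizer would presumably need far richer models to preserve the relationships users care about, which is part of the challenge the abstract refers to.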