A Re-examination of the Census Bureau Reconstruction and Reidentification Attack

K. Muralidhar
Privacy in statistical databases. PSD (Conference : 2004- ), volume 39 1, pages 312-323. Journal Article. Published 2022-05-08. DOI: https://doi.org/10.48550/arXiv.2205.03939
Citations: 7

Abstract

Recent analysis by researchers at the U.S. Census Bureau claims that by reconstructing the tabular data released from the 2010 Census, it is possible to reconstruct the original data and, using an accurate external data file with identity, reidentify 179 million respondents (approximately 58% of the population). This study shows that there is a practically infinite number of possible reconstructions, and that each reconstruction leads to assigning a different identity to the respondents in the reconstructed data. The results reported by the Census Bureau researchers are based on just one of these possible reconstructions and are easily refuted by an alternate reconstruction. Without definitive proof that the reconstruction is unique, or at the very least that most reconstructions assign the same identity to the same respondent, claims of confirmed reidentification are highly suspect.

The Census releases data at different geographic levels: nation, state, county, tract, block group, and block. The final three are census-defined constructs and do not necessarily correspond to traditional geographic classifications. For person-level data, the data at each smaller geographic level are aggregated to the next higher level: block-level results are aggregated to block groups, block groups to tracts, and so on. The multiple tables that are released (Total Population, Sex by Age, Total Races, and others) are all aggregations of the most detailed data release (Age by Sex, by Race, by Ethnicity). Together, these released tables form the basis of the reconstruction of the respondent microdata.
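
The non-uniqueness argument can be illustrated with a toy sketch. It is not the paper's procedure or the Census Bureau's: the block, the two tables, the counts, and the function name `consistent_reconstructions` are all hypothetical, and the sketch assumes only two one-way tables are available. It enumerates every set of person records that reproduces both tables exactly.

```python
from itertools import combinations_with_replacement
from collections import Counter

# Hypothetical published tables for one tiny block (illustrative counts only).
SEX_TABLE = {"F": 2, "M": 2}
AGE_TABLE = {"under_40": 2, "40_plus": 2}


def consistent_reconstructions(sex_table, age_table):
    """Return every multiset of (sex, age) records that matches both tables."""
    total = sum(sex_table.values())
    # Every possible record type a person in the block could have.
    cells = [(sex, age) for sex in sex_table for age in age_table]
    matches = []
    # Brute force: try each unordered collection of `total` records.
    for records in combinations_with_replacement(cells, total):
        sex_counts = Counter(sex for sex, _ in records)
        age_counts = Counter(age for _, age in records)
        sex_ok = all(sex_counts[s] == n for s, n in sex_table.items())
        age_ok = all(age_counts[a] == n for a, n in age_table.items())
        if sex_ok and age_ok:
            matches.append(records)
    return matches


reconstructions = consistent_reconstructions(SEX_TABLE, AGE_TABLE)
for candidate in reconstructions:
    print(candidate)
print(len(reconstructions), "reconstructions reproduce the same tables")
```

Even in this four-person example, more than one record set is consistent with the released counts, so matching any single candidate against an external file cannot by itself confirm an identity. Brute-force enumeration is only feasible at toy scale; it is used here for transparency, not efficiency.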