A Re-examination of the Census Bureau Reconstruction and Reidentification Attack

Author: K. Muralidhar
Venue: Privacy in Statistical Databases. PSD (Conference : 2004- ), volume 39 1, pages 312-323
Publication date: 2022-05-08
DOI: https://doi.org/10.48550/arXiv.2205.03939
Abstract: Recent analysis by researchers at the U.S. Census Bureau claims that, by using the tabular data released from the 2010 Census, it is possible to reconstruct the original data and, using an accurate external data file with identity, reidentify 179 million respondents (approximately 58% of the population). This study shows that there are a practically infinite number of possible reconstructions, and each reconstruction leads to assigning a different identity to the respondents in the reconstructed data. The results reported by the Census Bureau researchers are based on just one of these possible reconstructions and are easily refuted by an alternate reconstruction. Without definitive proof that the reconstruction is unique or, at the very least, that most reconstructions assign the same identity to the same respondent, claims of confirmed reidentification are highly suspect.

The Census releases data at different geographic levels: nation, state, county, tract, block group, and block. The final three are census-defined constructs and do not necessarily correspond to traditional geographic classifications. Person-level data at each geographic level is aggregated to the next higher level: block results are aggregated to block groups, block groups are aggregated to tracts, and so on. The multiple tables that are released (Total Population, Sex by Age, Total Races, and others) are all aggregations of the most detailed data release (Age by Sex, by Race, by Ethnicity). Together, the released tables form the basis of the reconstruction of the respondent microdata.
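The non-uniqueness argument can be illustrated with a toy example. The sketch below is not the Census Bureau's reconstruction procedure or the author's analysis; it assumes a hypothetical block of four people, two hypothetical age bands, and two hypothetical race categories ("A" and "B"). It aggregates a detailed sex-by-age-by-race table into two coarser published tables and then enumerates, by brute force, every detailed table that reproduces those published counts.

```python
from itertools import product
from collections import Counter

# Hypothetical categories for a toy block (not actual Census classifications).
SEXES = ("F", "M")
AGES = ("under_40", "40_plus")
RACES = ("A", "B")
CELLS = list(product(SEXES, AGES, RACES))  # 8 cells of the detailed table


def marginals(detailed):
    """Aggregate a detailed sex-by-age-by-race table into two coarser tables."""
    sex_by_age = Counter()
    race = Counter()
    for (sex, age, r), n in detailed.items():
        sex_by_age[(sex, age)] += n
        race[r] += n
    return dict(sex_by_age), dict(race)


def consistent_reconstructions(published, total):
    """Enumerate every detailed table whose aggregates equal the published ones."""
    found = []
    for counts in product(range(total + 1), repeat=len(CELLS)):
        if sum(counts) != total:
            continue  # wrong block population
        candidate = dict(zip(CELLS, counts))
        if marginals(candidate) == published:
            found.append(candidate)
    return found


# Hypothetical "true" microdata for a block of four respondents.
people = [
    ("F", "under_40", "A"),
    ("F", "40_plus", "B"),
    ("M", "under_40", "B"),
    ("M", "40_plus", "A"),
]
true_table = {cell: 0 for cell in CELLS}
for person in people:
    true_table[person] += 1

published = marginals(true_table)  # what a data user would actually see
reconstructions = consistent_reconstructions(published, total=len(people))
print(len(reconstructions))  # 6 for this toy block
```

Even in this four-person block, six different detailed tables match the published counts exactly, and only one of them is the true table. The example demonstrates non-uniqueness only; it says nothing about how many reconstructions exist for real Census tables, which are larger and more numerous.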