Linking Individuals to Areas: Protecting Confidentiality While Preserving Research Utility
Paul Norman, Jessie Colbert, Daniel J. Exeter
Spatial Demography (Journal Article; IF 1.1; JCR Q3, Demography), published 2023-11-28
DOI: 10.1007/s40980-023-00121-9 (https://doi.org/10.1007/s40980-023-00121-9)
Citation count: 0
Abstract
Modern computational capabilities have raised concerns about the risks associated with the level of information disclosed in public datasets. A tension exists between releasing data that protect the confidentiality of individuals and retaining sufficiently detailed geographic information to underpin the utility of research. Our aim is to inform data collectors and suppliers about geographic choices for confidentiality protection, and to balance this with reassurance to the research community that data will still be fit for purpose. We test this with simple logistic regression models, investigating the interplay between two geographical entities (points for the observations and polygons for area attributes) at a variety of scales, using a synthetic population of 22,000 people. In an England and Wales setting, we do this for individuals located by postcodes and by postal sector and postal district centroids, and link these to a variety of census geographies. We also ‘jitter’ postcode coordinates to test the effect of moving people away from their original location. We find a smoothing of relationships up the geographical hierarchy. However, if postal sector centroids are used to locate individuals, linkages to Lower/Middle Super Output Area scales and the subsequent results are very similar to those obtained with the more detailed unit postcodes. Postcode locations jittered by 500–750 m in any direction are likely to allow the same conclusions to be drawn as for the original locations. Within these geographic scenarios, there is likely to be a sufficient level of confidentiality protection, while statistical relationships remain very similar to those obtained using the most detailed geographic locators.
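The jittering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how distances or directions were sampled, so the sketch assumes a displacement drawn uniformly between 500 m and 750 m and a bearing drawn uniformly around the circle. The function name `jitter_point`, its parameters, and the example coordinates are hypothetical, and coordinates are assumed to be in a projected system measured in metres (such as British National Grid eastings and northings).

```python
import math
import random


def jitter_point(easting, northing, min_dist=500.0, max_dist=750.0, rng=random):
    """Move a point a random distance (metres) in a random direction.

    Assumption: distance is uniform on [min_dist, max_dist] and the
    bearing is uniform on [0, 2*pi). The source does not specify this.
    """
    distance = rng.uniform(min_dist, max_dist)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    return (
        easting + distance * math.cos(bearing),
        northing + distance * math.sin(bearing),
    )


# Hypothetical postcode coordinate in metres; seeded for repeatability.
rng = random.Random(42)
original = (430000.0, 433000.0)
moved = jitter_point(*original, rng=rng)

# The displacement always falls inside the requested band.
displacement = math.dist(original, moved)
print(f"moved {displacement:.1f} m")
```

In a linkage workflow, the jittered point rather than the original would then be assigned to whichever census polygon contains it, which is why larger displacements risk linking an individual to the wrong area.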
About the journal:
Spatial Demography focuses on understanding the spatial and spatiotemporal dimensions of demographic processes. More specifically, the journal is interested in submissions that include the innovative use and adoption of spatial concepts, geospatial data, spatial technologies, and spatial analytic methods that further our understanding of demographic and policy-related questions. The journal publishes both substantive and methodological papers from across the discipline of demography and its related fields (including economics, geography, sociology, anthropology, and environmental science), in applications ranging from local to global scale. In addition to research articles, the journal will consider review essays, book reviews, and reports/reviews on data, software, and instructional resources for publication.