{"title":"应用内位置数据的区域化和聚合,实现信息最大化和数据披露最小化","authors":"Louise Sieg, James Cheshire","doi":"10.1111/gean.12406","DOIUrl":null,"url":null,"abstract":"<p>To minimize the disclosure of personal information, sensitive location data collected by mobile phones is often aggregated to predefined geographic units and presented as counts of devices at a given time. The use of grids or units created by statistical agencies for the dissemination of traditional data sets—such as censuses—are common choices for this aggregation process. However, these can result in large variations in the number of devices encapsulated within each geographic unit, resulting in over-generalization and a loss of information in some areas. To alleviate this issue, we propose a new method for the aggregation of mobile phone generated location data sets that creates bespoke geometries that maximize the granularity of the data, whilst minimizing the risks of disclosing personal information. The resulting small areas are built on Uber's H3 hexagonal indexing system by attributing activity counts and land-use features to each cell, then merging cells into geographies containing a predetermined number of data points and respecting the underlying topography and land use. This methodology has applications to widely available data sets and enables bespoke geographical units to be created for different contexts. We compare the generated units to established aggregates from the England and Wales Census and Ordnance Survey. We demonstrate that our outputs are more representative of the original mobile phone data set and minimize data omission caused by low counts. This speaks to the need for a data-driven and context-driven regionalization methodology.</p>","PeriodicalId":12533,"journal":{"name":"Geographical Analysis","volume":"57 1","pages":"27-51"},"PeriodicalIF":3.3000,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/gean.12406","citationCount":"0","resultStr":"{\"title\":\"The Regionalization and Aggregation of In-App Location Data to Maximize Information and Minimize Data Disclosure\",\"authors\":\"Louise Sieg, James Cheshire\",\"doi\":\"10.1111/gean.12406\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>To minimize the disclosure of personal information, sensitive location data collected by mobile phones is often aggregated to predefined geographic units and presented as counts of devices at a given time. The use of grids or units created by statistical agencies for the dissemination of traditional data sets—such as censuses—are common choices for this aggregation process. However, these can result in large variations in the number of devices encapsulated within each geographic unit, resulting in over-generalization and a loss of information in some areas. To alleviate this issue, we propose a new method for the aggregation of mobile phone generated location data sets that creates bespoke geometries that maximize the granularity of the data, whilst minimizing the risks of disclosing personal information. The resulting small areas are built on Uber's H3 hexagonal indexing system by attributing activity counts and land-use features to each cell, then merging cells into geographies containing a predetermined number of data points and respecting the underlying topography and land use. This methodology has applications to widely available data sets and enables bespoke geographical units to be created for different contexts. We compare the generated units to established aggregates from the England and Wales Census and Ordnance Survey. We demonstrate that our outputs are more representative of the original mobile phone data set and minimize data omission caused by low counts. This speaks to the need for a data-driven and context-driven regionalization methodology.</p>\",\"PeriodicalId\":12533,\"journal\":{\"name\":\"Geographical Analysis\",\"volume\":\"57 1\",\"pages\":\"27-51\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2024-06-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/gean.12406\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Geographical Analysis\",\"FirstCategoryId\":\"89\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/gean.12406\",\"RegionNum\":3,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"GEOGRAPHY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Geographical Analysis","FirstCategoryId":"89","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/gean.12406","RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY","Score":null,"Total":0}
The Regionalization and Aggregation of In-App Location Data to Maximize Information and Minimize Data Disclosure
To minimize the disclosure of personal information, sensitive location data collected by mobile phones is often aggregated to predefined geographic units and presented as counts of devices at a given time. The use of grids or units created by statistical agencies for the dissemination of traditional data sets—such as censuses—are common choices for this aggregation process. However, these can result in large variations in the number of devices encapsulated within each geographic unit, resulting in over-generalization and a loss of information in some areas. To alleviate this issue, we propose a new method for the aggregation of mobile phone generated location data sets that creates bespoke geometries that maximize the granularity of the data, whilst minimizing the risks of disclosing personal information. The resulting small areas are built on Uber's H3 hexagonal indexing system by attributing activity counts and land-use features to each cell, then merging cells into geographies containing a predetermined number of data points and respecting the underlying topography and land use. This methodology has applications to widely available data sets and enables bespoke geographical units to be created for different contexts. We compare the generated units to established aggregates from the England and Wales Census and Ordnance Survey. We demonstrate that our outputs are more representative of the original mobile phone data set and minimize data omission caused by low counts. This speaks to the need for a data-driven and context-driven regionalization methodology.
期刊介绍:
First in its specialty area and one of the most frequently cited publications in geography, Geographical Analysis has, since 1969, presented significant advances in geographical theory, model building, and quantitative methods to geographers and scholars in a wide spectrum of related fields. Traditionally, mathematical and nonmathematical articulations of geographical theory, and statements and discussions of the analytic paradigm are published in the journal. Spatial data analyses and spatial econometrics and statistics are strongly represented.