The Regionalization and Aggregation of In-App Location Data to Maximize Information and Minimize Data Disclosure

Geographical Analysis | IF 3.3 | Zone 3, Earth Sciences | JCR Q1, Geography | Pub Date: 2024-06-07 | DOI: 10.1111/gean.12406

Louise Sieg, James Cheshire

Geographical Analysis, vol. 57, no. 1, pp. 27-51. Publication type: Journal Article. Article page: https://onlinelibrary.wiley.com/doi/10.1111/gean.12406

Citations: 0

Abstract

To minimize the disclosure of personal information, sensitive location data collected by mobile phones is often aggregated to predefined geographic units and presented as counts of devices at a given time. The use of grids or units created by statistical agencies for the dissemination of traditional data sets—such as censuses—are common choices for this aggregation process. However, these can result in large variations in the number of devices encapsulated within each geographic unit, resulting in over-generalization and a loss of information in some areas. To alleviate this issue, we propose a new method for the aggregation of mobile phone generated location data sets that creates bespoke geometries that maximize the granularity of the data, whilst minimizing the risks of disclosing personal information. The resulting small areas are built on Uber's H3 hexagonal indexing system by attributing activity counts and land-use features to each cell, then merging cells into geographies containing a predetermined number of data points and respecting the underlying topography and land use. This methodology has applications to widely available data sets and enables bespoke geographical units to be created for different contexts. We compare the generated units to established aggregates from the England and Wales Census and Ordnance Survey. We demonstrate that our outputs are more representative of the original mobile phone data set and minimize data omission caused by low counts. This speaks to the need for a data-driven and context-driven regionalization methodology.
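The merging step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain axial hexagon coordinates in place of real H3 cell indexes, a hypothetical `min_count` disclosure threshold, and a simple greedy rule that prefers neighbours sharing the seed cell's land use. The names `merge_cells`, `counts`, and `land_use` are illustrative only.

```python
# Illustrative sketch only: greedy merging of hexagonal cells into regions
# that each hold at least `min_count` data points. Cells are axial (q, r)
# coordinates, standing in for H3 indexes (an assumption of this sketch).

HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbors(cell):
    """Return the six cells adjacent to `cell` in axial coordinates."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_NEIGHBORS]


def merge_cells(counts, land_use, min_count):
    """Group adjacent cells until every region reaches `min_count` where possible."""
    region_of = {}  # cell -> region id
    members = {}    # region id -> list of cells
    totals = {}     # region id -> summed activity count

    # Pass 1: seed from the busiest cells so dense areas stay fine-grained.
    for seed in sorted(counts, key=lambda c: (-counts[c], c)):
        if seed in region_of:
            continue
        rid = len(members)
        members[rid], totals[rid] = [seed], counts[seed]
        region_of[seed] = rid
        while totals[rid] < min_count:
            frontier = {n for c in members[rid] for n in neighbors(c)
                        if n in counts and n not in region_of}
            if not frontier:
                break  # stranded: no free neighbour left to absorb
            # Prefer the seed's land use, then the smallest count.
            best = min(frontier,
                       key=lambda n: (land_use[n] != land_use[seed], counts[n], n))
            members[rid].append(best)
            totals[rid] += counts[best]
            region_of[best] = rid

    # Pass 2: fold stranded low-count regions into the smallest adjacent region.
    for rid in sorted(members):
        if totals[rid] >= min_count:
            continue
        adjacent = {region_of[n] for c in members[rid] for n in neighbors(c)
                    if n in region_of and region_of[n] != rid}
        if not adjacent:
            continue  # isolated cell cluster: left below the threshold
        target = min(adjacent, key=lambda a: (totals[a], a))
        for c in members[rid]:
            region_of[c] = target
        members[target].extend(members.pop(rid))
        totals[target] += totals.pop(rid)

    return [{"cells": sorted(members[r]), "count": totals[r]}
            for r in sorted(members)]


# Made-up example data: a single row of six cells.
counts = {(0, 0): 12, (1, 0): 3, (2, 0): 4, (3, 0): 5, (4, 0): 1, (5, 0): 11}
land_use = {cell: "urban" for cell in counts}
land_use[(4, 0)] = "park"

regions = merge_cells(counts, land_use, min_count=10)
for region in regions:
    print(region)
```

In this toy input the two busiest cells remain regions of their own, while the low-count cells are combined with neighbours, so no count falls below the threshold and no data point is dropped. A fixed grid that suppressed every cell under 10 would instead omit four of the six cells.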

Source journal

Geographical Analysis (Geography)

CiteScore: 8.70

Self-citation rate: 5.60%

About the journal: First in its specialty area and one of the most frequently cited publications in geography, Geographical Analysis has, since 1969, presented significant advances in geographical theory, model building, and quantitative methods to geographers and scholars in a wide spectrum of related fields. Traditionally, mathematical and nonmathematical articulations of geographical theory, and statements and discussions of the analytic paradigm are published in the journal. Spatial data analyses and spatial econometrics and statistics are strongly represented.
