Scalable spatial scan statistics through sampling

Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. Published 2016-10-31. DOI: 10.1145/2996913.2996939

Michael Matheny, Raghvendra Singh, L. Zhang, Kaiqiang Wang, J. M. Phillips


Citations: 13

Abstract

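Finding anomalous regions within spatial data sets is a central task for biosurveillance, homeland security, policy making, and many other important areas. These communities have mainly settled on spatial scan statistics as a rigorous way to discover regions where a measured quantity (e.g., crime) differs in a statistically significant way from a baseline population. However, most common approaches are inefficient and can therefore only be run on very modest data sizes (a few thousand data points), or they make assumptions about the geographic distribution of the data. We address these challenges by designing, exploring, and analyzing sample-then-scan algorithms. These algorithms randomly sample data at two scales: one to define regions and the other to approximate the counts in these regions. Our experiments demonstrate that these algorithms are efficient and accurate independent of the size of the original data set, and our analysis explains why this is the case. For the first time, these sample-then-scan algorithms allow spatial scan statistics to run on a million or more data points without making assumptions about the spatial distribution of the data. Moreover, our experiments and analysis give insight into when it is appropriate to trust the various types of spatial anomalies when the data is modeled as a random sample from a larger but unknown data set.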
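The abstract describes the two-scale idea only at a high level. The following is a minimal Python sketch of that structure under several assumptions not taken from the paper: candidate regions are disks centered at one "net" point and passing through another, each disk is scored with a Kulldorff-style discrepancy Φ(m, b) = m ln(m/b) + (1 - m) ln((1 - m)/(1 - b)) over the measured fraction m and baseline fraction b inside it, and all helper names (`sample_then_scan`, `kulldorff`, `fraction_in_disk`, `make_demo_data`) and sample sizes are hypothetical. It illustrates "one sample defines regions, a separate sample estimates their counts", not the authors' actual algorithms or their error guarantees.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _x_log_ratio(x: float, y: float) -> float:
    """Return x * ln(x / y), using the convention 0 * ln(0 / y) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x / y)


def kulldorff(m: float, b: float) -> float:
    """Assumed Kulldorff-style discrepancy between the measured fraction m and
    the baseline fraction b inside a region. Only over-represented regions
    (m > b) score above zero; regions with b == 0 or b == 1 are skipped
    (score 0) instead of receiving an infinite score, as a simplification."""
    if m <= b or b <= 0.0 or b >= 1.0:
        return 0.0
    return _x_log_ratio(m, b) + _x_log_ratio(1.0 - m, 1.0 - b)


def fraction_in_disk(points: Sequence[Point], center: Point, r2: float) -> float:
    """Fraction of `points` inside the closed disk with this center and squared radius."""
    cx, cy = center
    inside = sum(1 for x, y in points if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
    return inside / len(points)


def sample_then_scan(
    measured: Sequence[Point],
    baseline: Sequence[Point],
    net_size: int,
    count_size: int,
    rng: random.Random,
) -> Tuple[float, Optional[Point], float]:
    """Return (score, center, radius) of the best candidate disk found."""
    # Scale 1: a small "net" sample whose points only define candidate disks
    # (hypothetical choice: center at one net point, boundary through another).
    net = rng.sample(list(measured) + list(baseline), net_size)
    # Scale 2: independent samples used only to estimate the measured and
    # baseline fractions that fall inside each candidate disk.
    m_sample = rng.sample(list(measured), min(count_size, len(measured)))
    b_sample = rng.sample(list(baseline), min(count_size, len(baseline)))

    best: Tuple[float, Optional[Point], float] = (0.0, None, 0.0)
    for center in net:
        for edge in net:
            r2 = (center[0] - edge[0]) ** 2 + (center[1] - edge[1]) ** 2
            m = fraction_in_disk(m_sample, center, r2)
            b = fraction_in_disk(b_sample, center, r2)
            score = kulldorff(m, b)
            if score > best[0]:
                best = (score, center, math.sqrt(r2))
    return best


def make_demo_data(rng: random.Random, n: int = 20000) -> Tuple[List[Point], List[Point]]:
    """Hypothetical synthetic data: a uniform baseline in the unit square and a
    measured set with about 30% of its points clustered around (0.7, 0.7)."""
    baseline = [(rng.random(), rng.random()) for _ in range(n)]
    measured = [
        (rng.gauss(0.7, 0.05), rng.gauss(0.7, 0.05)) if rng.random() < 0.3
        else (rng.random(), rng.random())
        for _ in range(n)
    ]
    return measured, baseline


if __name__ == "__main__":
    rng = random.Random(0)
    measured, baseline = make_demo_data(rng)
    score, center, radius = sample_then_scan(
        measured, baseline, net_size=40, count_size=400, rng=rng
    )
    print(f"score={score:.3f} center={center} radius={radius:.3f}")
```

In this sketch the scanning loop costs roughly `net_size**2 * 2 * count_size` point tests, regardless of how many points `measured` and `baseline` contain (only drawing the samples touches the full inputs), which mirrors the size-independence the abstract reports.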




