Traditionally, historical crime and socioeconomic data have been used to understand crime in cities and to build crime prediction models. However, the increasing availability of mobility data, from cell phones to location-based services, has introduced a new family of mobility-based crime prediction models that exploit the relation between mobility patterns and reported crime incidents. One of the major concerns of using reported crime data is under-reporting, which biases the crime predictions. In this paper, we propose a novel Bayesian hierarchical model that utilizes domain knowledge about biases in reported crime data to characterize and enhance fairness and accuracy in mobility-based crime predictions. An in-depth feature analysis reveals the role that various factors might play in crime under-reporting and algorithmic fairness for mobility-based crime predictors.
Addressing Under-Reporting to Enhance Fairness and Accuracy in Mobility-based Crime Prediction. Jiahui Wu, E. Frías-Martínez, V. Frías-Martínez. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422205
Puloma Katiyar, Tin Vu, A. Eldawy, S. Migliorini, A. Belussi
This demonstration presents a web-based generator for spatial data. The generator allows users to choose from a wide range of spatial data distributions and to configure the cardinality of the data and the distribution parameters. It then provides three functionalities. First, it provides a visualization of what the data will look like. Second, it allows users to download the data in several standard formats, including CSV and GeoJSON. Third, it provides a permalink that users can bookmark or share with their team members to reproduce the same dataset later. This service is a step towards standardized benchmarking for spatial data systems.
SpiderWeb: A Spatial Data Generator on the Web. Puloma Katiyar, Tin Vu, A. Eldawy, S. Migliorini, A. Belussi. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422351
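The generator workflow described in the SpiderWeb abstract (pick a distribution, set the cardinality and parameters, export to CSV or GeoJSON, reproduce the same dataset later) can be sketched in a few lines. This is a minimal illustration, not SpiderWeb's implementation: the function names, the two distributions, the unit-square domain, and the use of a seed as the reproducibility mechanism are all assumptions made for this example.

```python
import json
import random


def generate_points(distribution, cardinality, seed=0, **params):
    """Generate `cardinality` 2D points inside the unit square.

    Hypothetical helper: a fixed seed plays the role of a permalink,
    so the same arguments always reproduce the same dataset.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < cardinality:
        if distribution == "uniform":
            x, y = rng.random(), rng.random()
        elif distribution == "gaussian":
            mu = params.get("mu", 0.5)
            sigma = params.get("sigma", 0.1)
            x, y = rng.gauss(mu, sigma), rng.gauss(mu, sigma)
            if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
                continue  # reject samples that fall outside the unit square
        else:
            raise ValueError(f"unknown distribution: {distribution}")
        points.append((x, y))
    return points


def to_csv(points):
    """Serialize points as CSV text with an `x,y` header row."""
    return "\n".join(["x,y"] + [f"{x},{y}" for x, y in points])


def to_geojson(points):
    """Serialize points as a GeoJSON FeatureCollection of Point features."""
    features = [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Point", "coordinates": [x, y]},
        }
        for x, y in points
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Calling `generate_points("uniform", 1000, seed=42)` twice returns identical lists, which is the property a shareable permalink needs.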
One of the main challenges for research on spatiotemporal databases is that few datasets are available, and those that exist represent specific phenomena, generally contain a small number of observations, and do not provide a ground truth. In this work we present a generator for 2D moving regions that can represent several atomic events: face shrink, grow and evolve, face burst and engulf, face internal split from a closed and an open line (fissure), face internal merge, face split at a point, face split hole, face consume hole, hole shrink, grow and evolve, hole appear from an open line, hole consume face, hole split face, and hole split at a point. The generator allows datasets to be created and annotated automatically, and it can also be used to create custom datasets. We also present datasets created with this generator.
A Generator for 2D Moving Regions. José Duarte, Mark McKenney. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422336
M. Rilee, Niklas Griessbaum, K. Kuo, J. Frew, R. Wolfe
Scaling up volume and variety in Big Earth Science Data is particularly difficult when combining low-level, ungridded data, such as swath observations obtained with, for example, Moderate Resolution Imaging Spectroradiometers (MODIS). A unified way to index and combine data with different geo-spatiotemporal layouts and incomparable native array formatting is required for scalable integrative analyses based on data at its full instrument resolution, that is, without extra interpolation (or extrapolation) onto a common grid. The SpatioTemporal Adaptive Resolution Encoding (STARE) uses the Hierarchical Triangular Mesh (HTM) and the Hierarchical Calendrical Partitioning (HCP), recursive partitionings of solid angle and time into tree data structures, to encode spatiotemporal neighborhoods as sets of integers. Regions sharing common paths through the STARE tree hierarchy have similar index values, which can then serve as keys in algorithms and data structures supporting scalable integrative analyses. Thus, STARE co-aligns data in both physical (spatiotemporal) and cyber (memory) spaces, providing a means for marshalling computing resources and conducting analysis with minimum data movement, addressing volume scalability while simultaneously unifying diverse data for variety scaling. In this paper, we demonstrate how easy it is to use the Python STARE API (PySTARE) and the parallel programming platform Dask to integrate MODIS and Geostationary Operational Environmental Satellite (GOES) data, datasets with very different geo-spatiotemporal characteristics.
STARE-based Integrative Analysis of Diverse Data Using Dask Parallel Programming Demo Paper. M. Rilee, Niklas Griessbaum, K. Kuo, J. Frew, R. Wolfe. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422346
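The STARE abstract's central idea, that regions sharing a path through a recursive partitioning receive similar integer index values, can be illustrated with a much simpler stand-in. The sketch below swaps the Hierarchical Triangular Mesh for a plain quadtree over longitude and latitude; it is not the STARE encoding and does not use the PySTARE API, and every function name here is hypothetical.

```python
def quadtree_path(lon, lat, level):
    """Return the quadrant digits (0-3) that locate a point, one per level.

    Stand-in for a path through a hierarchical mesh: each level halves
    the current cell in both longitude and latitude.
    """
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    path = []
    for _ in range(level):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        digit = 0
        if lon >= mid_lon:
            digit += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:
            digit += 2
            south = mid_lat
        else:
            north = mid_lat
        path.append(digit)
    return path


def encode(path):
    """Pack a path into a single integer, two bits per level."""
    value = 0
    for digit in path:
        value = (value << 2) | digit
    return value


def shared_levels(path_a, path_b):
    """Count how many leading levels two paths have in common."""
    count = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        count += 1
    return count
```

Two nearby points share a long path prefix, so their encoded integers agree in the high-order bits and can serve as join or grouping keys; distant points diverge at the first level.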
R. U. Kiran, Sourabh Shrivastava, Philippe Fournier-Viger, K. Zettsu, Masashi Toyoda, M. Kitsuregawa
Frequent pattern mining is an important model in data mining. It involves finding all patterns in a transactional database that satisfy the user-specified minimum support (minSup) constraint. The minSup controls the minimum number of transactions that a pattern must cover in a transactional database. Since only minSup is used to evaluate a pattern's interestingness, the frequent pattern model implicitly assumes that the spatial information of the items will not impact the interestingness of a pattern in the database. This assumption limits the applicability of the frequent pattern model in many real-world applications, because patterns whose items are close to each other are typically more attractive to the user than patterns whose items are far from each other in a coordinate system. With this motivation, this paper proposes a novel model of frequent spatial patterns that may exist in a spatiotemporal database. An efficient pattern-growth algorithm, called Frequent Spatial Pattern-growth (FSP-growth), is also presented to mine all desired patterns in a database. Experimental results demonstrate that our algorithm is efficient. The usefulness of the proposed patterns is also shown with a real-world application.
Discovering Frequent Spatial Patterns in Very Large Spatiotemporal Databases. R. U. Kiran, Sourabh Shrivastava, Philippe Fournier-Viger, K. Zettsu, Masashi Toyoda, M. Kitsuregawa. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422206
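To make the combination of a minSup constraint with spatial closeness concrete, here is a small level-wise enumeration. It is not FSP-growth: it is a brute-force sketch that assumes a pattern counts as "spatial" when every pair of its items lies within a distance threshold, which is one plausible reading of the abstract rather than the paper's stated definition. The parameter `max_dist` and the function name are hypothetical.

```python
from itertools import combinations
from math import dist


def frequent_spatial_patterns(transactions, locations, min_sup, max_dist):
    """Return {pattern: support} for patterns that are frequent and spatially close.

    transactions: list of sets of items
    locations:    dict mapping each item to its (x, y) coordinates
    min_sup:      minimum number of transactions a pattern must cover
    max_dist:     assumed closeness rule, applied to every pair of items
    """

    def is_close(pattern):
        return all(
            dist(locations[a], locations[b]) <= max_dist
            for a, b in combinations(pattern, 2)
        )

    items = sorted({item for t in transactions for item in t})
    results = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for pattern in combinations(items, size):
            if not is_close(pattern):
                continue
            support = sum(1 for t in transactions if set(pattern) <= set(t))
            if support >= min_sup:
                results[pattern] = support
                found_any = True
        if not found_any:
            # Support and pairwise closeness are both anti-monotone, so no
            # larger pattern can qualify once a whole level comes up empty.
            break
    return results
```

With items `a` and `b` one unit apart and `c` far away, the pattern `(a, b)` survives when it meets minSup, while any pattern containing `c` together with another item is discarded regardless of its support.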
In recent years, location prediction has become an important task and has gained significant attention. Existing location prediction methods rely on centralized storage of user mobility data for model training, which may lead to privacy concerns and risks due to the privacy-sensitive nature of user behaviors. In this work, we propose a privacy-preserving method for training mobility prediction models based on federated learning, which can leverage the useful information in the behaviors of a massive number of users to train accurate mobility prediction models while removing the need to store that data centrally. First, we propose a novel network named STSAN (Spatial-Temporal Self-Attention Network) on each user device, which integrates spatiotemporal information with self-attention for location prediction. Second, we propose a new personalized federated learning model named AMF (Adaptive Model Fusion Federated Learning), which is a mixture of a local and a global model. Finally, the results are superior to various baselines on four real-world check-in datasets, verifying the effectiveness of the method.
Predicting Human Mobility with Federated Learning. Anliang Li, Shuang Wang, Wenzhu Li, Shengnan Liu, Siyuan Zhang. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422270
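The abstract for the federated mobility paper describes the personalized model only as a mixture of a local and a global model. The sketch below shows the general shape of that idea under explicit assumptions: the global model is formed by standard federated averaging (weighted by client data size), and each client blends it with its own parameters using a single scalar `alpha`. Neither choice is taken from the paper, and both function names are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    Only parameters leave the device; raw mobility data stays local.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fuse(local, global_model, alpha):
    """Blend local and global parameters: alpha * local + (1 - alpha) * global.

    alpha is an assumed per-client mixing coefficient in [0, 1].
    """
    return [alpha * l + (1 - alpha) * g for l, g in zip(local, global_model)]
```

Setting `alpha` to 1 keeps a purely local model and setting it to 0 adopts the global one, so an adaptive scheme would tune the coefficient per client between those extremes.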
Online sources of news have steadily supplanted their paper counterparts alongside the growth of the internet. This growth in online news has led to a surplus of data in the form of the text of news articles published online. While an abundance of data is obviously desirable, it can make it difficult for a human to analyze and find trends in the data without assistance. The application demonstrated in the paper aims to aid users in such analysis by building a spatio-textual and spatiotemporal data visualization based on the existing NewsStand architecture. The application is shown to be applicable to tracking the changing geographic prevalence of a disease (e.g., COVID-19) over time.
Visualizing SpatioTemporal Keyword Trends in Online News Articles. J. Kastner, H. Samet. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422339
It is commonly believed that, in congressional and state legislature elections in the United States, rural voters have an inherent political advantage over urban voters. We study this hypothesis using an idealized redistricting method, balanced centroidal power diagrams, that achieves essentially perfect population balance while optimizing a principled measure of compactness. We find that, using this method, the degree to which rural or urban voters have a political advantage depends on the number of districts and the population density of urban areas. Moreover, we find that the political advantage in any case tends to be dramatically less than that afforded by district plans used in the real world, including district plans drawn by presumably neutral parties such as the courts. One possible explanation is suggested by the following discovery: modifying centroidal power diagrams to prefer placing boundaries along city boundaries significantly increases the advantage rural voters have over urban voters.
The impact of highly compact algorithmic redistricting on the rural-versus-urban balance. Archer Wheeler, P. Klein. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422249
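A power diagram, the structure behind the redistricting method in the abstract above, assigns each location to the center that minimizes squared distance minus a per-center weight. The sketch below shows only that assignment rule and the centroid computation; it does not search for the weights that balance district populations, which is the hard part of a balanced centroidal power diagram. The function names and the toy coordinates are assumptions made for illustration.

```python
def power_assign(points, centers, weights):
    """Assign each point to the center with the smallest power distance.

    Power distance to center i is |p - c_i|^2 - w_i, so raising a
    center's weight enlarges its cell.
    """
    assignment = []
    for px, py in points:
        best = min(
            range(len(centers)),
            key=lambda i: (px - centers[i][0]) ** 2
            + (py - centers[i][1]) ** 2
            - weights[i],
        )
        assignment.append(best)
    return assignment


def cell_centroids(points, assignment, k):
    """Return the centroid of each cell, or None for an empty cell."""
    sums = [[0.0, 0.0, 0] for _ in range(k)]
    for (px, py), cell in zip(points, assignment):
        sums[cell][0] += px
        sums[cell][1] += py
        sums[cell][2] += 1
    return [(sx / n, sy / n) if n else None for sx, sy, n in sums]
```

With all weights equal, the rule reduces to an ordinary nearest-center (Voronoi) assignment; a diagram is centroidal when each center coincides with the centroid of its own cell, and balanced when the weights are chosen so that every cell holds the same population.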
We present spaTScope, a web application for visual exploration of geolocated time series. Analyzing such data is becoming increasingly important in many domains, such as energy demand management, geomarketing and geosocial networks. spaTScope allows users to visually explore large collections of geolocated time series and obtain insights about trends and patterns in their area of interest. The provided functionalities leverage a hybrid index that allows users to navigate and group the available time series based not only on their similarity but also on their spatial proximity. The results are visualized using linked plots combining maps and timelines.
A Visual Explorer for Geolocated Time Series. Georgios Chatzigeorgakidis, Kostas Patroumpas, Dimitrios Skoutas, Spiros Athanasiou. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422345
Recent applications employ publish/subscribe (Pub/Sub) systems so that publishers can easily attract the attention of customers and subscribers can monitor useful information generated by publishers. Due to the prevalence of smart devices and social networking services, a large number of objects that contain both spatial and keyword information are generated continuously, and the number of subscribers also continues to increase. This poses a challenge to Pub/Sub systems: they need to continuously extract useful information from massive numbers of objects for each subscriber in real time. In this paper, we address the problem of k nearest neighbor (kNN) monitoring on a spatial-keyword data stream for a large number of subscriptions. To scale well to massive numbers of objects and subscriptions, we propose a distributed solution. Given m workers, we divide a set of subscriptions into m disjoint subsets based on a cost model so that each worker has almost the same kNN-update cost, which maintains load balance. We allow an arbitrary approach to updating the kNN of each subscription, so with a suitable in-memory index, our solution can improve update efficiency by pruning irrelevant subscriptions for a given new object. We conduct experiments on real datasets, and the results demonstrate the efficiency and scalability of our solution.
Distributed Spatial-Keyword kNN Monitoring for Location-aware Pub/Sub. Shohei Tsuruoka, Daichi Amagata, Shunya Nishio, Takahiro Hara. Proceedings of the 28th International Conference on Advances in Geographic Information Systems, 2020-11-03. DOI: https://doi.org/10.1145/3397536.3422199
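Dividing subscriptions into m disjoint subsets with nearly equal kNN-update cost is a load-balancing partition problem. The abstract does not say how the partition is computed, so the sketch below swaps in a standard greedy heuristic (assign the most expensive remaining subscription to the currently least-loaded worker). The per-subscription cost estimates are taken as given inputs; the paper's cost model is not reproduced here, and the function name is hypothetical.

```python
import heapq


def partition_subscriptions(costs, m):
    """Split subscriptions across m workers with roughly equal total cost.

    costs: dict mapping subscription id -> estimated kNN-update cost
    Returns (groups, loads): the subscription ids and total cost per worker.
    """
    heap = [(0.0, worker) for worker in range(m)]  # (current load, worker id)
    heapq.heapify(heap)
    groups = [[] for _ in range(m)]
    # Place expensive subscriptions first so cheap ones can even out the loads.
    for sub, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, worker = heapq.heappop(heap)
        groups[worker].append(sub)
        heapq.heappush(heap, (load + cost, worker))
    loads = [0.0] * m
    for load, worker in heap:
        loads[worker] = load
    return groups, loads
```

Each subscription lands in exactly one group, so the subsets are disjoint by construction; how close the loads come to equal depends on the cost distribution, since the greedy rule is a heuristic rather than an optimal partition.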