The COVID-19 pandemic has dramatically transformed human mobility patterns. Predicting human mobility under the “new normal” is therefore crucial to infrastructure redesign, emergency management, and urban planning after the pandemic. This paper aims to predict the number of visits to various locations in New York City using COVID-19 and mobility data from the past two years. To quantitatively model the impact of COVID-19 cases on human mobility and to predict mobility patterns across the pandemic period, the paper develops CCAAT-GCN (Cross- and Context-Attention based Spatial-Temporal Graph Convolutional Networks). The proposed model is validated on SafeGraph data for New York City from August 2020 to April 2022 and compared against a rich set of baselines; the results show that it outperforms them. In addition, the attention matrix learned by the model aligns strongly with the COVID-19 situation and the points of interest within the geographic region, which suggests that the model effectively captures the intricate relationships between COVID-19 case rates and human mobility patterns. The model and findings can offer insights into mobility prediction for future disruptive events and pandemics, and thereby assist planners, decision-makers, and policymakers with emergency preparedness.
{"title":"Cross- and Context-Aware Attention Based Spatial-Temporal Graph Convolutional Networks for Human Mobility Prediction","authors":"Zhaobin Mo, Haotian Xiang, Xuan Di","doi":"10.1145/3673227","DOIUrl":"https://doi.org/10.1145/3673227","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141340649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fahim Tasneema Azad, K. Candan, Ahmet Kapkic, Mao-Lin Li, Huan Liu, Pratanu Mandal, Paras Sheth, Bilgehan Arslan, Gerardo Chowell-Puente, John Sabo, R. Muenich, Javier Redondo Anton, M. Sapino
Successfully tackling many urgent challenges in socio-economically critical domains, such as public health and sustainability, requires a deeper understanding of causal relationships and interactions among a diverse spectrum of spatio-temporally distributed entities. In these applications, it is critical to be able to leverage spatio-temporal data to obtain causally based situational awareness and to develop informed forecasts that provide resilience at different scales. While the promise of a causally grounded approach to these challenges is apparent, the core data technologies needed to achieve it are in their early stages and lack a framework to help realize their potential. In this paper, we argue that there is an urgent need for a novel paradigm of spatio-causal research built on computational advances in spatio-temporal data and model integration; causal learning and discovery; large-scale data- and model-driven simulations, emulations, and forecasting; spatio-temporal data-driven and model-centric operational recommendations; and effective causally driven visualization and explanation. We thus provide a vision and a roadmap for spatio-causal situation awareness, forecasting, and planning.
{"title":"(Vision Paper) A Vision for Spatio-Causal Situation Awareness, Forecasting, and Planning","authors":"Fahim Tasneema Azad, K. Candan, Ahmet Kapkic, Mao-Lin Li, Huan Liu, Pratanu Mandal, Paras Sheth, Bilgehan Arslan, Gerardo Chowell-Puente, John Sabo, R. Muenich, Javier Redondo Anton, M. Sapino","doi":"10.1145/3672556","DOIUrl":"https://doi.org/10.1145/3672556","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141353812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, W. Aref, G. Andrienko, N. Andrienko, Yang Cao, Sanjay Chawla, R. Cheng, P. Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, D. Gunopulos, C. S. Jensen, Joon-Seok Kim, Peer Kröger, Kyoung-Sook Kim, John Krumm, Johannes Lauer, A. Magdy, Mario A. Nascimento, S. Ravada, Matthias Renz, Dimitris Sacharidis, Flora Salim, Mohamed Sarwat, M. Schoemans, Cyrus Shahabi, Bettina Speckmann, E. Tanin, Xu Teng, Y. Theodoridis, Kristian Torp, Goce Trajcevski, Marc van Kreveld, C. Wenk, Martin Werner, Raymond E. Wong, Song Wu, Jianqiu Xu, Moustafa Youssef, Demetris Zeinalipour, Mengxuan Zhang
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the domain of mobility data science. Towards a unified approach to mobility data science, we present a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art, and describe open challenges for the research community in the coming years.
{"title":"Mobility Data Science: Perspectives and Challenges","authors":"M. Mokbel, Mahmoud Sakr, Li Xiong, Andreas Züfle, Jussara Almeida, Taylor Anderson, W. Aref, G. Andrienko, N. Andrienko, Yang Cao, Sanjay Chawla, R. Cheng, P. Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, D. Gunopulos, C. S. Jensen, Joon-Seok Kim, Peer Kröger, Kyoung-Sook Kim, John Krumm, Johannes Lauer, A. Magdy, Mario A. Nascimento, S. Ravada, Matthias Renz, Dimitris Sacharidis, Flora Salim, Mohamed Sarwat, M. Schoemans, Cyrus Shahabi, Bettina Speckmann, E. Tanin, Xu Teng, Y. Theodoridis, Kristian Torp, Goce Trajcevski, Marc van Kreveld, C. Wenk, Martin Werner, Raymond E. Wong, Song Wu, Jianqiu Xu, Moustafa Youssef, Demetris Zeinalipour, Mengxuan Zhang","doi":"10.1145/3652158","DOIUrl":"https://doi.org/10.1145/3652158","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141002628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Aguilar, K. Buchin, M. Buchin, Erfan Hosseini Sereshgi, Rodrigo I. Silveira, C. Wenk
Comparing two road maps is a basic operation that arises in a variety of situations. A commonly used map comparison method, mainly in the context of comparing reconstructed maps to ground truth maps, is based on graph sampling. The essential idea is to first compute a set of point samples on each map, and then to match pairs of samples, one from each map, in a one-to-one fashion. Different criteria, e.g., based on distance or orientation, can be used to decide whether two samples can be matched. The total number of matched pairs gives a measure of how similar the maps are. Since the work of Biagioni and Eriksson [11, 12], graph sampling methods have become widely used. However, there are different ways to implement each of the steps, which can lead to significant differences in the results. This means that conclusions drawn from different studies that seemingly use the same comparison method cannot necessarily be compared. In this work we present a unified approach to graph sampling for map comparison. We present the method in full generality, discussing the main decisions involved in its implementation. In particular, we point out the importance of the sampling method (GEO vs. TOPO) and that of the matching definition, discussing the main options used and proposing alternatives for both key steps. We experimentally evaluate the different sampling and matching options considered on map datasets and reconstructed maps. Furthermore, we provide a code base and an interactive visualization tool to set a standard for future evaluations in the field of map construction and map comparison.
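The matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code base: the function names, the greedy nearest-pair strategy, the purely distance-based criterion, and the precision/recall-style ratios are all assumptions made for illustration, and the sampling step (GEO or TOPO) is taken as already done.

```python
import math

def match_samples(samples_a, samples_b, max_dist):
    """Greedy one-to-one matching of point samples (hypothetical helper).

    A pair is eligible when its Euclidean distance is at most max_dist.
    Eligible pairs are taken closest-first, and each sample is used once.
    """
    candidates = []
    for i, (ax, ay) in enumerate(samples_a):
        for j, (bx, by) in enumerate(samples_b):
            d = math.hypot(ax - bx, ay - by)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()

    used_a, used_b, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return pairs

def similarity(samples_a, samples_b, max_dist):
    """Return (matched count, share of A matched, share of B matched)."""
    matched = len(match_samples(samples_a, samples_b, max_dist))
    share_a = matched / len(samples_a) if samples_a else 0.0
    share_b = matched / len(samples_b) if samples_b else 0.0
    return matched, share_a, share_b

# Toy samples: two points of each map lie close together, one does not.
ground_truth = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
reconstructed = [(0.0, 1.0), (10.0, 1.0), (100.0, 100.0)]
matched, share_truth, share_recon = similarity(ground_truth, reconstructed, 5.0)
```

A greedy strategy is only one possible matching definition; swapping it for a maximum-cardinality matching, or adding an orientation test to the eligibility check, changes the resulting score, which is the kind of implementation sensitivity the abstract points out.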
{"title":"Graph Sampling for Map Comparison","authors":"J. Aguilar, K. Buchin, M. Buchin, Erfan Hosseini Sereshgi, Rodrigo I. Silveira, C. Wenk","doi":"10.1145/3662733","DOIUrl":"https://doi.org/10.1145/3662733","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141016446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Representation learning has been instrumental in the success of machine learning, offering compact and performant data representations for diverse downstream tasks. In the spatial domain, it has been pivotal in extracting latent patterns from various data types, including points, polylines, polygons, and networked structures. However, existing approaches often fall short of explicitly capturing both semantic and spatial information, relying on proxies and synthetic features. This paper presents GeoNN, a novel graph neural network-based model designed to learn spatially-aware embeddings for geospatial entities. GeoNN leverages edge features generated from geodesic functions, dynamically selecting relevant features based on relative locations. It introduces both transductive (GeoNN-T) and inductive (GeoNN-I) models, ensuring effective encoding of geospatial features and scalability with entity changes. Extensive experiments demonstrate GeoNN’s effectiveness in location-sensitive superpixel-based graphs and real-world points of interest, outperforming baselines across various evaluation measures.
{"title":"Latent Representation Learning for Geospatial Entities","authors":"Ween Jiann Lee, Hady W. Lauw","doi":"10.1145/3663474","DOIUrl":"https://doi.org/10.1145/3663474","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141018785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andreas Züfle, Flora Salim, Taylor Anderson, M. Scotch, Li Xiong, Kacper Sokol, Hao Xue, Ruochen Kong, David Heslop, Hye-Young Paik, C. R. MacIntyre
The spread of infectious diseases is a highly complex spatiotemporal process, difficult to understand, predict, and effectively respond to. Machine learning and artificial intelligence (AI) have achieved impressive results in other learning and prediction tasks; however, while many AI solutions are developed for disease prediction, only a few of them are adopted by decision-makers to support policy interventions. Among several issues preventing their uptake, AI methods are known to amplify the bias in the data they are trained on. This is especially problematic for infectious disease models that typically leverage large, open, and inherently biased spatiotemporal data. These biases may propagate through the modeling pipeline to decision-making, resulting in inequitable policy interventions. Therefore, there is a need to gain an understanding of how the AI disease modeling pipeline can mitigate biased input data, in-processing models, and biased outputs. Specifically, our vision is to develop a large-scale micro-simulation of individuals from which human mobility, population, and disease ground truth data can be obtained. From this complete dataset – which may not reflect the real world – we can sample and inject different types of bias. By using the sampled data in which bias is known (as it is given as the simulation parameter), we can explore how existing solutions for fairness in AI can mitigate and correct these biases and investigate novel AI fairness solutions. Achieving this vision would result in improved trust in such models for informing fair and equitable policy interventions.
{"title":"Leveraging Simulation Data to Understand Bias in Predictive Models of Infectious Disease Spread","authors":"Andreas Züfle, Flora Salim, Taylor Anderson, M. Scotch, Li Xiong, Kacper Sokol, Hao Xue, Ruochen Kong, David Heslop, Hye-Young Paik, C. R. MacIntyre","doi":"10.1145/3660631","DOIUrl":"https://doi.org/10.1145/3660631","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140657355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skyline path queries (SPQs) extend skyline queries to multi-dimensional networks, such as multi-cost road networks (MCRNs). Such queries return a set of non-dominated paths between two given network nodes. Despite extensive work on evaluating different SPQ variants, SPQ evaluation is still very inefficient because no efficient index structures exist to support such queries. Existing index-building approaches for shortest-path query execution, when directly extended to support SPQs, take an unreasonable amount of space and time to build, making them impractical for processing large graphs. In this paper, we propose a novel index structure, the backbone index, and a corresponding index construction method that condenses an initial MCRN into multiple smaller summarized graphs of different granularity. We present efficient approaches to find approximate solutions to SPQs by utilizing the backbone index structure. Furthermore, to make good use of historical queries and their results, we propose two models, Skyline Path Graph Neural Network (SP-GNN) and Transfer SP-GNN (TSP-GNN), to support effective SPQ processing. Our extensive experiments on large real-world road networks show that the backbone index can support finding meaningful approximate SPQ solutions efficiently. The backbone index can be constructed in a reasonable time, dramatically outperforming the construction of other types of indexes for road networks. As far as we know, this is the first compact index structure that can support efficient approximate SPQ evaluation on large MCRNs. The results on the SP-GNN and TSP-GNN models also show that both models can help obtain approximate SPQ answers efficiently.
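The notion of a non-dominated path that underlies SPQs can be made concrete with a small sketch. It only shows the dominance test and a brute-force skyline filter over candidate paths that are already enumerated. It is not the backbone index or the GNN models, and the function names, the cost tuples, and the assumption that lower is better in every cost dimension are illustrative.

```python
def dominates(cost_a, cost_b):
    """True if cost_a is no worse than cost_b in every dimension
    and strictly better in at least one (lower cost is better)."""
    no_worse = all(a <= b for a, b in zip(cost_a, cost_b))
    strictly_better = any(a < b for a, b in zip(cost_a, cost_b))
    return no_worse and strictly_better

def skyline(paths):
    """Return the names of paths whose cost vector no other path dominates.

    paths maps a hypothetical path name to its cost tuple, for example
    (travel time, toll). This quadratic filter is for illustration only.
    """
    return [
        name
        for name, cost in paths.items()
        if not any(
            dominates(other, cost)
            for other_name, other in paths.items()
            if other_name != name
        )
    ]

# Toy candidate paths between the same pair of nodes: (minutes, toll).
candidates = {
    "A": (10, 5),
    "B": (8, 7),
    "C": (12, 6),  # dominated by A, which is better in both dimensions
    "D": (8, 7),   # ties with B; neither dominates the other
}
result = skyline(candidates)
```

Because the number of candidate paths between two nodes grows quickly with network size, an exhaustive filter like this one does not scale, which is the inefficiency that motivates index structures and approximate answers in the abstract.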
{"title":"Backbone Index and GNN Models for Skyline Path Query Evaluation over Multi-cost Road Networks","authors":"Qixu Gong, Huiying Chen, Huiping Cao, Jiefei Liu","doi":"10.1145/3660632","DOIUrl":"https://doi.org/10.1145/3660632","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140670799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xovee Xu, Ting Zhong, Haoyang Yu, Fan Zhou, Goce Trajcevski
The citywide fine-grained urban flow inference (FUFI) problem aims to infer high-resolution flow maps from coarse-grained ones, and plays an important role in sustainable and economic urban computing and intelligent traffic management. Previous models tackle this problem from the perspectives of spatial constraints, external factors, and memory cost. However, using new urban flow maps to calibrate the learned model is very challenging due to the “catastrophic forgetting” problem and is still under-explored. In this paper, we take the first step toward continual learning in FUFI and present CUFAR (Continual Urban Flow inference with augmented Adaptive knowledge Replay), a novel framework for inferring fine-grained citywide traffic flows. Specifically, (1) we design a spatial-temporal inference network that can extract better flow map features at both local and global levels; (2) we then present an augmented adaptive knowledge replay (AKR) training algorithm that selectively replays learned knowledge so that the model can learn new knowledge without forgetting. We apply several data augmentation techniques to improve the generalization capability of the learning model, gaining additional performance improvements. We also propose a knowledge discriminator to avoid the “negative replaying” issue introduced by noisy urban flow maps. Extensive experiments on two large-scale real-world FUFI datasets demonstrate that our proposed model consistently outperforms strong baselines and effectively mitigates the forgetting problem.
{"title":"Overcoming Catastrophic Forgetting in Continual Fine-Grained Urban Flow Inference","authors":"Xovee Xu, Ting Zhong, Haoyang Yu, Fan Zhou, Goce Trajcevski","doi":"10.1145/3660523","DOIUrl":"https://doi.org/10.1145/3660523","url":null,"PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140680537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning (ML) and deep learning (DL) techniques are increasingly applied to produce efficient query optimizers, particularly for big data systems. Optimizing spatial operations is even more challenging due to the inherent complexity of operations such as spatial join or range query, and to the peculiarities of spatial data. Although a few ML-based spatial query optimizers have been proposed in the literature, their design limits their use, since each one is tailored to a specific collection of datasets, a specific operation, or a specific hardware setting. A change to any of these requires building and training a completely new model, which entails collecting a new, very large training dataset to obtain a good model. This paper proposes a different approach, which exploits the novel notion of spatial embedding to overcome these limitations. In particular, a preliminary model is defined that captures the relevant features of spatial datasets in an unsupervised manner, independently of the operation to be optimized. This model is trained on a large amount of both synthetic and real-world data, with the aim of producing meaningful spatial embeddings. The embedding model can be regarded as a preliminary step for the optimization of many different spatial operations, so the cost of building it is offset during the subsequent construction of the operation-specific models. Indeed, for each spatial operation considered, a tailored model is trained using spatial embeddings as input, so only a small number of training data points is required. Three specific operations are considered as a proof of concept in this paper: range query, self-join, and binary spatial join. Finally, a comparison with an alternative technique, known as transfer learning, is provided, and the advantages of the proposed technique over it are highlighted.
{"title":"A Generic Machine Learning Model for Spatial Query Optimization based on Spatial Embeddings","authors":"A. Belussi, S. Migliorini, Ahmed Eldawy","doi":"10.1145/3657633","DOIUrl":"https://doi.org/10.1145/3657633","PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-04-13","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140707657"}
Ying Zhang, Zhiwen Yu, Minling Dang, En Xu, Bin Guo, Yuxuan Liang, Yifang Yin, Roger Zimmermann
Human mobility is the foundation of urban dynamics, and its prediction significantly benefits various downstream location-based services. Deep learning approaches now dominate the mobility prediction field, where model architectures and designs are continuously updated to push up prediction accuracy. A question naturally arises: are these models good enough to reach the best possible prediction accuracy? To answer this question, predictability studies quantify the inherent regularities of human mobility data and link the result to that limit. Mainstream predictability studies achieve this by analyzing individual trajectories and merging all individual results to obtain an upper bound. However, the individuals who make up a city are not totally independent, and individual behavior is heavily influenced by its implicit or explicit surroundings. Therefore, the collective factor should be considered in measuring mobility predictability, which has not been addressed before. This vision paper points out this concern and envisions a few potential research problems along such an individual-to-collective transition, from both data and methodology aspects. We hope the discussion in this paper sheds some light for the human mobility predictability community.
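As a rough illustration of the individual-level procedure described above, the sketch below estimates an upper bound on prediction accuracy per person and then averages the results. It assumes the Fano-inequality formulation commonly used in individual predictability studies, which the abstract itself does not spell out. The entropy estimator is deliberately simplified to the Shannon entropy of visit frequencies (it ignores visit order), and the trajectories are made up.

```python
import math
from collections import Counter

def shannon_entropy(trajectory):
    """Entropy (bits) of the visit-frequency distribution; ignores visit order."""
    counts = Counter(trajectory)
    total = len(trajectory)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_predictability(entropy, n_locations, tol=1e-9):
    """Solve entropy = H(p) + (1 - p) * log2(n - 1) for p in [1/n, 1] by bisection."""
    if n_locations < 2:
        return 1.0  # a single location is always predictable

    def fano(p):
        return binary_entropy(p) + (1 - p) * math.log2(n_locations - 1)

    # fano(p) decreases from log2(n) at p = 1/n to 0 at p = 1.
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def individual_bound(trajectory):
    """Upper bound on prediction accuracy for one person's trajectory."""
    return max_predictability(shannon_entropy(trajectory), len(set(trajectory)))

# Hypothetical trajectories for two people (sequences of visited location labels).
trajectories = {
    "user_a": ["home", "work", "home", "work", "home", "gym", "home", "work"],
    "user_b": ["home", "office"] * 4,
}

# Individual results merged by simple averaging, each person treated independently.
bounds = {user: individual_bound(t) for user, t in trajectories.items()}
merged_bound = sum(bounds.values()) / len(bounds)
print(bounds, merged_bound)
```

The final averaging step treats each person as independent of everyone else, which is exactly the assumption the vision paper questions.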
{"title":"Predictability in Human Mobility: From Individual to Collective (Vision Paper)","authors":"Ying Zhang, Zhiwen Yu, Minling Dang, En Xu, Bin Guo, Yuxuan Liang, Yifang Yin, Roger Zimmermann","doi":"10.1145/3656640","DOIUrl":"https://doi.org/10.1145/3656640","PeriodicalId":43641,"journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"140725984"}