Regionalization, also known as spatially constrained clustering, is an unsupervised machine learning technique used to identify and define spatially contiguous regions. In this work, we introduce a methodology to regionalize recommendation systems (RSs) based on a collaborative filtering approach. Two main challenges arise when performing regionalization based on users’ preferences in RSs: (1) unstructured data, as interactions are often scarce and observed on a smaller scale; and (2) the difficulty of evaluating the quality of the clustering results. To address these challenges, our methodology relies on inductive matrix completion (IMC), a fundamental approach for recovering unknown entries of a rating matrix, while utilizing region information to extract a region-based feature matrix. With this feature matrix, our method becomes adaptive and integrates with various regionalization algorithms to create regionalization candidates. This enables us to derive more accurate recommendations that consider regionalized effects and to discover interesting patterns in localized user behavior. We experimentally evaluate our model on synthetic datasets to demonstrate its efficacy in settings where our underlying assumptions hold. Furthermore, we present a real-world case study illustrating the interpretable information the model can derive in terms of regionalized recommendation relevance.
{"title":"Regionalization-based Collaborative Filtering: Harnessing Geographical Information in Recommenders","authors":"Rodrigo Alves","doi":"10.1145/3656641","DOIUrl":"https://doi.org/10.1145/3656641","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-04-08","publicationTypes":"Journal Article"}
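The IMC idea in the regionalization abstract above can be illustrated with a small sketch. It assumes the only side information is a one-hot region indicator per user, so the rating of user u for item i is modeled as the inner product of a region vector and an item vector, and the learned region vectors play the role of the region-based feature matrix. The function names, the squared-error objective with L2 regularization, and the SGD fit are illustrative assumptions, not the formulation used in the paper.

```python
import random


def fit_region_imc(ratings, user_region, n_regions, n_items,
                   rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit rating(u, i) ~ W[region(u)] . H[i] by SGD on observed entries only.

    ratings: list of (user, item, rating) triples (the observed entries).
    user_region: list mapping each user index to a region index.
    Returns (W, H): region factors and item factors, each a list of rows.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_regions)]
    H = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            region = user_region[user]
            pred = sum(W[region][k] * H[item][k] for k in range(rank))
            err = pred - rating
            for k in range(rank):
                w, h = W[region][k], H[item][k]
                # Gradient step on squared error plus L2 regularization.
                W[region][k] -= lr * (err * h + reg * w)
                H[item][k] -= lr * (err * w + reg * h)
    return W, H


def predict(W, H, region, item):
    """Predicted rating of any user living in `region` for `item`."""
    return sum(w * h for w, h in zip(W[region], H[item]))


# Toy data (hypothetical): users 0-1 live in region 0, users 2-3 in region 1.
user_region = [0, 0, 1, 1]
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 2, 3),
           (2, 0, 1), (2, 1, 5), (3, 1, 4), (3, 2, 3)]
W, H = fit_region_imc(ratings, user_region, n_regions=2, n_items=3)
```

In this sketch each row of `W` is one region's feature vector; a regionalization algorithm could then merge adjacent regions whose rows are similar. User 1 never rated item 1, yet `predict(W, H, 0, 1)` still yields an estimate because the prediction depends only on the user's region.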
The availability of trajectory data, combined with various real-life practical applications, has sparked the interest of the research community in designing a plethora of algorithms for various trajectory analysis techniques. However, there is an apparent lack of full-fledged systems that provide the infrastructure support for trajectory analysis techniques, which hinders the applicability of most of the designed algorithms. Inspired by the tremendous success of the BERT deep learning model in solving various Natural Language Processing (NLP) tasks, our vision is to have a BERT-like system for trajectory analysis tasks. We envision that in a few years, we will have such a system, where no one needs to worry again about each specific trajectory analysis operation. Whether it is trajectory imputation, similarity, clustering, or any other operation, it would be one system that researchers, developers, and practitioners can deploy to get high accuracy for their trajectory operations. Our vision stands on the solid ground that trajectories in a space are highly analogous to statements in a language. We outline the challenges and the road to our vision. Exploratory results confirm the promise and possibility of our vision.
{"title":"Let’s Speak Trajectories: A Vision To Use NLP Models For Trajectory Analysis Tasks","authors":"Mashaal Musleh, M. Mokbel","doi":"10.1145/3656470","DOIUrl":"https://doi.org/10.1145/3656470","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-04-08","publicationTypes":"Journal Article"}
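The analogy between trajectories and statements in the abstract above can be made concrete with a small sketch. It assumes, purely for illustration, that a trajectory is turned into a "sentence" by mapping each GPS point to a uniform grid cell and using the cell identifier as a token; trajectory imputation then resembles BERT-style masked-token prediction. The grid size, the token format, and the function names are hypothetical and are not taken from the paper.

```python
import math


def tokenize(trajectory, cell_deg=0.01):
    """Map (lat, lon) points to grid-cell tokens, collapsing consecutive repeats."""
    tokens = []
    for lat, lon in trajectory:
        token = f"c_{math.floor(lat / cell_deg)}_{math.floor(lon / cell_deg)}"
        if not tokens or tokens[-1] != token:
            tokens.append(token)
    return tokens


def mask_token(tokens, index, mask="[MASK]"):
    """Hide one token, as a masked language model would see it during training."""
    return tokens[:index] + [mask] + tokens[index + 1:]


# Hypothetical trajectory; the first two points fall into the same grid cell.
trajectory = [(44.9731, -93.2352), (44.9738, -93.2358),
              (44.9842, -93.2251), (44.9955, -93.2149)]
sentence = tokenize(trajectory)
masked = mask_token(sentence, 1)
```

A model trained to fill in `[MASK]` from the surrounding cell tokens would, under these assumptions, be performing trajectory imputation in the same way a language model fills in a missing word.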
Gengchen Mai, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, Tianming Liu, Gao Cong, Yingjie Hu, Chris Cundy, Ziyuan Li, Rui Zhu, Ni Lao
Large pre-trained models, also known as foundation models (FMs), are trained in a task-agnostic manner on large-scale data and can be adapted to a wide range of downstream tasks by fine-tuning, few-shot, or even zero-shot learning. Despite their successes in language and vision tasks, we have yet to see an attempt to develop foundation models for geospatial artificial intelligence (GeoAI). In this work, we explore the promises and challenges of developing multimodal foundation models for GeoAI. We first investigate the potential of many existing FMs by testing their performance on seven tasks across multiple geospatial domains, including Geospatial Semantics, Health Geography, Urban Geography, and Remote Sensing. Our results indicate that on several geospatial tasks that involve only the text modality, such as toponym recognition, location description recognition, and US state-level/county-level dementia time series forecasting, the task-agnostic LLMs can outperform task-specific fully supervised models in a zero-shot or few-shot learning setting. However, on other geospatial tasks, especially tasks that involve multiple data modalities (e.g., POI-based urban function classification, street view image-based urban noise intensity classification, and remote sensing image scene classification), existing foundation models still underperform task-specific models. Based on these observations, we propose that one of the major challenges of developing a foundation model for GeoAI is to address the multimodal nature of geospatial tasks. After discussing the distinct challenges of each geospatial data modality, we suggest the possibility of a multimodal foundation model that can reason over various types of geospatial data through geospatial alignments. We conclude this paper by discussing the unique risks and challenges of developing such a model for GeoAI.
{"title":"On the Opportunities and Challenges of Foundation Models for GeoAI (Vision Paper)","authors":"Gengchen Mai, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, Tianming Liu, Gao Cong, Yingjie Hu, Chris Cundy, Ziyuan Li, Rui Zhu, Ni Lao","doi":"10.1145/3653070","DOIUrl":"https://doi.org/10.1145/3653070","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-03-20","publicationTypes":"Journal Article"}
Yunting Song, Riccardo Fellegara, F. Iuricich, Leila De Floriani
We address the problem of performing a topology-aware simplification algorithm on a compact and distributed data structure for triangle meshes, the Terrain trees. Topology-aware operators have been defined to coarsen a Triangulated Irregular Network (TIN) without affecting the topology of its underlying terrain, i.e., without modifying critical features of the terrain, such as pits, saddles, peaks, and their connectivity. However, their scalability is limited for large-scale meshes. Our proposed algorithm uses a batched processing strategy to reduce both the memory and time requirements of the simplification process, and thanks to the spatial decomposition on the basis of Terrain trees, it can be easily parallelized. Also, since a Terrain tree becomes less compact and efficient after the simplification process, we propose an efficient post-processing step for updating the hierarchical spatial decomposition. Our experiments on real-world TINs, derived from topographic and bathymetric LiDAR data, demonstrate the scalability and efficiency of our approach. Specifically, topology-aware simplification on Terrain trees uses 40% less memory and takes half the time compared to the most compact and efficient connectivity-based data structure for TINs. Furthermore, the parallel simplification algorithm on the Terrain trees exhibits a 12x speedup with an OpenMP implementation. The quality of the output mesh is not significantly affected by the distributed and parallel simplification strategy of Terrain trees, and we obtain similar quality levels compared to the global baseline method.
{"title":"Parallel Topology-aware Mesh Simplification on Terrain Trees","authors":"Yunting Song, Riccardo Fellegara, F. Iuricich, Leila De Floriani","doi":"10.1145/3652602","DOIUrl":"https://doi.org/10.1145/3652602","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-03-13","publicationTypes":"Journal Article"}
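The critical features named in the Terrain trees abstract above (pits, saddles, peaks) can be illustrated with a standard piecewise-linear classification of a TIN vertex: walk once around the vertex's ring of neighbors and count how often the neighbors switch between "higher" and "lower" than the vertex. The sketch below assumes an interior vertex, neighbors given in cyclic order, and no neighbor at exactly the vertex's elevation. It shows only the classification a topology-aware operator would need to preserve; it is not the simplification algorithm from the paper, and the function name is hypothetical.

```python
def classify_vertex(z, ring):
    """Classify an interior TIN vertex from its cyclically ordered neighbor elevations.

    z: elevation of the vertex.
    ring: elevations of the neighbors, in order around the vertex.
    Assumes no neighbor has elevation exactly equal to z.
    """
    signs = [1 if neighbor > z else -1 for neighbor in ring]
    # Count higher/lower switches around the closed ring (wrap-around included).
    changes = sum(1 for a, b in zip(signs, signs[1:] + signs[:1]) if a != b)
    if changes == 0:
        return "pit" if signs[0] > 0 else "peak"
    if changes == 2:
        return "regular"
    return "saddle"  # four or more switches


print(classify_vertex(5, [6, 7, 8, 9]))  # all neighbors higher
print(classify_vertex(5, [6, 3, 7, 2]))  # neighbors alternate higher/lower
```

Under these assumptions, a simplification step would be rejected whenever it changes the class of a critical vertex or the connectivity between critical vertices.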
Stephan Winter, Monika Sester, M. Tomko, Alexandra Millonig
Urban mobility is a major contributor to human-induced climate change, a challenge that the urban and transport planning and spatial computing academic communities have been actively addressing. In this paper, we argue, however, that the common data analytics research into incremental efficiency improvements of originally non-sustainable urban mobility systems will never be able to help reach climate neutrality, the goal we must achieve by 2050 as per the Paris Agreement. This imperative is exacerbated by the observation that improvements, by data analytics, in one segment of urban mobility typically have unintended and often adverse consequences in other segments. In this vision paper, we argue for a data analytics agenda to advance climate action at the core of urban mobility research. This agenda must disrupt the way we think and operate, as much as it is disrupting the accessibility issues of society in cities.
{"title":"The Challenge of Data Analytics with Climate-Neutral Urban Mobility (Vision Paper)","authors":"Stephan Winter, Monika Sester, M. Tomko, Alexandra Millonig","doi":"10.1145/3649312","DOIUrl":"https://doi.org/10.1145/3649312","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-02-23","publicationTypes":"Journal Article"}
Social media platforms generate massive amounts of data that reveal valuable insights about users and communities at large. Existing techniques have not fully exploited such data to help practitioners perform a deep analysis of large online communities. Lack of scalability hinders the analysis of large communities, which requires tremendous system resources and incurs unacceptable runtime. This paper proposes a new analytical query that identifies the top-k posts that a given user community has interacted with during a specific time interval and within a spatial range. We propose a novel indexing framework that captures the interactions of users and communities to provide low query latency. Moreover, we propose exact and approximate algorithms that process the query efficiently and utilize the index content to prune the search space. An extensive experimental evaluation on real data shows the superiority of our techniques and their scalability to support large online communities.
{"title":"Scalable Spatio-Temporal Top-k Interaction Queries on Dynamic Communities","authors":"Abdulaziz Almaslukh, Yongyi Liu, A. Magdy","doi":"10.1145/3648374","DOIUrl":"https://doi.org/10.1145/3648374","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article"}
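The query described in the top-k abstract above can be pinned down with a brute-force baseline that scans every interaction. The sketch assumes that each interaction record carries a user, a post, a timestamp, and a point location, and that posts are ranked by how many qualifying interactions they received from community members; both the record layout and the ranking score are illustrative assumptions, and the paper's indexing framework and pruning algorithms are not reproduced here.

```python
from collections import Counter
from typing import NamedTuple


class Interaction(NamedTuple):
    user: str
    post: str
    time: int
    x: float
    y: float


def top_k_posts(interactions, community, t_range, box, k):
    """Return the k posts with the most interactions from `community`
    inside the time interval `t_range` and the rectangle `box`."""
    t0, t1 = t_range
    xmin, ymin, xmax, ymax = box
    counts = Counter(
        it.post
        for it in interactions
        if it.user in community
        and t0 <= it.time <= t1
        and xmin <= it.x <= xmax
        and ymin <= it.y <= ymax
    )
    # Sort by descending count, breaking ties by post id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [post for post, _ in ranked[:k]]


# Hypothetical data for illustration only.
interactions = [
    Interaction("ann", "p1", 10, 1.0, 1.0),
    Interaction("bob", "p1", 12, 1.5, 1.2),
    Interaction("ann", "p2", 15, 1.1, 0.9),
    Interaction("cat", "p2", 16, 1.0, 1.0),  # not a community member
    Interaction("bob", "p3", 50, 1.0, 1.0),  # outside the time interval
    Interaction("bob", "p2", 18, 9.0, 9.0),  # outside the spatial range
]
result = top_k_posts(interactions, {"ann", "bob"}, (0, 20), (0, 0, 2, 2), k=2)
```

This scan costs time linear in the number of interactions per query, which is the kind of cost an index over users, communities, time, and space is meant to avoid.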
Stefan Schestakov, Simon Gottschalk, Thorben Funke, Elena Demidova
GPS trajectories are a critical asset for building spatio-temporal predictive models in urban regions in the context of road safety monitoring, traffic management, and mobility services. Currently, reliable and efficient data misuse detection methods for such personal, spatio-temporal data, particularly in data breach cases, are missing. This article addresses an essential aspect of data misuse detection, namely the re-identification of leaked and potentially modified GPS trajectories. We present RE-Trace, a contrastive learning-based model that facilitates reliable and efficient re-identification of GPS trajectories and resists specific trajectory transformation attacks that aim to obscure a trajectory’s origin. RE-Trace utilizes contrastive learning with a transformer-based trajectory encoder to create trajectory representations that are robust to various trajectory modifications. We present a comprehensive threat model for GPS trajectory modifications and demonstrate the effectiveness and efficiency of the RE-Trace re-identification approach on three real-world datasets. Our evaluation results demonstrate that RE-Trace significantly outperforms state-of-the-art baselines on all datasets and identifies modified GPS trajectories effectively and efficiently.
{"title":"RE-Trace: Re-Identification of Modified GPS Trajectories","authors":"Stefan Schestakov, Simon Gottschalk, Thorben Funke, Elena Demidova","doi":"10.1145/3643680","DOIUrl":"https://doi.org/10.1145/3643680","journal":{"name":"ACM Transactions on Spatial Algorithms and Systems"},"PeriodicalIF":1.9,"publicationDate":"2024-02-05","publicationTypes":"Journal Article"}
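The contrastive learning idea in the RE-Trace abstract above can be sketched on plain embedding vectors. The abstract does not state which loss is used; the sketch assumes an InfoNCE-style objective, in which the embedding of a modified trajectory (the positive) should be closer to the embedding of its original (the anchor) than the embeddings of unrelated trajectories (the negatives). The transformer encoder is left out entirely, and all names and the temperature value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]


def re_identify(query, database):
    """Return the index of the stored embedding most similar to the query."""
    return max(range(len(database)), key=lambda i: cosine(query, database[i]))


# Hypothetical 2-D embeddings for illustration only.
anchor = [1.0, 0.0]
loss_good = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
loss_bad = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Training an encoder to minimize such a loss over pairs of original and transformed trajectories is one way to obtain embeddings for which `re_identify` still finds the source trajectory after a modification; in this example `loss_good` is smaller than `loss_bad` because the positive is the nearest vector to the anchor.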