Geoinformatica: latest articles

LENS: label sparsity-tolerant adversarial learning on spatial deceptive reviews

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-09-14 DOI: 10.1007/s10707-024-00529-5

Sirish Prabakar, Haiquan Chen, Zhe Jiang, Carl Yang, Weikuan Yu, Da Yan

Online businesses and websites have recently become the main target of fake reviews, which are intentionally composed to push business ratings up or down. Most existing methods for detecting fake reviews are supervised, so their performance depends heavily on the amount, quality, and variety of labeled data, which are often difficult to obtain in practice. In this paper, we propose a semi-supervised, label sparsity-tolerant framework, LENS, for fake review detection by mining spatial knowledge and learning distributions of embedded topics. LENS builds on two key observations. (1) Spatial knowledge revealed in spatial entities and their co-occurring latent topic distributions may indicate review authenticity. (2) Distributions of the embedded topics (the contextual distribution) may exhibit important patterns that differentiate real from fake reviews. Specifically, LENS first extracts embeddings for spatial named entities using a knowledge base trained from Wikipedia webpages. Second, LENS represents each input token as a distribution over the learned latent topics in the embedded topic space. To bypass the differentiation difficulty, LENS builds on two discriminators in an actor-critic architecture using reinforcement learning. Extensive experiments on real-world spatial and non-spatial datasets show that, in a label-starved setting, LENS consistently outperformed state-of-the-art semi-supervised fake review detection methods at every labeling rate tested for real and fake reviews.

Citations: 0

A case study of spatiotemporal forecasting techniques for weather forecasting

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-09-14 DOI: 10.1007/s10707-024-00530-y

Shakir Showkat Sofi, Ivan Oseledets

The majority of real-world processes are spatiotemporal, and the data generated by them exhibits both spatial and temporal evolution. Weather is one of the most essential processes in this domain, and weather forecasting has become a crucial part of our daily routine. Weather data analysis is considered the most complex and challenging task. Although numerical weather prediction models are currently state-of-the-art, they are resource-intensive and time-consuming. Numerous studies have proposed time series-based models as a viable alternative to numerical forecasts. Recent research in the area of time series analysis indicates significant advancements, particularly regarding the use of state-space-based models (white box) and, more recently, the integration of machine learning and deep neural network-based models (black box). The most famous examples of such models are RNNs and transformers. These models have demonstrated remarkable results in the field of time-series analysis and have demonstrated effectiveness in modelling temporal correlations. It is crucial to capture both temporal and spatial correlations for a spatiotemporal process, as the values at nearby locations and time affect the values of a spatiotemporal process at a specific point. This self-contained paper explores various regional data-driven weather forecasting methods, i.e., forecasting over multiple latitude-longitude points (matrix-shaped spatial grid) to capture spatiotemporal correlations. The results showed that spatiotemporal prediction models reduced computational costs while improving accuracy. In particular, the proposed tensor train dynamic mode decomposition-based forecasting model has comparable accuracy to the state-of-the-art models without the need for training. We provide convincing numerical experiments to show that the proposed approach is practical.

Citations: 0

CLMTR: a generic framework for contrastive multi-modal trajectory representation learning

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-09-07 DOI: 10.1007/s10707-024-00528-6

Anqi Liang, Bin Yao, Jiong Xie, Wenli Zheng, Yanyan Shen, Qiqi Ge

Multi-modal trajectory representation learning aims to convert raw trajectories into low-dimensional embeddings to facilitate downstream trajectory analysis tasks. However, existing methods focus on spatio-temporal trajectories and often neglect additional modal features such as textual or imagery data. Moreover, these methods do not fully consider the correlations among different modal features and the relationships among trajectories, thus hindering the generation of generic and semantically enriched representations. To address these limitations, we propose a generic Contrastive Learning-based Multi-modal Trajectory Representation framework, termed CLMTR. Specifically, we incorporate intra- and inter-trajectory contrastive learning components to capture the correlations among diverse modal features and the intricate relationships among trajectories, obtaining generic and semantically enriched trajectory representations. We develop multi-modal feature embedding and attention-based fusion approaches to capture the multi-modal characteristics and adaptively obtain the unified embeddings. Experimental results on two real-world datasets demonstrate the superior performance of CLMTR over state-of-the-art methods in three downstream tasks.
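
The abstract does not state which contrastive objective CLMTR optimizes. As a rough illustration of the general idea only, the sketch below computes an InfoNCE-style loss, a common choice in contrastive learning, for one anchor embedding against one positive and several negatives. The function names, the cosine similarity, and the temperature value are assumptions for illustration, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor (hypothetical helper).

    The loss is small when the anchor is much closer to the positive
    than to any negative, and large otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Illustrative embeddings: two views of one trajectory vs. unrelated ones.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In an intra-trajectory setting the positive might be another modality of the same trajectory, and in an inter-trajectory setting it might be an augmented or related trajectory; both readings are assumptions about how such a loss could be applied.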

Citations: 0

Periodicity aware spatial-temporal adaptive hypergraph neural network for traffic forecasting

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-08-26 DOI: 10.1007/s10707-024-00527-7

Wenzhu Zhao, Guan Yuan, Rui Bing, Ruidong Lu, Yudong Shen

Traffic forecasting is the foundation and core task of Intelligent Transportation Systems (ITS). Because Graph Neural Networks (GNNs) are effective at capturing topological features, they are now commonly used in traffic forecasting to capture the spatial features of road networks. Although existing GNN-based traffic forecasting methods have achieved satisfactory results, they still suffer from the following problems: (1) Traffic time series usually contain complex periodic features, but existing methods model only 1D temporal features and ignore multi-periodic information in traffic data. (2) Nodes in road networks exhibit multivariate higher-order correlations, but existing methods preserve only pairwise connections through simple graphs and neglect these higher-order correlations. (3) Existing methods learn only the shared patterns of traffic time series and cannot adaptively capture the unique patterns of specific areas. To solve these problems, we propose a Periodicity aware spatial-temporal Adaptive Hypergraph Neural Network (PAHNN). First, a temporal multi-periodic block is designed to capture the 2D variations of traffic time series and extract multi-periodic features and complex temporal patterns. Then, we propose a spatial adaptive hypergraph block to model spatial multivariate correlations among nodes via hypergraph neural networks. Adaptively selecting hypergraph networks for different data makes it possible to extract the specific spatial patterns of different traffic areas. Finally, extensive experiments are conducted on two types of forecasting tasks to evaluate the effectiveness and accuracy of our model.

Citations: 0

ICN: Interactive convolutional network for forecasting travel demand of shared micromobility

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-06-21 DOI: 10.1007/s10707-024-00525-9

Yiming Xu, Qian Ke, Xiaojian Zhang, Xilei Zhao

Accurate shared micromobility demand predictions are essential for transportation planning and management. Although deep learning methods provide robust mechanisms to tackle demand forecasting challenges, current models based on graph neural networks suffer from limited scalability and high computational cost. There is both a need and significant potential to enhance the accuracy and efficiency of existing shared micromobility demand forecasting models. To fill these research gaps, this paper proposes a deep learning model named Interactive Convolutional Network (ICN) to forecast spatiotemporal travel demand for shared micromobility. The proposed model develops a novel channel dilation method by utilizing multi-dimensional spatial information (i.e., demographics, functionality, and transportation supply) based on travel behavior knowledge for building the deep learning model. We use the convolution operation to process the dilated tensor to simultaneously capture temporal and spatial dependencies. Based on a binary-tree-structured architecture and interactive convolution, the ICN model extracts features at different temporal resolutions and then generates predictions using a fully-connected layer. We conducted two practical case studies from Chicago, IL, and Austin, TX to test the proposed model. The results show that the ICN model significantly outperforms all benchmark models. The model predictions have the potential to assist micromobility operators in developing efficient vehicle rebalancing strategies, while also providing cities with guidance on enhancing the management of their shared micromobility system.

Citations: 0

A transformer-based method for vessel traffic flow forecasting

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-05-30 DOI: 10.1007/s10707-024-00521-z

Petros Mandalis, Eva Chondrodima, Yannis Kontoulis, Nikos Pelekis, Yannis Theodoridis

In recent years, the maritime domain has experienced tremendous growth due to the exploitation of big traffic data. Particular emphasis has been placed on deep learning methodologies for decision-making. Accurate Vessel Traffic Flow Forecasting (VTFF) is essential for optimizing navigation efficiency and proactively managing maritime operations. In this work, we present a distributed Unified Approach for VTFF (dUA-VTFF), which employs Transformer models and leverages the Apache Spark big data distributed processing framework to learn from historical maritime data and predict future traffic flows over a time horizon of up to 30 min. In particular, dUA-VTFF leverages timestamped vessel locations along with future vessel locations produced by a Vessel Route Forecasting model. These data are arranged into a spatiotemporal grid to formulate the traffic flows. Subsequently, through Apache Spark, each grid cell is allocated to a computing node, where appropriately designed Transformer-based models forecast traffic flows in a distributed framework. Experimental evaluations conducted on real Automatic Identification System (AIS) datasets demonstrate the improved efficiency of dUA-VTFF compared to state-of-the-art traffic flow forecasting methods.
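
To make the "spatiotemporal grid" step more concrete, here is a minimal single-machine sketch that bins timestamped vessel positions into grid cells and time windows and counts distinct vessels per bin. It does not reproduce dUA-VTFF or its Spark pipeline: the record layout, the cell size of 0.1 degrees, the 30-minute window, and the definition of flow as "distinct vessels per cell per window" are all assumptions made for illustration.

```python
import math
from collections import Counter


def grid_flows(records, cell_deg=0.1, window_s=1800):
    """Count distinct vessels per (time window, lat cell, lon cell).

    records: iterable of (vessel_id, unix_time_s, lat, lon) tuples
    (a hypothetical layout, not the AIS message format).
    """
    seen = set()        # (vessel_id, bin) pairs already counted
    flows = Counter()   # bin -> number of distinct vessels
    for vessel_id, t, lat, lon in records:
        bin_key = (
            int(t // window_s),
            math.floor(lat / cell_deg),
            math.floor(lon / cell_deg),
        )
        if (vessel_id, bin_key) not in seen:
            seen.add((vessel_id, bin_key))
            flows[bin_key] += 1
    return flows


# Made-up positions for three vessels.
records = [
    ("A", 0, 37.95, 23.65),
    ("A", 60, 37.96, 23.66),    # same vessel, same bin: counted once
    ("B", 120, 37.94, 23.64),   # second vessel in the same bin
    ("C", 30, 38.05, 23.65),    # neighbouring latitude cell
    ("A", 1900, 37.95, 23.65),  # same cell, next 30-minute window
]
flows = grid_flows(records)
```

In a distributed setting, the per-cell series obtained this way could be keyed by cell and handed to separate workers, which is the role the abstract assigns to Apache Spark.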

Citations: 0

MobilityDL: a review of deep learning from trajectory data

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-05-28 DOI: 10.1007/s10707-024-00518-8

Anita Graser, Anahid Jalali, Jasmin Lampert, Axel Weißenfeld, Krzysztof Janowicz

Trajectory data combines the complexities of time series, spatial data, and (sometimes irrational) movement behavior. As data availability and computing power have increased, so has the popularity of deep learning from trajectory data. This review paper provides the first comprehensive overview of deep learning approaches for trajectory data. We have identified eight specific mobility use cases which we analyze with regards to the deep learning models and the training data used. Besides a comprehensive quantitative review of the literature since 2018, the main contribution of our work is the data-centric analysis of recent work in this field, placing it along the mobility data continuum which ranges from detailed dense trajectories of individual movers (quasi-continuous tracking data), to sparse trajectories (such as check-in data), and aggregated trajectories (crowd information).

Citations: 0

Identifying and recommending taxi hotspots in spatio-temporal space

IF 2; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-05-25 DOI: 10.1007/s10707-024-00524-w

Saurabh Mishra, Sonia Khetarpaul

GPS-driven, mobile application-based ride-hailing systems, e.g., Uber and Ola, have become integral to daily life and a natural transport choice for urban commuters. However, in any area there is an imbalance between demand (pick-up requests) and supply (drop-off requests). City planners and researchers are working to close this gap between demand and supply for taxi requests. Existing approaches have mainly focused on clustering spatial regions to identify hotspots, i.e., locations with a high demand for pick-up requests. This study found that when hotspot detection clusters only on high pick-up demand, most hotspots fall near the city center or in two or three spatial regions, ignoring the other parts of the city. This paper (an earlier version of this paper was presented at the Australasian Database Conference and was published in its Proceedings: https://link.springer.com/chapter/10.1007/978-3-030-69377-0_10) presents a hotspot detection method that uses a dominating set problem-based solution in spatial-temporal space, covering both high-density taxi pick-up demand regions and those parts of the city with a moderate density of pick-up demand during different hours of the day. The paper proposes algorithms based on the k-hop dominating set; their performance is evaluated on real-world datasets and shows an advantage over existing state-of-the-art methods. The method is also expected to reduce waiting time for customers and for drivers looking for their next pick-up request, which would in turn maximize their profit and help improve their services.
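
For readers unfamiliar with the k-hop dominating set, the sketch below shows a standard greedy approximation on a small graph: repeatedly pick the node whose k-hop neighbourhood covers the most still-uncovered nodes. It is not the authors' algorithm. Modelling spatial cells as graph nodes with edges between adjacent cells, and the function names used here, are assumptions for illustration.

```python
from collections import deque


def k_hop_ball(graph, source, k):
    """Return all nodes within k hops of source (including source), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return set(dist)


def greedy_k_hop_dominating_set(graph, k):
    """Greedy heuristic: every node ends up within k hops of a chosen node.

    graph: dict mapping each node to a list of adjacent nodes.
    Ties are broken in favour of the smallest node label.
    """
    balls = {v: k_hop_ball(graph, v, k) for v in graph}
    uncovered = set(graph)
    chosen = []
    while uncovered:
        best = max(sorted(graph), key=lambda v: len(balls[v] & uncovered))
        chosen.append(best)
        uncovered -= balls[best]
    return chosen


# A path of seven cells: 0 - 1 - 2 - 3 - 4 - 5 - 6
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
hotspots_k1 = greedy_k_hop_dominating_set(path, 1)
hotspots_k2 = greedy_k_hop_dominating_set(path, 2)
```

A larger k yields fewer selected cells, each serving a wider neighbourhood; weighting nodes by pick-up demand per hour would be one way to bring in the temporal dimension, though the abstract does not say how the paper does this.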

Citations: 0

An experimental study of existing tools for outlier detection and cleaning in trajectories

IF 2 Region 4 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-05-18 DOI: 10.1007/s10707-024-00522-y

Mariana M Garcez Duarte, Mahmoud Sakr

Outlier detection and cleaning are essential data preprocessing steps that ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly from the rest of a single trajectory. We experiment with ten open-source libraries to comprehensively evaluate the available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. The experiment considers the libraries as they are offered to end users, so the results reflect real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey state-of-the-art algorithms for outlier detection and classify them into five types: statistic-based methods, sliding-window algorithms, clustering-based methods, graph-based methods, and heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to the development of data preprocessing and outlier detection methodologies.
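As an illustration of the heuristic-based category named in the abstract, the sketch below flags trajectory points that imply an implausible speed. It is not taken from any of the ten evaluated libraries. The function names, the `(t_seconds, lat, lon)` tuple layout, and the assumption that the first point is valid are all hypothetical choices made for this example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flag_speed_outliers(points, max_speed_mps):
    """Flag points whose speed from the last accepted point exceeds max_speed_mps.

    points: list of (t_seconds, lat, lon) tuples sorted by time (assumed layout).
    Returns a list of booleans, True where the point is flagged as an outlier.
    The first point is assumed to be valid.
    """
    flags = [False] * len(points)
    last_good = 0
    for i in range(1, len(points)):
        t0, lat0, lon0 = points[last_good]
        t1, lat1, lon1 = points[i]
        dt = t1 - t0
        dist = haversine_m(lat0, lon0, lat1, lon1)
        if dt <= 0 or dist / dt > max_speed_mps:
            flags[i] = True  # implausible jump or non-increasing timestamp
        else:
            last_good = i
    return flags
```

Comparing against the last accepted point rather than the immediate predecessor keeps a single spike from also causing the following valid point to be flagged.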

Citations: 0

How opportunistic mobile monitoring can enhance air quality assessment?

IF 2 Region 4 Computer Science Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Geoinformatica

Pub Date : 2024-04-29 DOI: 10.1007/s10707-024-00516-w

Mohammad Abboud, Yehia Taher, Karine Zeitouni, Ana-Maria Olteanu-Raimond

Deteriorating air quality in urban areas, particularly in developing countries, has drawn increasing attention. Daily reports of air pollution are essential to effectively manage public health risks. Pollution estimation has become crucial for expanding spatial and temporal coverage and for estimating pollution levels at different locations. The emergence of low-cost sensors has enabled high-resolution data collection in both fixed and mobile settings, and various approaches have been proposed to estimate air pollution using this technology. The objective of this study is to enhance the data from fixed stations by incorporating opportunistic mobile monitoring (OMM) data. Our main research question is: how can we augment fixed station data through OMM? To address the challenge of limited OMM data availability, we leverage existing data collected during periods when the pollution maps align with those observed by the fixed stations. By combining the fixed and mobile data, we apply interpolation techniques to produce more accurate pollution maps. The efficacy of our approach is validated through experiments conducted on a real-life dataset.
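The abstract does not say which interpolation technique is used. As one possible illustration, the sketch below applies inverse distance weighting (IDW) to a pooled set of fixed-station and mobile samples. IDW, the function name `idw_estimate`, and the planar `(x, y, value)` sample layout are assumptions made for this example, not details from the paper.

```python
def idw_estimate(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: iterable of (x, y, value) tuples in planar coordinates (assumed layout).
    A sample located exactly at the query point returns its own value.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0.0:
            return value  # exact hit: avoid division by zero
        weight = 1.0 / d2 ** (power / 2.0)
        num += weight * value
        den += weight
    if den == 0.0:
        raise ValueError("at least one sample is required")
    return num / den


# Hypothetical readings: two fixed stations and one mobile measurement.
fixed = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0)]
mobile = [(5.0, 0.0, 50.0)]

fixed_only = idw_estimate(fixed, 2.5, 0.0)
combined = idw_estimate(fixed + mobile, 2.5, 0.0)
```

In this toy setup the mobile reading raises the estimate between the two stations, which shows how additional samples can change the interpolated map where fixed coverage is sparse.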

Citations: 0