Pub Date: 2024-09-14, DOI: 10.1007/s10707-024-00529-5
Sirish Prabakar, Haiquan Chen, Zhe Jiang, Carl Yang, Weikuan Yu, Da Yan
Online businesses and websites have recently become the main target of fake reviews, which are intentionally composed to push business ratings up or down. Most existing methods for detecting fake reviews are supervised, so their performance depends heavily on the amount, quality, and variety of labeled data, which are often hard to obtain in practice. In this paper, we propose LENS, a semi-supervised, label sparsity-tolerant framework for fake review detection that mines spatial knowledge and learns distributions of embedded topics. LENS builds on two key observations. (1) Spatial knowledge revealed in spatial entities and their co-occurring latent topic distributions may indicate review authenticity. (2) Distributions of the embedded topics (the contextual distribution) may exhibit important patterns that differentiate real from fake reviews. Specifically, LENS first extracts embeddings for spatial named entities using a knowledge base trained from Wikipedia webpages. Second, LENS represents each input token as a distribution over the learned latent topics in the embedded topic space. To bypass the differentiation difficulty, LENS builds on two discriminators in an actor-critic architecture trained with reinforcement learning. Extensive experiments on real-world spatial and non-spatial datasets show that LENS consistently outperformed state-of-the-art semi-supervised fake review detection methods at all tested labeling rates for real and fake reviews in a label-starved setting.
Title: LENS: label sparsity-tolerant adversarial learning on spatial deceptive reviews. Journal: Geoinformatica.
Pub Date: 2024-09-14, DOI: 10.1007/s10707-024-00530-y
Shakir Showkat Sofi, Ivan Oseledets
The majority of real-world processes are spatiotemporal, and the data they generate exhibit both spatial and temporal evolution. Weather is one of the most essential processes in this domain, and weather forecasting has become a crucial part of our daily routine. Weather data analysis is considered one of the most complex and challenging tasks. Although numerical weather prediction models are currently state-of-the-art, they are resource-intensive and time-consuming. Numerous studies have proposed time series-based models as a viable alternative to numerical forecasts. Recent research in time series analysis shows significant advances, particularly in state-space-based models (white box) and, more recently, in machine learning and deep neural network-based models (black box). The best-known examples of the latter are RNNs and transformers. These models have achieved remarkable results in time-series analysis and are effective at modelling temporal correlations. For a spatiotemporal process, it is crucial to capture both temporal and spatial correlations, because the values at nearby locations and times affect the value of the process at a specific point. This self-contained paper explores various regional data-driven weather forecasting methods, i.e., forecasting over multiple latitude-longitude points (a matrix-shaped spatial grid) to capture spatiotemporal correlations. The results showed that spatiotemporal prediction models reduced computational costs while improving accuracy. In particular, the proposed tensor train dynamic mode decomposition-based forecasting model has accuracy comparable to state-of-the-art models without the need for training. We provide numerical experiments to show that the proposed approach is practical.
Title: A case study of spatiotemporal forecasting techniques for weather forecasting. Journal: Geoinformatica.
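To make the idea of training-free, decomposition-based forecasting concrete, here is a minimal sketch of ordinary (matrix) dynamic mode decomposition applied to a flattened spatial grid. It is not the tensor train variant proposed in the paper; the function name `dmd_forecast` and its parameters are illustrative assumptions, and the snapshot matrix is assumed to hold one flattened grid per column.

```python
import numpy as np

def dmd_forecast(snapshots, rank, steps):
    """Forecast `steps` future columns with plain projected DMD.

    snapshots: array of shape (n_grid_points, n_times), one flattened
    spatial grid per column (assumed layout, not taken from the paper).
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    # Truncated SVD of the "past" snapshots.
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced linear operator A_tilde = U^H Y V S^{-1}, fitted so that Y ~ A X.
    A_tilde = (U.conj().T @ Y @ Vh.conj().T) / s
    # Advance the last observed state in the reduced space.
    state = U.conj().T @ snapshots[:, -1]
    forecasts = []
    for _ in range(steps):
        state = A_tilde @ state
        forecasts.append(U @ state)
    return np.real(np.stack(forecasts, axis=1))
```

The only "fitting" is one SVD and a small matrix product, which is why approaches in this family need no gradient-based training. The choice of `rank` controls how many spatial modes are kept and would have to be tuned for real weather data.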
Pub Date: 2024-09-07, DOI: 10.1007/s10707-024-00528-6
Anqi Liang, Bin Yao, Jiong Xie, Wenli Zheng, Yanyan Shen, Qiqi Ge
Multi-modal trajectory representation learning aims to convert raw trajectories into low-dimensional embeddings to facilitate downstream trajectory analysis tasks. However, existing methods focus on spatio-temporal trajectories and often neglect additional modal features such as textual or imagery data. Moreover, these methods do not fully consider the correlations among different modal features and the relationships among trajectories, thus hindering the generation of generic and semantically enriched representations. To address these limitations, we propose a generic Contrastive Learning-based Multi-modal Trajectory Representation framework, termed CLMTR. Specifically, we incorporate intra- and inter-trajectory contrastive learning components to capture the correlations among diverse modal features and the intricate relationships among trajectories, obtaining generic and semantically enriched trajectory representations. We develop multi-modal feature embedding and attention-based fusion approaches to capture the multi-modal characteristics and adaptively obtain the unified embeddings. Experimental results on two real-world datasets demonstrate the superior performance of CLMTR over state-of-the-art methods in three downstream tasks.
Title: CLMTR: a generic framework for contrastive multi-modal trajectory representation learning. Journal: Geoinformatica.
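The contrastive learning component mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. This is a sketch under assumptions: the abstract does not state which contrastive objective CLMTR uses, and `info_nce`, `anchors`, `positives`, and `temperature` are hypothetical names. Row `i` of `positives` is treated as the matching view of row `i` of `anchors`, and all other rows act as negatives.

```python
import math

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over paired embeddings (lists of float lists)."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching row as the correct class.
        total += log_denom - logits[i]
    return total / len(a)
```

In an intra-trajectory setting the two views could be embeddings of different modalities of the same trajectory; in an inter-trajectory setting they could be embeddings of related trajectories. Both pairings are illustrative readings of the abstract, not details confirmed by it.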
Traffic forecasting is the foundation and core task of Intelligent Transportation Systems (ITS). Because Graph Neural Networks (GNNs) are good at capturing topological features, they are now commonly used in traffic forecasting to capture the spatial features of road networks. Although existing GNN-based traffic forecasting methods have achieved satisfactory results, they still suffer from the following problems: (1) Traffic time series usually contain complex periodic features, but existing methods model only 1D time features and ignore multi-periodic information in traffic data. (2) There are multivariate higher-order correlations among nodes in road networks, but existing methods preserve only pairwise connections through simple graphs and neglect the higher-order multivariate correlations. (3) They cannot adaptively capture the unique patterns of specific areas and learn only the shared patterns of traffic time series. To solve these problems, we propose a Periodicity aware spatial-temporal Adaptive Hypergraph Neural Network (PAHNN). First, a temporal multi-periodic block is designed to capture the 2D variations of traffic time series and extract multi-periodic features and complex temporal patterns. Then, we propose a spatial adaptive hypergraph block to model spatial multivariate correlations among nodes via hypergraph neural networks. Adaptively selecting hypergraph networks for different data makes it possible to extract the specific spatial patterns of different traffic areas. Finally, extensive experiments are conducted on two types of forecasting tasks to evaluate the effectiveness and accuracy of our model.
Title: Periodicity aware spatial-temporal adaptive hypergraph neural network for traffic forecasting. Authors: Wenzhu Zhao, Guan Yuan, Rui Bing, Ruidong Lu, Yudong Shen. Pub Date: 2024-08-26, DOI: 10.1007/s10707-024-00527-7. Journal: Geoinformatica.
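To show how a hypergraph captures more than pairwise connections, the sketch below implements one widely used form of hypergraph convolution, X' = Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta, where H is the node-by-hyperedge incidence matrix. This is a generic layer, not PAHNN's adaptive hypergraph block, whose details the abstract does not give; `hypergraph_conv` and its arguments are assumed names, and the nonlinearity is omitted.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One linear hypergraph convolution step.

    X: (n_nodes, in_dim) node features, e.g. traffic readings per sensor.
    H: (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v is in hyperedge e.
    Theta: (in_dim, out_dim) learnable weights (passed in here, not trained).
    """
    n_nodes, n_edges = H.shape
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w          # weighted degree of each node
    de = H.sum(axis=0)  # number of nodes in each hyperedge
    # Guard against isolated nodes and empty hyperedges.
    dv_inv_sqrt = np.where(dv > 0, np.where(dv > 0, dv, 1.0) ** -0.5, 0.0)
    de_inv = np.where(de > 0, 1.0 / np.where(de > 0, de, 1.0), 0.0)
    left = (dv_inv_sqrt[:, None] * H) * (w * de_inv)  # Dv^-1/2 H W De^-1
    right = H.T * dv_inv_sqrt                         # H^T Dv^-1/2
    return left @ right @ X @ Theta
```

A single hyperedge can join any number of road segments, so two nodes exchange information whenever they share a hyperedge, which is the higher-order relation that a simple graph with pairwise edges cannot express directly.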
Pub Date: 2024-06-21, DOI: 10.1007/s10707-024-00525-9
Yiming Xu, Qian Ke, Xiaojian Zhang, Xilei Zhao
Accurate shared micromobility demand predictions are essential for transportation planning and management. Although deep learning methods provide robust mechanisms to tackle demand forecasting challenges, current models based on graph neural networks suffer from limited scalability and high computational cost. There is both a need and significant potential to enhance the accuracy and efficiency of existing shared micromobility demand forecasting models. To fill these research gaps, this paper proposes a deep learning model named Interactive Convolutional Network (ICN) to forecast spatiotemporal travel demand for shared micromobility. The proposed model develops a novel channel dilation method by utilizing multi-dimensional spatial information (i.e., demographics, functionality, and transportation supply) based on travel behavior knowledge for building the deep learning model. We use the convolution operation to process the dilated tensor to simultaneously capture temporal and spatial dependencies. Based on a binary-tree-structured architecture and interactive convolution, the ICN model extracts features at different temporal resolutions and then generates predictions using a fully-connected layer. We conducted two practical case studies from Chicago, IL, and Austin, TX to test the proposed model. The results show that the ICN model significantly outperforms all benchmark models. The model predictions have the potential to assist micromobility operators in developing efficient vehicle rebalancing strategies, while also providing cities with guidance on enhancing the management of their shared micromobility system.
Title: ICN: Interactive convolutional network for forecasting travel demand of shared micromobility. Journal: Geoinformatica.
Pub Date: 2024-05-30, DOI: 10.1007/s10707-024-00521-z
Petros Mandalis, Eva Chondrodima, Yannis Kontoulis, Nikos Pelekis, Yannis Theodoridis
In recent years, the maritime domain has experienced tremendous growth due to the exploitation of big traffic data. Particular emphasis has been placed on deep learning methodologies for decision-making. Accurate Vessel Traffic Flow Forecasting (VTFF) is essential for optimizing navigation efficiency and proactively managing maritime operations. In this work, we present a distributed Unified Approach for VTFF (dUA-VTFF), which employs Transformer models and leverages the Apache Spark distributed big data processing framework to learn from historical maritime data and predict future traffic flows over a time horizon of up to 30 min. In particular, dUA-VTFF uses timestamped vessel locations along with future vessel locations produced by a Vessel Route Forecasting model. These data are arranged into a spatiotemporal grid to formulate the traffic flows. Subsequently, through Apache Spark, each grid cell is allocated to a computing node, where appropriately designed Transformer-based models forecast traffic flows in a distributed framework. Experimental evaluations conducted on real Automatic Identification System (AIS) datasets demonstrate the improved efficiency of dUA-VTFF compared to state-of-the-art traffic flow forecasting methods.
Title: A transformer-based method for vessel traffic flow forecasting. Journal: Geoinformatica.
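The step of arranging timestamped vessel locations into a spatiotemporal grid can be sketched as follows. This is a minimal illustration, assuming that a cell's traffic flow is the number of distinct vessels seen in it during a time slot; the record layout, the function name `traffic_flow_grid`, and the parameters `cell_deg` and `slot_seconds` are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def traffic_flow_grid(records, cell_deg, slot_seconds):
    """Count distinct vessels per (time slot, grid row, grid column).

    records: iterable of (vessel_id, t_seconds, lon, lat) tuples.
    cell_deg: grid cell size in degrees; slot_seconds: time slot length.
    """
    vessels = defaultdict(set)
    for vessel_id, t, lon, lat in records:
        # Floor division maps each observation to its slot and cell index.
        key = (int(t // slot_seconds), int(lat // cell_deg), int(lon // cell_deg))
        vessels[key].add(vessel_id)
    return {key: len(ids) for key, ids in vessels.items()}
```

Each key of the result identifies one cell in one time slot, so the per-cell time series can be handed to separate workers, in the spirit of the one-grid-cell-per-computing-node allocation described in the abstract.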
Pub Date: 2024-05-28, DOI: 10.1007/s10707-024-00518-8
Anita Graser, Anahid Jalali, Jasmin Lampert, Axel Weißenfeld, Krzysztof Janowicz
Trajectory data combines the complexities of time series, spatial data, and (sometimes irrational) movement behavior. As data availability and computing power have increased, so has the popularity of deep learning from trajectory data. This review paper provides the first comprehensive overview of deep learning approaches for trajectory data. We have identified eight specific mobility use cases, which we analyze with regard to the deep learning models and the training data used. Besides a comprehensive quantitative review of the literature since 2018, the main contribution of our work is the data-centric analysis of recent work in this field, placing it along the mobility data continuum, which ranges from detailed dense trajectories of individual movers (quasi-continuous tracking data), to sparse trajectories (such as check-in data), and aggregated trajectories (crowd information).
Title: MobilityDL: a review of deep learning from trajectory data. Journal: Geoinformatica.
Pub Date: 2024-05-25, DOI: 10.1007/s10707-024-00524-w
Saurabh Mishra, Sonia Khetarpaul
GPS-driven, mobile application-based ride-hailing systems, e.g., Uber and Ola, have become integral to daily life and a natural transport choice for urban commuters. However, in any area there is an imbalance between demand (pick-up requests) and supply (drop-off requests). City planners and researchers are working to close this gap between demand and supply for taxi requests. Existing approaches have mainly focused on clustering spatial regions to identify hotspots, i.e., locations with a high demand for pick-up requests. This study found that if hotspots are derived by clustering high pick-up demand, most of them fall near the city center or in two or three spatial regions, ignoring the other parts of the city. This paper (An earlier version of this paper was presented at the Australasian Database Conference and was published in its Proceedings: https://link.springer.com/chapter/10.1007/978-3-030-69377-0_10) presents a hotspot detection method that uses a dominating set problem-based solution in spatial-temporal space; it covers regions with high-density taxi pick-up demand as well as those parts of the city with a moderate density of pick-up demand during different hours of the day. The paper proposes algorithms based on the k-hop dominating set; their performance is evaluated on real-world datasets and shows an advantage over existing state-of-the-art methods. The method is also expected to reduce the waiting time for customers and for drivers looking for their next pick-up request, which would in turn increase drivers' profit and help improve their services.
Title: Identifying and recommending taxi hotspots in spatio-temporal space. Journal: Geoinformatica.
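The k-hop dominating set idea can be sketched with a simple greedy heuristic on an unweighted region graph: pick hotspots so that every region lies within k hops of at least one hotspot. This is a generic illustration, not the paper's algorithms, which also account for demand density and time of day; the function names and the adjacency-dictionary graph format are assumptions.

```python
from collections import deque

def k_hop_neighbourhood(graph, source, k):
    """Return all nodes within k hops of `source` (breadth-first search)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen

def greedy_k_hop_dominating_set(graph, k):
    """Greedily choose nodes until every node is within k hops of a chosen one.

    graph: dict mapping each node to an iterable of neighbouring nodes.
    Ties are broken in favour of the smallest node label.
    """
    cover = {v: k_hop_neighbourhood(graph, v, k) for v in graph}
    uncovered = set(graph)
    chosen = []
    while uncovered:
        # Pick the node that covers the most still-uncovered nodes.
        best = max(sorted(graph), key=lambda v: len(cover[v] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen
```

Because every region must be covered, the selected hotspots are spread across the whole graph rather than concentrated where demand is densest. The greedy rule does not guarantee a minimum-size set.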
Pub Date: 2024-05-18, DOI: 10.1007/s10707-024-00522-y
Mariana M Garcez Duarte, Mahmoud Sakr
Outlier detection and cleaning are essential steps in data preprocessing to ensure the integrity and validity of data analyses. This paper focuses on outlier points within individual trajectories, i.e., points that deviate significantly inside a single trajectory. We experiment with ten open-source libraries to comprehensively evaluate available tools, comparing their efficiency and accuracy in identifying and cleaning outliers. This experiment considers the libraries as they are offered to end users, with real-world applicability. We compare existing outlier detection libraries, introduce a method for establishing ground-truth, and aim to guide users in choosing the most appropriate tool for their specific outlier detection needs. Furthermore, we survey the state-of-the-art algorithms for outlier detection and classify them into five types: Statistic-based methods, Sliding window algorithms, Clustering-based methods, Graph-based methods, and Heuristic-based methods. Our research provides insights into these libraries’ performance and contributes to developing data preprocessing and outlier detection methodologies.
{"title":"An experimental study of existing tools for outlier detection and cleaning in trajectories","authors":"Mariana M Garcez Duarte, Mahmoud Sakr","doi":"10.1007/s10707-024-00522-y","DOIUrl":"https://doi.org/10.1007/s10707-024-00522-y","journal":{"name":"Geoinformatica"},"publicationDate":"2024-05-18","publicationTypes":"Journal Article"}
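The trajectory outlier record above names sliding-window algorithms as one of five method families. The sketch below shows one simple member of that family, assuming planar coordinates in metres: a point is flagged when it lies far from the coordinate-wise median of its neighbours in the window. The function name, the window size, and the threshold are hypothetical and do not correspond to any of the ten libraries evaluated in the paper.

```python
from math import hypot
from statistics import median


def sliding_window_outliers(points, half_window=2, max_dev=50.0):
    """Return indices of points that deviate from the median of their
    neighbours (up to half_window on each side) by more than max_dev."""
    flagged = []
    n = len(points)
    for i, (x, y) in enumerate(points):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        # exclude the point itself so it cannot mask its own deviation
        neighbours = [points[j] for j in range(lo, hi) if j != i]
        if not neighbours:
            continue  # a single-point trajectory has no reference
        mx = median(p[0] for p in neighbours)
        my = median(p[1] for p in neighbours)
        if hypot(x - mx, y - my) > max_dev:
            flagged.append(i)
    return flagged


# Hypothetical trajectory moving along the x axis with one GPS jump.
track = [(0, 0), (10, 0), (20, 0), (30, 500), (40, 0), (50, 0), (60, 0)]
outliers = sliding_window_outliers(track)
cleaned = [p for i, p in enumerate(track) if i not in set(outliers)]
```

Using the median rather than the mean keeps a single jump from dragging the reference position toward itself, so the neighbours of the bad point are not flagged along with it.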
Pub Date : 2024-04-29 DOI: 10.1007/s10707-024-00516-w
Mohammad Abboud, Yehia Taher, Karine Zeitouni, Ana-Maria Olteanu-Raimond
Deteriorating air quality in urban areas, particularly in developing countries, has drawn increasing attention. Daily reports of air pollution are essential to effectively manage public health risks. Pollution estimation has become crucial for expanding spatial and temporal coverage and for estimating pollution levels at different locations. The emergence of low-cost sensors has enabled high-resolution data collection, in either fixed or mobile settings, and various approaches have been proposed to estimate air pollution using this technology. The objective of this study is to enhance the data from fixed stations by incorporating opportunistic mobile monitoring (OMM) data. The main research question is: how can we augment fixed station data through OMM? To address the challenge of limited OMM data availability, we leverage existing data collected during periods when the pollution maps align with those observed by the fixed stations. By combining the fixed and mobile data, we apply interpolation techniques to produce more accurate pollution maps. The efficacy of our approach is validated through experiments conducted on a real-life dataset.
{"title":"How opportunistic mobile monitoring can enhance air quality assessment?","authors":"Mohammad Abboud, Yehia Taher, Karine Zeitouni, Ana-Maria Olteanu-Raimond","doi":"10.1007/s10707-024-00516-w","DOIUrl":"https://doi.org/10.1007/s10707-024-00516-w","journal":{"name":"Geoinformatica"},"publicationDate":"2024-04-29","publicationTypes":"Journal Article"}
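The air quality record above combines fixed-station and mobile measurements before interpolating a pollution map. The abstract does not say which interpolation technique is used, so the sketch below assumes inverse distance weighting (IDW) purely for illustration. The sample coordinates, pollutant values, and function name are hypothetical.

```python
from math import hypot


def idw(samples, query, power=2.0):
    """Inverse distance weighting: estimate the value at query from
    (x, y, value) samples, weighting each by 1 / distance**power."""
    if not samples:
        raise ValueError("at least one sample is required")
    qx, qy = query
    num = den = 0.0
    for x, y, value in samples:
        d = hypot(x - qx, y - qy)
        if d == 0.0:
            return value  # the query coincides with a measurement
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical readings as (x, y, concentration).
fixed = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0)]
mobile = [(5.0, 5.0, 50.0)]

fixed_only = idw(fixed, (5.0, 0.0))
combined = idw(fixed + mobile, (5.0, 0.0))
```

In this toy case the query point is equidistant from all three samples, so the fixed-only estimate is the mean of the two stations while the combined estimate shifts toward the mobile reading. This shows how additional mobile samples can change the interpolated map between stations.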