Query the trajectory based on the precise track: a Bloom filter-based approach.

IF 2.2 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Information Systems) | Geoinformatica | Pub Date: 2021-01-01 | Epub Date: 2021-03-15 | DOI: 10.1007/s10707-021-00433-2

Zengjie Wang, Wen Luo, Linwang Yuan, Hong Gao, Fan Wu, Xu Hu, Zhaoyuan Yu

Geoinformatica, volume 25, issue 2, pages 397-416. Journal Article. https://doi.org/10.1007/s10707-021-00433-2

Citations: 1

Abstract




Fast, precise matching against a given set of trajectory points is a central problem in trajectory querying. A database typically holds massive trajectory data while a query set contains only a few points, which makes high query performance hard to achieve. Current trajectory query methods commonly use tree-based index structures and signature-based methods to classify, simplify, and filter trajectories. However, the unstructured nature and spatiotemporal heterogeneity of trajectory sequences cause these methods to suffer from heavy spatial overlap, frequent I/O, and high memory usage, so they are poorly suited to time-critical tasks on big trajectory data. This paper develops a trajectory query method built on the Bloom filter. Using a gridded space and geocoding, a query over spatial trajectory sequences (tracks) is transformed into a query over text strings: the geographic space is divided regularly by a geographic grid and each cell is assigned an independent geocode, which converts a high-dimensional, irregular spatial trajectory query into a one-dimensional string query. The point in each cell is treated as a signature that maps to the bit array of the Bloom filter. This conversion effectively eliminates the heavy overlap and the unstable query performance, and the independent coding ensures that each whole track is unique. Querying a track requires no additional I/O on the raw trajectory data, and the memory the method occupies is negligible compared with the original data. Experiments on Beijing taxi and Shenzhen bus trajectory data ran random queries under a variety of boundary conditions. On datasets ranging from one million to tens of millions of trajectory records, the method improved performance and stability by a factor of 2000 to 4000 compared with the R*-tree index. The Bloom filter-based query method is also hardly affected by grid size, original data size, or track length. With this time advantage, the method is suitable for time-critical spatial computation tasks such as anti-terrorism, public safety, and epidemic prevention and control.
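The grid-to-string-to-Bloom-filter pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the geocode scheme, the hash functions, or the filter parameters, so the 0.01-degree cell size, the `row_col` code format, the collapsing of consecutive points that fall in the same cell, the SHA-256 double hashing, and the names `cell_code`, `track_key`, and `BloomFilter` are all assumptions made for illustration.

```python
import hashlib
import math


def cell_code(lat, lon, cell_deg=0.01):
    """Return the code of the regular grid cell containing a point (assumed scheme)."""
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return f"{row}_{col}"


def track_key(points, cell_deg=0.01):
    """Convert a sequence of (lat, lon) points into a one-dimensional string key."""
    codes = []
    for lat, lon in points:
        code = cell_code(lat, lon, cell_deg)
        # Assumption: consecutive points in the same cell contribute one code.
        if not codes or codes[-1] != code:
            codes.append(code)
    return "|".join(codes)


class BloomFilter:
    """Fixed-size Bloom filter; false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Index one stored track, then query with different points from the same cells.
index = BloomFilter()
stored = [(39.9042, 116.4074), (39.9049, 116.4079), (39.9151, 116.4174)]
index.add(track_key(stored))

query = [(39.9045, 116.4071), (39.9155, 116.4178)]
print(track_key(query) in index)  # True: the query crosses the same cell sequence
```

In this sketch a query only touches the in-memory bit array, which mirrors the abstract's point that no additional I/O on the raw trajectory data is needed. A positive answer from a Bloom filter can be a false positive, so a real system would choose the bit-array size and number of hash functions to fit the expected number of tracks.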


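Source journal

Geoinformatica (Geosciences / Computer Science: Information Systems)

CiteScore: 5.60

Self-citation rate: 10.00%

Articles published: (not given)

Review time: 6 months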

About the journal: GeoInformatica sits at the confluence of two rapidly advancing domains, Computer Science and Geographic Information Science. Earth studies now use increasingly sophisticated computing theory and tools, and computer processing of Earth observations through Geographic Information Systems (GIS) attracts a great deal of attention from government, industry, and research. The journal aims to promote the most innovative results from research in computer science applied to geographic information systems. GeoInformatica thus provides a forum for disseminating original and fundamental research and experience in the rapidly advancing area of computer science for spatial studies.