DITA: Distributed In-Memory Trajectory Analytics

Proceedings of the 2018 International Conference on Management of Data. Pub Date: 2018-05-27. DOI: 10.1145/3183713.3183743

Zeyuan Shang, Guoliang Li, Z. Bao


Citations: 87

Abstract

Trajectory analytics can benefit many real-world applications, e.g., frequent-trajectory-based navigation systems, road planning, carpooling, and transportation optimization. Existing algorithms focus on optimizing trajectory analytics on a single machine. However, the volume of trajectory data exceeds the storage and processing capability of a single machine, which calls for large-scale trajectory analytics in distributed environments. Distributed trajectory analytics faces the challenges of data-locality-aware partitioning, load balancing, an easy-to-use interface, and versatility to support various trajectory similarity functions. To address these challenges, we propose DITA, a distributed in-memory trajectory analytics system. We propose an effective partitioning method, a global index, and a local index to address the data locality problem. We devise cost-based techniques to balance the workload. We develop a filter-verification framework to improve performance. Moreover, DITA supports most existing similarity functions for quantifying the similarity between trajectories. We integrate our framework seamlessly into Spark SQL so that it supports both the SQL and DataFrame API interfaces. We have conducted extensive experiments on real-world datasets, and the experimental results show that DITA significantly outperforms existing distributed trajectory similarity search and join approaches.
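The filter-verification idea mentioned in the abstract can be illustrated with a small single-machine sketch. This is not DITA's implementation: the choice of dynamic time warping (DTW) as the similarity function, the endpoint-based lower bound, and every function name below are assumptions made for illustration. The filter step discards a candidate when a cheap lower bound on its DTW distance already exceeds the threshold `tau`; only the survivors pay for the exact quadratic-time DTW computation.

```python
import math


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw(t, q):
    """Exact DTW distance (sum of point distances along the best warping path).

    Both trajectories must be non-empty lists of (x, y) points.
    """
    m, n = len(t), len(q)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix costs nothing
    for i in range(1, m + 1):
        cur = [inf] * (n + 1)
        for j in range(1, n + 1):
            step = dist(t[i - 1], q[j - 1])
            cur[j] = step + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[n]


def endpoint_lower_bound(t, q):
    """Cheap lower bound on DTW (an assumption of this sketch).

    Every warping path aligns the two first points and the two last points,
    so the sum of those two distances can never exceed the DTW distance.
    """
    first = dist(t[0], q[0])
    if len(t) == 1 and len(q) == 1:
        return first  # first and last alignment are the same cell
    return first + dist(t[-1], q[-1])


def similarity_search(trajectories, query, tau):
    """Return ([(id, distance)], number_of_exact_verifications)."""
    results = []
    verified = 0
    for tid, t in trajectories.items():
        # Filter: prune candidates whose lower bound already exceeds tau.
        if endpoint_lower_bound(t, query) > tau:
            continue
        # Verification: compute the exact distance for the survivors.
        verified += 1
        d = dtw(t, query)
        if d <= tau:
            results.append((tid, d))
    return results, verified


if __name__ == "__main__":
    data = {
        "a": [(0, 0), (1, 0), (2, 0)],
        "b": [(0, 1), (1, 1), (2, 1)],
        "c": [(10, 10), (11, 10), (12, 10)],
    }
    hits, checked = similarity_search(data, [(0, 0), (1, 0), (2, 0)], tau=3.5)
    print(hits, checked)
```

Because the bound never overestimates the true distance, the filter cannot drop a correct answer; it only reduces how many exact computations are needed. In a distributed setting the same pruning test could in principle be applied per partition before any candidate is verified, although how DITA does this is not described in the abstract.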
