Efficient temporal join processing using indices

Proceedings 18th International Conference on Data Engineering. Pub Date: 2002-08-07. DOI: 10.1109/ICDE.2002.994701

Donghui Zhang, V. Tsotras, B. Seeger


Citations: 70

Abstract

We examine the problem of processing temporal joins in the presence of indexing schemes. Previous work on temporal joins has concentrated on non-indexed relations that were fully scanned. Given the large data volumes created by the ever-increasing time dimension, sequential scanning is prohibitive. This is especially true when the temporal join involves only parts of the joining relations (e.g., a given time interval instead of the whole timeline). Utilizing an index then becomes beneficial, as it directs the join to the data of interest. We consider temporal join algorithms for three representative indexing schemes, namely a B+-tree, an R*-tree, and a temporal index, the Multiversion B+-tree (MVBT). Both the B+-tree and the R*-tree result in simple but inefficient join algorithms because neither index achieves good temporal data clustering. The MVBT maintains better clustering through record copying. Nevertheless, copies can greatly affect the correctness and effectiveness of the join algorithms. We identify these problems and propose efficient solutions and optimizations. An extensive comparison of all index-based temporal joins, using a variety of datasets and query characteristics, shows that the MVBT-based join algorithms are consistently faster. In particular, the link-based algorithm has the most robust behavior. In our experiments it showed a tenfold improvement over the R*-tree joins, while it was between six and thirty times faster than the B+-tree joins.
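To make the problem statement concrete, the following is a minimal sketch of what a temporal join restricted to a query time interval computes. It is a naive, non-indexed nested-loop baseline and not any of the algorithms evaluated in the paper. The names `Record` and `naive_temporal_join`, the half-open interval convention, and the equality condition on keys are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical temporal record: a key valid during [start, end)."""
    key: int
    start: int  # inclusive
    end: int    # exclusive


def naive_temporal_join(
    r: List[Record], s: List[Record], q_start: int, q_end: int
) -> List[Tuple[Record, Record]]:
    """Return pairs (x, y) with equal keys whose validity intervals
    overlap each other inside the query window [q_start, q_end).

    Every record of both relations is scanned, which is the cost an
    index is meant to avoid.
    """
    result = []
    for x in r:
        for y in s:
            if x.key != y.key:
                continue
            # Intersection of both validity intervals and the query window.
            lo = max(x.start, y.start, q_start)
            hi = min(x.end, y.end, q_end)
            if lo < hi:  # non-empty intersection
                result.append((x, y))
    return result


if __name__ == "__main__":
    r = [Record(1, 0, 10), Record(2, 5, 15)]
    s = [Record(1, 8, 20), Record(2, 0, 4), Record(1, 12, 18)]
    print(naive_temporal_join(r, s, 0, 9))
```

Even when the query window covers a small part of the timeline, this baseline reads both relations in full; an index-based join would instead use the window to visit only the relevant pages.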


