PIEJoin: Towards Parallel Set Containment Joins

Anja Kunkel, Astrid Rheinländer, C. Schiefer, S. Helmer, Panagiotis Bouros, U. Leser
Proceedings of the 28th International Conference on Scientific and Statistical Database Management, 2016-07-18. DOI: 10.1145/2949689.2949694
Citations: 11

Abstract

The efficient computation of set containment joins (SCJ) over set-valued attributes is a well-studied problem with many applications in commercial and scientific fields. Nevertheless, a number of open questions remain: an extensive comparative evaluation is still missing, the two most recent algorithms have not yet been compared to each other, and the exact impact of item sort order and of data properties on algorithm performance is still largely unknown. Furthermore, all previous work has considered only sequential join algorithms, although modern servers offer ample opportunities for parallelization. We present PIEJoin, a novel algorithm for computing SCJ based on intersecting prefix trees that are built at runtime over the attributes to be joined. We also present a highly optimized implementation of PIEJoin that uses tree signatures to save space and interval labeling to improve the runtime of the basic method. Most importantly, PIEJoin can be parallelized easily by partitioning the tree intersection. A comprehensive evaluation on eight data sets shows that PIEJoin, already in its sequential form, clearly outperforms two of the three most important competitors (PRETTI and PRETTI+). It is mostly, though not always, slower than the third, LIMIT+(opj), but requires significantly less space. The parallel version of PIEJoin presented here achieves significant further speed-ups, yet our evaluation also shows that further research is needed, as finding the best way of partitioning the join turns out to be non-trivial.
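To make the general idea of joining two prefix trees concrete, here is a minimal sketch of a set containment join (all pairs (r, s) with r ⊆ s) in Python. It is a simplified illustration under our own assumptions, not the authors' PIEJoin implementation: items are assumed to be mutually comparable and their natural sort order serves as the global item order; every set is inserted into a trie; and the join descends both tries in lockstep, keeping the invariant that the path to the current R-node is a subset of the path to the current S-node. Tree signatures and interval labeling, mentioned above, are omitted. The names `build_trie`, `join_nodes`, `find_below` and `scj`, and the choice to partition work by the first-level children of R's trie, are hypothetical; the abstract itself notes that finding the best partitioning is non-trivial.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Prefix-tree node; `ids` holds ids of sets whose sorted item list ends here."""
    __slots__ = ("item", "children", "ids")

    def __init__(self, item=None):
        self.item = item
        self.children = {}
        self.ids = []


def build_trie(sets):
    """Insert every set along its items sorted by the (assumed) global order."""
    root = Node()
    for set_id, items in sets.items():
        node = root
        for item in sorted(items):
            node = node.children.setdefault(item, Node(item))
        node.ids.append(set_id)
    return root


def subtree_ids(node):
    """All set ids stored at `node` or anywhere below it."""
    out = list(node.ids)
    for child in node.children.values():
        out.extend(subtree_ids(child))
    return out


def find_below(node, item):
    """Descendants of `node` labelled `item`; branches whose label already
    exceeds `item` are pruned, because items along a path are sorted."""
    found = []
    for child in node.children.values():
        if child.item == item:
            found.append(child)
        elif child.item < item:
            found.extend(find_below(child, item))
    return found


def join_nodes(nr, ns, out):
    """Invariant: path(nr) is a subset of path(ns), so every set stored at nr
    is contained in every set stored in the subtree of ns."""
    if nr.ids:
        s_ids = subtree_ids(ns)
        out.extend((r, s) for r in nr.ids for s in s_ids)
    for cr in nr.children.values():
        for cs in find_below(ns, cr.item):
            join_nodes(cr, cs, out)


def scj(r_sets, s_sets, workers=1):
    """Set containment join; each first-level child of R's trie is an
    independent task (one hypothetical way to partition the traversal)."""
    tr, ts = build_trie(r_sets), build_trie(s_sets)
    # Empty sets in R end at the root and are contained in every set of S.
    out = [(r, s) for r in tr.ids for s in subtree_ids(ts)]

    def task(cr):
        part = []
        for cs in find_below(ts, cr.item):
            join_nodes(cr, cs, part)
        return part

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(task, tr.children.values()):
            out.extend(part)
    return sorted(out)
```

For example, with `R = {"r1": {"a", "b"}, "r2": {"c"}, "r3": {"a", "d"}}` and `S = {"s1": {"a", "b", "c"}, "s2": {"a", "c", "d"}, "s3": {"b", "c"}}`, `scj(R, S)` returns `[("r1", "s1"), ("r2", "s1"), ("r2", "s2"), ("r2", "s3"), ("r3", "s2")]`. The thread pool only illustrates how the partitioned tasks are independent; because of CPython's global interpreter lock it will not speed up this pure-Python code, so a real parallel implementation would use native threads or processes.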