Strategies for using additional resources in parallel hash-based join algorithms

Xi Zhang, T. Kurç, T. Pan, Ümit V. Çatalyürek, S. Narayanan, P. Wyckoff, J. Saltz
{"title":"Strategies for using additional resources in parallel hash-based join algorithms","authors":"Xi Zhang, T. Kurç, T. Pan, Ümit V. Çatalyürek, S. Narayanan, P. Wyckoff, J. Saltz","doi":"10.1109/HPDC.2004.34","DOIUrl":null,"url":null,"abstract":"Hash-based join is a compute- and memory-intensive algorithm. It achieves good performance and scales well to large datasets, if sufficient memory is available to hold the hash table and the distribution of computing had across nodes is balanced. We compare three adaptive algorithms that start with a partitioning of the hash table across a group of nodes and expand during the hash table building phase to additional resources, when memory on a node is used up. The split-based algorithm partitions the hash table range assigned to the node, on which memory is full, into two segments and assigns one of the segments to a new node in the system. The replication-based algorithm replicates the hash table range on a new node. The hybrid algorithm combines the first and second strategies in order to address each strategy's short comings. We perform an experimental performance evaluation of these algorithms on a PC cluster. Our results show that among the three algorithms, in most cases the hybrid algorithm either performs close to the better of the two or is the best algorithm.","PeriodicalId":446429,"journal":{"name":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPDC.2004.34","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Hash-based join is a compute- and memory-intensive algorithm. It achieves good performance and scales well to large datasets, if sufficient memory is available to hold the hash table and the distribution of computing had across nodes is balanced. We compare three adaptive algorithms that start with a partitioning of the hash table across a group of nodes and expand during the hash table building phase to additional resources, when memory on a node is used up. The split-based algorithm partitions the hash table range assigned to the node, on which memory is full, into two segments and assigns one of the segments to a new node in the system. The replication-based algorithm replicates the hash table range on a new node. The hybrid algorithm combines the first and second strategies in order to address each strategy's short comings. We perform an experimental performance evaluation of these algorithms on a PC cluster. Our results show that among the three algorithms, in most cases the hybrid algorithm either performs close to the better of the two or is the best algorithm.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
在基于哈希的并行连接算法中使用额外资源的策略
基于哈希的连接是一种计算和内存密集型算法。如果有足够的内存来保存哈希表,并且跨节点的计算分布是平衡的,那么它可以实现良好的性能并很好地扩展到大型数据集。我们比较了三种自适应算法,它们从跨一组节点对哈希表进行分区开始,并在哈希表构建阶段扩展到节点上的内存用尽时使用额外的资源。基于分割的算法将分配给内存满的节点的哈希表范围划分为两个段,并将其中一个段分配给系统中的新节点。基于复制的算法在新节点上复制哈希表范围。混合算法将第一种策略和第二种策略结合起来,以解决每种策略的缺点。我们在PC集群上对这些算法进行了实验性能评估。结果表明,在三种算法中,大多数情况下混合算法的性能接近于两者中的较好算法,或者是最好的算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Measuring and understanding user comfort with resource borrowing Globus and PlanetLab resource management solutions compared FPN: a distributed hash table for commercial applications GAIS: grid advanced information service based on P2P mechanism Utilization of a local grid of Mac OS X-based computers using Xgrid
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1