Adaptive Parallel Hash Join in Main-Memory Databases

A. M. Keller, S. Roy
DOI: 10.1109/PDIS.1991.183068
In: [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems, December 1991.
Citations: 15

Abstract

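This paper presents an algorithm for parallel hash-join computation in main-memory databases that adapts to data skew, and describes its implementation on the IBM RP3 multiprocessor. The algorithm exploits the random-access capability of main-memory databases to detect and counteract skew on the fly. Skew is detected at run time by monitoring the observed frequencies of join-attribute values and applying to them a threshold function that accounts for the distribution of workload among processors. If and when this threshold is reached for certain join-attribute values, the computation for those values is fragmented among an appropriate number of processors. Fragmentation requires replicating some input tuples, which modestly increases the total workload but significantly reduces completion time by relieving the overloaded processor. A simplified analysis is supplemented by experiments. The description and analysis of the algorithm are based on the shared-nothing model, while the implementation uses hierarchical shared memory with non-uniform memory access.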
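The skew-handling idea above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' algorithm. The following are all hypothetical: the threshold function (a value becomes "hot" once its observed probe count exceeds `alpha` times the per-processor fair share), the fragment-and-replicate scheme (the hot value's build tuples are copied to helper processors and its remaining probe tuples are dealt round-robin), and the names `home`, `adaptive_partition`, `alpha` and `fanout`. Load is counted as tuples handled, ignoring join output size and memory-access costs.

```python
from collections import Counter, defaultdict


def home(key, num_procs):
    """Hypothetical hash partitioning: the processor that owns a join value."""
    return hash(key) % num_procs


def adaptive_partition(build, probe, num_procs, alpha=0.5, fanout=None):
    """Simulate skew-adaptive placement of hash-join work (illustrative only).

    build, probe: join-attribute values, one entry per tuple.
    alpha: hypothetical threshold; a value becomes hot once its observed
        probe count exceeds alpha * len(probe) / num_procs.
    fanout: number of processors a hot value is spread over (default: all).
    Returns (loads, hot): loads[p] counts build tuples inserted plus probe
    tuples handled by processor p; hot maps each fragmented value to the
    processors sharing it.
    """
    fanout = min(fanout or num_procs, num_procs)
    build_count = Counter(build)
    loads = [0] * num_procs
    # Build phase: every build tuple is inserted at its home processor.
    for key, count in build_count.items():
        loads[home(key, num_procs)] += count

    threshold = alpha * len(probe) / num_procs
    seen = Counter()               # observed probe frequency per join value
    hot = {}                       # fragmented value -> processors sharing it
    next_slot = defaultdict(int)   # round-robin cursor per hot value

    # Probe phase: tuples arrive one at a time, so skew is detected on the fly.
    for key in probe:
        seen[key] += 1
        if key not in hot and seen[key] > threshold:
            start = home(key, num_procs)
            procs = [(start + i) % num_procs for i in range(fanout)]
            # Replicate this value's build tuples to the helper processors;
            # this is the extra total work that fragmentation costs.
            for p in procs[1:]:
                loads[p] += build_count[key]
            hot[key] = procs
        if key in hot:
            procs = hot[key]
            p = procs[next_slot[key] % len(procs)]
            next_slot[key] += 1
        else:
            p = home(key, num_procs)
        loads[p] += 1
    return loads, hot


if __name__ == "__main__":
    build = [0, 1, 2, 3]
    probe = [0] * 100 + [1] * 10 + [2] * 10 + [3] * 10
    # Skew-adaptive: ([38, 33, 33, 33], {0: [0, 1, 2, 3]})
    print(adaptive_partition(build, probe, 4))
    # Plain hash partitioning (never fragment): ([101, 11, 11, 11], {})
    print(adaptive_partition(build, probe, 4, alpha=float("inf")))
```

With these inputs the most loaded processor drops from 101 to 38 tuples, while total work rises from 134 to 137 because the single build tuple for value 0 is replicated to three helpers. This mirrors the trade-off described in the abstract. The 16 probe tuples seen before detection still land on processor 0. A lower `alpha` detects skew sooner, at the risk of fragmenting values that are only mildly frequent.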