Adaptive parallel hash join in main-memory databases

[1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems. Pub Date: 1991-12-01. DOI: 10.1109/PDIS.1991.183068

A. M. Keller, S. Roy

Citations: 15

Abstract

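This paper presents an algorithm for parallel hash-join computation on main-memory databases that adapts to data skew, and describes its implementation on the IBM RP3 multiprocessor. The algorithm exploits the random-access capabilities of main-memory databases to detect and counteract skew on the fly. Skew is detected at run time by monitoring the observed frequencies of join-attribute values and applying to them a threshold function that takes into account the distribution of workload among processors. When the threshold is reached for a value of the join attribute, the computation for that value is fragmented among an appropriate number of processors. Fragmentation requires some replication of input tuples, which modestly increases the total workload, but it significantly reduces completion time by reducing the workload at the overloaded processor. A simplified analysis is supplemented by experiments. The description and analysis of the algorithm are based on the shared-nothing model; the implementation uses hierarchical shared memory providing non-uniform memory access.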
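The fragmentation idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract does not give the threshold function, so the `slack * fair_share` rule, the choice of fragment count, and every function and parameter name below are hypothetical. The sketch also counts key frequencies in one pass up front, whereas the paper detects skew on the fly.

```python
from collections import Counter, defaultdict
from math import ceil


def partition_with_skew_handling(r, s, num_procs, slack=1.5):
    """Assign (key, payload) tuples of r and s to processors.

    Keys of r whose frequency exceeds a threshold are fragmented across
    several processors; matching s tuples are replicated to each fragment.
    """
    freq = Counter(key for key, _ in r)
    # Assumed threshold (not from the paper): a key is "hot" when it alone
    # exceeds `slack` times the fair per-processor share of r.
    fair_share = len(r) / num_procs
    threshold = slack * fair_share

    fragments = {}  # hot key -> processor ids that share its work
    for key, count in freq.items():
        if count > threshold:
            k = min(num_procs, ceil(count / fair_share))
            home = hash(key) % num_procs
            fragments[key] = [(home + i) % num_procs for i in range(k)]

    r_parts = [[] for _ in range(num_procs)]
    s_parts = [[] for _ in range(num_procs)]
    next_slot = defaultdict(int)
    for key, payload in r:
        if key in fragments:
            # Spread the hot key's r tuples round-robin over its fragments.
            procs = fragments[key]
            target = procs[next_slot[key] % len(procs)]
            next_slot[key] += 1
        else:
            target = hash(key) % num_procs
        r_parts[target].append((key, payload))

    for key, payload in s:
        # Replication: every fragment of a hot key needs all matching s tuples.
        for target in fragments.get(key, [hash(key) % num_procs]):
            s_parts[target].append((key, payload))
    return r_parts, s_parts


def local_hash_join(r_part, s_part):
    """Plain build-and-probe hash join on one processor's partitions."""
    table = defaultdict(list)
    for key, payload in s_part:
        table[key].append(payload)
    return [(key, rp, sp) for key, rp in r_part for sp in table.get(key, [])]


def parallel_join(r, s, num_procs, slack=1.5):
    """Return the join result and the number of r tuples per processor."""
    r_parts, s_parts = partition_with_skew_handling(r, s, num_procs, slack)
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        result.extend(local_hash_join(r_part, s_part))
    return result, [len(part) for part in r_parts]
```

Calling `parallel_join(r, s, 4)` on an input where one key dominates `r` should give the same result as an unpartitioned join while lowering the largest per-processor load; passing `slack=float("inf")` disables fragmentation and gives a baseline for comparison. Because each hot `r` tuple lands on exactly one fragment, replicating the matching `s` tuples adds work but produces no duplicate output rows.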
