Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)

Gihwan Oh, Jae-Myung Kim, Woon-Hak Kang, Sang-Won Lee
{"title":"Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)","authors":"Gihwan Oh, Jae-Myung Kim, Woon-Hak Kang, Sang-Won Lee","doi":"10.1145/2213836.2213971","DOIUrl":null,"url":null,"abstract":"Recently, several studies on multi-core cache-aware hash join have been carried out [Kim09VLDB, Blanas11SIGMOD]. In particular, the work of Blanas has shown that rather simple no-partitioning hash join can outperform the work of Kim. Meanwhile, the simple but best performing hash join of Blanas still experiences severe cache misses in probing phase. Because the key values of tuples in outer relation are not sorted or clustered, each outer record has different hashed key value and thus accesses the different hash bucket. Since the size of hash table of inner table is usually much larger than that of the CPU cache, it is highly probable that the reference to hash bucket of inner table by each outer record would encounter cache miss. To reduce the cache misses in hash join probing phase, we propose a new join algorithm, Sorted Probing (in short, SP), which pre-sorts the hashed key values of outer table of hash join so that the access to the hash bucket of inner table has strong temporal locality, thus minimizing the cache misses during the probing phase. As an optimization technique of sorting, we used the cache-aware AlphaSort technique, which extracts the key from each record of data set to be sorted and its pointer, and then sorts the pairs of (key, rec_ptr). For performance evaluation, we used two hash join algorithms from Blanas' work, no partitioning(NP) and independent partitioning(IP) in a standard C++ program, provided by Blanas. Also, we implemented the AlphaSort and added it before each probing phase of NP and IP, and we call each algorithm as NP+SP and IP+SP. For syntactic workload, IP+SP outperforms all other algorithms: IP+SP is faster than other altorithms up to 30%.","PeriodicalId":212616,"journal":{"name":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","volume":"91 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2213836.2213971","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recently, several studies on multi-core, cache-aware hash joins have been carried out [Kim09VLDB, Blanas11SIGMOD]. In particular, the work of Blanas has shown that a rather simple no-partitioning hash join can outperform the algorithm of Kim. Even so, Blanas's simple but best-performing hash join still suffers severe cache misses in the probing phase. Because the key values of tuples in the outer relation are neither sorted nor clustered, each outer record has a different hashed key value and therefore accesses a different hash bucket. Since the hash table built on the inner table is usually much larger than the CPU cache, each outer record's reference to an inner hash bucket is highly likely to incur a cache miss. To reduce cache misses in the hash join probing phase, we propose a new join algorithm, Sorted Probing (SP for short), which pre-sorts the hashed key values of the outer table so that accesses to the hash buckets of the inner table have strong temporal locality, minimizing cache misses during the probing phase. For sorting, we use the cache-aware AlphaSort technique, which extracts from each record of the data set to be sorted its key and a pointer to the record, and then sorts the resulting (key, rec_ptr) pairs. For performance evaluation, we used two hash join algorithms from Blanas's work, no partitioning (NP) and independent partitioning (IP), in a standard C++ program provided by Blanas. We also implemented AlphaSort and added it before the probing phase of NP and IP; we call the resulting algorithms NP+SP and IP+SP. On a synthetic workload, IP+SP outperforms all other algorithms, running up to 30% faster.
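To make the sorted-probing idea concrete, the following C++ sketch illustrates it under stated assumptions; it is not the authors' implementation. The Tuple layout, the modulo hash function, the chained HashTable, and the sorted_probe routine are all hypothetical choices made only to show how sorting lightweight (hash value, record pointer) pairs before probing gives the probe loop temporal locality on the inner table's buckets.

```cpp
// Minimal sketch of Sorted Probing (SP): pre-sort (hash, rec_ptr) pairs of the
// outer table so that consecutive probes touch the same inner hash bucket.
// All data structures here are illustrative assumptions, not the paper's code.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Tuple { uint64_t key; uint64_t payload; };

struct Bucket { std::vector<Tuple> tuples; };   // simple chained bucket

struct HashTable {
    std::vector<Bucket> buckets;
    explicit HashTable(std::size_t n) : buckets(n) {}
    std::size_t hash(uint64_t key) const { return key % buckets.size(); }
    void insert(const Tuple& t) { buckets[hash(t.key)].tuples.push_back(t); }
};

// AlphaSort-style pair: sort small (key, rec_ptr) entries, not whole tuples.
struct ProbePair { std::size_t bucket_idx; const Tuple* rec; };

std::size_t sorted_probe(const HashTable& inner, const std::vector<Tuple>& outer) {
    // 1. Extract the hashed key (bucket index) and a pointer for each outer record.
    std::vector<ProbePair> pairs;
    pairs.reserve(outer.size());
    for (const Tuple& t : outer)
        pairs.push_back({inner.hash(t.key), &t});

    // 2. Pre-sort by bucket index so probes to the same bucket are adjacent,
    //    keeping the bucket cache-resident across consecutive probes.
    std::sort(pairs.begin(), pairs.end(),
              [](const ProbePair& a, const ProbePair& b) {
                  return a.bucket_idx < b.bucket_idx;
              });

    // 3. Probe in sorted order.
    std::size_t matches = 0;
    for (const ProbePair& p : pairs)
        for (const Tuple& cand : inner.buckets[p.bucket_idx].tuples)
            if (cand.key == p.rec->key)
                ++matches;   // a real join would emit the joined tuple here
    return matches;
}
```

In this sketch the only extra work over a plain probe loop is building and sorting the small pairs, which is the trade-off the abstract describes: pay a sorting cost up front to turn random bucket accesses into clustered, cache-friendly ones.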