gpu上的布谷鸟节点哈希

2022 21st International Symposium on Parallel and Distributed Computing (ISPDC) Pub Date : 2022-07-01 DOI:10.1109/ISPDC55340.2022.00013

Muhammad Javed, Hao Zhou, David Troendle, Byunghyun Jang

{"title":"gpu上的布谷鸟节点哈希","authors":"Muhammad Javed, Hao Zhou, David Troendle, Byunghyun Jang","doi":"10.1109/ISPDC55340.2022.00013","DOIUrl":null,"url":null,"abstract":"The hash table finds numerous applications in many different domains, but its potential for non-coalesced memory accesses and execution divergence characteristics impose optimization challenges on GPUs. We propose a novel hash table design, referred to as Cuckoo Node Hashing, which aims to better exploit the massive data parallelism offered by GPUs. At the core of its design, we leverage Cuckoo Hashing, one of known hash table design schemes, in a closed-address manner, which, to our knowledge, is the first attempt on GPUs. We also propose an architecture-aware warp-cooperative reordering algorithm that improves the memory performance and thread divergence of Cuckoo Node Hashing and efficiently increases the likelihood of coalesced memory accesses in hash table operations. Our experiments show that Cuckoo Node Hashing outperforms and scales better than existing state-of-the-art GPU hash table designs such as DACHash and Slab Hash with a peak performance of 5.03 billion queries/second in static searching and 4.34 billion insertions/second in static building.","PeriodicalId":389334,"journal":{"name":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cuckoo Node Hashing on GPUs\",\"authors\":\"Muhammad Javed, Hao Zhou, David Troendle, Byunghyun Jang\",\"doi\":\"10.1109/ISPDC55340.2022.00013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The hash table finds numerous applications in many different domains, but its potential for non-coalesced memory accesses and execution divergence characteristics impose optimization challenges on GPUs. We propose a novel hash table design, referred to as Cuckoo Node Hashing, which aims to better exploit the massive data parallelism offered by GPUs. At the core of its design, we leverage Cuckoo Hashing, one of known hash table design schemes, in a closed-address manner, which, to our knowledge, is the first attempt on GPUs. We also propose an architecture-aware warp-cooperative reordering algorithm that improves the memory performance and thread divergence of Cuckoo Node Hashing and efficiently increases the likelihood of coalesced memory accesses in hash table operations. Our experiments show that Cuckoo Node Hashing outperforms and scales better than existing state-of-the-art GPU hash table designs such as DACHash and Slab Hash with a peak performance of 5.03 billion queries/second in static searching and 4.34 billion insertions/second in static building.\",\"PeriodicalId\":389334,\"journal\":{\"name\":\"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPDC55340.2022.00013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPDC55340.2022.00013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

哈希表可以在许多不同的领域中找到许多应用程序，但是它对非合并内存访问和执行发散特性的潜在影响给gpu带来了优化挑战。我们提出了一种新的哈希表设计，称为布谷鸟节点哈希，旨在更好地利用gpu提供的海量数据并行性。在其设计的核心，我们以封闭地址的方式利用布谷鸟哈希，这是已知的哈希表设计方案之一，据我们所知，这是gpu上的第一次尝试。我们还提出了一种架构感知的warp-cooperative重排序算法，该算法提高了Cuckoo Node哈希的内存性能和线程散度，并有效地增加了哈希表操作中合并内存访问的可能性。我们的实验表明，杜鹃节点哈希比现有的最先进的GPU哈希表设计(如DACHash和Slab hash)性能和可扩展性更好，静态搜索的峰值性能为50.3亿次查询/秒，静态构建的峰值性能为43.4亿次插入/秒。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Cuckoo Node Hashing on GPUs

The hash table finds numerous applications in many different domains, but its potential for non-coalesced memory accesses and execution divergence characteristics impose optimization challenges on GPUs. We propose a novel hash table design, referred to as Cuckoo Node Hashing, which aims to better exploit the massive data parallelism offered by GPUs. At the core of its design, we leverage Cuckoo Hashing, one of known hash table design schemes, in a closed-address manner, which, to our knowledge, is the first attempt on GPUs. We also propose an architecture-aware warp-cooperative reordering algorithm that improves the memory performance and thread divergence of Cuckoo Node Hashing and efficiently increases the likelihood of coalesced memory accesses in hash table operations. Our experiments show that Cuckoo Node Hashing outperforms and scales better than existing state-of-the-art GPU hash table designs such as DACHash and Slab Hash with a peak performance of 5.03 billion queries/second in static searching and 4.34 billion insertions/second in static building.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 21st International Symposium on Parallel and Distributed Computing (ISPDC)

自引率

0.00%

发文量

期刊最新文献

Estimating the Impact of Communication Schemes for Distributed Graph Processing Sponsors and Conference Support Performance Comparison of Speculative Taskloop and OpenMP-for-Loop Thread-Level Speculation on Hardware Transactional Memory [Full] Deep Heuristic for Broadcasting in Arbitrary Networks Analysis and Mitigation of Soft-Errors on High Performance Embedded GPUs