WarpCore: A Library for fast Hash Tables on GPUs

2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC). Pub Date: 2020-09-16. DOI: 10.1109/HiPC50609.2020.00015

Daniel Jünger, Robin Kobus, André Müller, Christian Hundt, Kai Xu, Weiguo Liu, B. Schmidt


Citation count: 10

Abstract

Hash tables are ubiquitous. Properties such as amortized constant-time complexity for insertion and querying, as well as a compact memory layout, make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields has motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns to design WarpCore – a versatile library of hash table data structures. Unique device-side operations allow for building high-performance data processing pipelines entirely on the GPU. Our implementation achieves up to 1.6 billion inserts and up to 4.3 billion retrievals per second on a single GV100 GPU, thereby outperforming the state-of-the-art solutions cuDPP, SlabHash, and NVIDIA RAPIDS cuDF. This performance advantage becomes even more pronounced for high load factors of over 90%. To overcome the memory limitation of a single GPU, we scale our approach over a dense NVLink topology, which gives us close-to-optimal weak scaling on DGX servers. We further show how WarpCore can be used to accelerate a real-world bioinformatics application (metagenomic classification) with speedups of over two orders of magnitude over state-of-the-art CPU-based solutions. WarpCore is open source software written in C++/CUDA-C and can be downloaded at https://github.com/sleeepyjack/warpcore.
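To make the terms "parallel insertion" and "load factor" concrete, the following is a minimal CPU-side sketch in C++ of a lock-free open-addressing hash table. It is not WarpCore's API or its probing scheme, which the abstract does not specify. The class name `ToyHashTable`, the linear probing strategy, the multiplicative hash constant, and the packing of a 32-bit key and 32-bit value into one 64-bit slot are all assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical illustration only; NOT WarpCore's API or probing scheme.
// Each slot packs a 32-bit key (upper half) and a 32-bit value (lower half)
// into one 64-bit word, so a single compare-and-swap publishes the pair.
class ToyHashTable {
public:
    static constexpr std::uint64_t kEmpty = ~std::uint64_t{0};   // free-slot sentinel
    static constexpr std::uint32_t kReservedKey = 0xFFFFFFFFu;   // key used by the sentinel

    explicit ToyHashTable(std::size_t capacity) : slots_(capacity), size_(0) {
        assert(capacity > 0);
        for (auto& s : slots_) s.store(kEmpty, std::memory_order_relaxed);
    }

    // Returns true if the pair was newly inserted; false if the key already
    // exists (the first value is kept) or if the table is full.
    bool insert(std::uint32_t key, std::uint32_t value) {
        assert(key != kReservedKey);
        const std::uint64_t packed = (std::uint64_t{key} << 32) | value;
        const std::size_t start = hash(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            auto& slot = slots_[(start + i) % slots_.size()];  // linear probing
            std::uint64_t current = slot.load(std::memory_order_acquire);
            if (current == kEmpty) {
                if (slot.compare_exchange_strong(current, packed,
                                                 std::memory_order_acq_rel,
                                                 std::memory_order_acquire)) {
                    size_.fetch_add(1, std::memory_order_relaxed);
                    return true;
                }
                // CAS failed: `current` now holds the entry another thread wrote.
            }
            if (key_of(current) == key) return false;  // duplicate key
        }
        return false;  // every slot probed: table is full
    }

    std::optional<std::uint32_t> find(std::uint32_t key) const {
        const std::size_t start = hash(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::uint64_t current =
                slots_[(start + i) % slots_.size()].load(std::memory_order_acquire);
            if (current == kEmpty) return std::nullopt;  // probe chain ends here
            if (key_of(current) == key) return static_cast<std::uint32_t>(current);
        }
        return std::nullopt;
    }

    // Fraction of occupied slots; the abstract discusses load factors above 0.9.
    double load_factor() const {
        return static_cast<double>(size_.load(std::memory_order_relaxed)) /
               static_cast<double>(slots_.size());
    }

private:
    static std::uint32_t key_of(std::uint64_t slot) {
        return static_cast<std::uint32_t>(slot >> 32);
    }
    std::size_t hash(std::uint32_t key) const {
        return static_cast<std::size_t>(key * 2654435761u) % slots_.size();
    }

    std::vector<std::atomic<std::uint64_t>> slots_;
    std::atomic<std::size_t> size_;
};
```

In this sketch, several threads may call `insert` concurrently; a thread that loses the compare-and-swap race simply continues probing. As the load factor approaches 1, probe sequences grow longer, which is one reason performance at high load factors is a meaningful point of comparison between hash table designs.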


