WarpCore: A Library for fast Hash Tables on GPUs

Daniel Jünger, Robin Kobus, André Müller, Christian Hundt, Kai Xu, Weiguo Liu, B. Schmidt
{"title":"WarpCore: A Library for fast Hash Tables on GPUs","authors":"Daniel Jünger, Robin Kobus, André Müller, Christian Hundt, Kai Xu, Weiguo Liu, B. Schmidt","doi":"10.1109/HiPC50609.2020.00015","DOIUrl":null,"url":null,"abstract":"Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore – a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines entirely on the GPU. Our implementation achieves up to 1.6 billion inserts and up to 4.3 billion retrievals per second on a single GV100 GPU thereby outperforming the state-of-the-art solutions cuDPP, SlabHash, and NVIDIA RAPIDS cuDF. This performance advantage becomes even more pronounced for high load factors of over 90%. To overcome the memory limitation of a single GPU, we scale our approach over a dense NVLink topology which gives us close-to-optimal weak scaling on DGX servers. We further show how WarpCore can be used for accelerating a real world bioinformatics application (metagenomic classification) with speedups of over two orders-of-magnitude against state-of-the-art CPU-based solutions. WarpCore is open source software written in C++/CUDA-C and can be downloaded at https://github.com/sleeepyjack/warpcore.","PeriodicalId":375004,"journal":{"name":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","volume":"642 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC50609.2020.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Hash tables are ubiquitous. Properties such as an amortized constant time complexity for insertion and querying as well as a compact memory layout make them versatile associative data structures with manifold applications. The rapidly growing amount of data emerging in many fields motivated the need for accelerated hash tables designed for modern parallel architectures. In this work, we exploit the fast memory interface of modern GPUs together with a parallel hashing scheme tailored to improve global memory access patterns, to design WarpCore – a versatile library of hash table data structures. Unique device-sided operations allow for building high performance data processing pipelines entirely on the GPU. Our implementation achieves up to 1.6 billion inserts and up to 4.3 billion retrievals per second on a single GV100 GPU thereby outperforming the state-of-the-art solutions cuDPP, SlabHash, and NVIDIA RAPIDS cuDF. This performance advantage becomes even more pronounced for high load factors of over 90%. To overcome the memory limitation of a single GPU, we scale our approach over a dense NVLink topology which gives us close-to-optimal weak scaling on DGX servers. We further show how WarpCore can be used for accelerating a real world bioinformatics application (metagenomic classification) with speedups of over two orders-of-magnitude against state-of-the-art CPU-based solutions. WarpCore is open source software written in C++/CUDA-C and can be downloaded at https://github.com/sleeepyjack/warpcore.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
WarpCore: gpu上的快速哈希表库
哈希表无处不在。诸如插入和查询的平摊常数时间复杂度以及紧凑的内存布局等属性使它们具有多种应用程序的通用关联数据结构。在许多领域中出现的快速增长的数据量激发了对为现代并行架构设计的加速哈希表的需求。在这项工作中,我们利用现代gpu的快速内存接口以及定制的并行哈希方案来改进全局内存访问模式,设计WarpCore -一个通用的哈希表数据结构库。独特的设备端操作允许完全在GPU上构建高性能数据处理管道。我们的实现在单个GV100 GPU上实现每秒高达16亿次插入和高达43亿次检索,从而优于最先进的解决方案cuDPP, SlabHash和NVIDIA RAPIDS cuDF。当负载系数超过90%时,这种性能优势变得更加明显。为了克服单个GPU的内存限制,我们在密集的NVLink拓扑上扩展我们的方法,这使我们在DGX服务器上实现了接近最佳的弱扩展。我们进一步展示了如何使用WarpCore来加速现实世界的生物信息学应用(宏基因组分类),与最先进的基于cpu的解决方案相比,其速度超过两个数量级。WarpCore是用c++ /CUDA-C编写的开源软件,可以从https://github.com/sleeepyjack/warpcore下载。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
HiPC 2020 ORGANIZATION HiPC 2020 Industry Sponsors PufferFish: NUMA-Aware Work-stealing Library using Elastic Tasks Algorithms for Preemptive Co-scheduling of Kernels on GPUs 27th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2020) Technical program
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1