体育场哈希:在gpu上可扩展和灵活的哈希

2015 International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2015-10-18 DOI:10.1109/PACT.2015.13

Farzad Khorasani, M. Belviranli, Rajiv Gupta, L. Bhuyan

{"title":"体育场哈希:在gpu上可扩展和灵活的哈希","authors":"Farzad Khorasani, M. Belviranli, Rajiv Gupta, L. Bhuyan","doi":"10.1109/PACT.2015.13","DOIUrl":null,"url":null,"abstract":"Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only impose restrictions (e.g., inability to concurrently execute insertion and retrieval operations, limitation on the size of key-value data pairs) that limit their applicability, their performance does not scale to large hash tables that must be kept out-of-core in the host memory. In this paper we present Stadium Hashing (Stash) that is scalable to large hash tables and practical as it does not impose the aforementioned restrictions. To support large out-of-core hash tables, Stash uses a compact data structure named ticket-board that is separate from hash table buckets and is held inside GPU global memory. Ticket-board locally resolves significant portion of insertion and lookup operations and hence, by reducing accesses to the host memory, it accelerates the execution of these operations. Split design of the ticket-board also enables arbitrarily large keys and values. Unlike existing methods, Stash naturally supports concurrent insertions and retrievals due to its use of double hashing as the collision resolution strategy. Furthermore, we propose Stash with collaborative lanes (clStash) that enhances GPU's SIMD resource utilization for batched insertions during hash table creation. For concurrent insertion and retrieval streams, Stadium hashing can be up to 2 and 3 times faster than GPU Cuckoo hashing for in-core and out-of-core tables respectively.","PeriodicalId":385398,"journal":{"name":"2015 International Conference on Parallel Architecture and Compilation (PACT)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":"{\"title\":\"Stadium Hashing: Scalable and Flexible Hashing on GPUs\",\"authors\":\"Farzad Khorasani, M. Belviranli, Rajiv Gupta, L. Bhuyan\",\"doi\":\"10.1109/PACT.2015.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only impose restrictions (e.g., inability to concurrently execute insertion and retrieval operations, limitation on the size of key-value data pairs) that limit their applicability, their performance does not scale to large hash tables that must be kept out-of-core in the host memory. In this paper we present Stadium Hashing (Stash) that is scalable to large hash tables and practical as it does not impose the aforementioned restrictions. To support large out-of-core hash tables, Stash uses a compact data structure named ticket-board that is separate from hash table buckets and is held inside GPU global memory. Ticket-board locally resolves significant portion of insertion and lookup operations and hence, by reducing accesses to the host memory, it accelerates the execution of these operations. Split design of the ticket-board also enables arbitrarily large keys and values. Unlike existing methods, Stash naturally supports concurrent insertions and retrievals due to its use of double hashing as the collision resolution strategy. Furthermore, we propose Stash with collaborative lanes (clStash) that enhances GPU's SIMD resource utilization for batched insertions during hash table creation. For concurrent insertion and retrieval streams, Stadium hashing can be up to 2 and 3 times faster than GPU Cuckoo hashing for in-core and out-of-core tables respectively.\",\"PeriodicalId\":385398,\"journal\":{\"name\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"32\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACT.2015.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2015.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 32

摘要

散列是最基本的操作之一，它为程序提供了一种快速访问大量数据的方法。尽管gpu作为多线程通用处理器出现，但gpu的高性能并行数据散列解决方案尚未得到足够的重视。现有的gpu散列解决方案不仅施加了限制(例如，无法并发执行插入和检索操作，限制键值数据对的大小)，限制了它们的适用性，而且它们的性能不能扩展到必须在主机内存中保存在核外的大型散列表。在本文中，我们介绍了体育场哈希(Stash)，它可扩展到大型哈希表，并且由于没有施加上述限制而实用。为了支持大型核外哈希表，Stash使用了一个名为ticket-board的紧凑数据结构，它与哈希表桶分开，并保存在GPU全局内存中。Ticket-board在本地解析了相当一部分插入和查找操作，因此，通过减少对主机内存的访问，它加速了这些操作的执行。分割设计的票板也允许任意大的键和值。与现有方法不同，由于使用双散列作为冲突解决策略，Stash自然支持并发插入和检索。此外，我们提出了具有协作通道的Stash (clStash)，它增强了GPU在哈希表创建期间批量插入的SIMD资源利用率。对于并发插入和检索流，Stadium散列可以分别比GPU Cuckoo散列在核内表和核外表上快2倍和3倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Stadium Hashing: Scalable and Flexible Hashing on GPUs

Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only impose restrictions (e.g., inability to concurrently execute insertion and retrieval operations, limitation on the size of key-value data pairs) that limit their applicability, their performance does not scale to large hash tables that must be kept out-of-core in the host memory. In this paper we present Stadium Hashing (Stash) that is scalable to large hash tables and practical as it does not impose the aforementioned restrictions. To support large out-of-core hash tables, Stash uses a compact data structure named ticket-board that is separate from hash table buckets and is held inside GPU global memory. Ticket-board locally resolves significant portion of insertion and lookup operations and hence, by reducing accesses to the host memory, it accelerates the execution of these operations. Split design of the ticket-board also enables arbitrarily large keys and values. Unlike existing methods, Stash naturally supports concurrent insertions and retrievals due to its use of double hashing as the collision resolution strategy. Furthermore, we propose Stash with collaborative lanes (clStash) that enhances GPU's SIMD resource utilization for batched insertions during hash table creation. For concurrent insertion and retrieval streams, Stadium hashing can be up to 2 and 3 times faster than GPU Cuckoo hashing for in-core and out-of-core tables respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 International Conference on Parallel Architecture and Compilation (PACT)

自引率

0.00%

发文量

期刊最新文献

Storage Consolidation on SSDs: Not Always a Panacea, but Can We Ease the Pain? AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures Scalable Task Scheduling and Synchronization Using Hierarchical Effects Scalable SIMD-Efficient Graph Processing on GPUs