Farzad Khorasani, M. Belviranli, Rajiv Gupta, L. Bhuyan
{"title":"体育场哈希:在gpu上可扩展和灵活的哈希","authors":"Farzad Khorasani, M. Belviranli, Rajiv Gupta, L. Bhuyan","doi":"10.1109/PACT.2015.13","DOIUrl":null,"url":null,"abstract":"Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only impose restrictions (e.g., inability to concurrently execute insertion and retrieval operations, limitation on the size of key-value data pairs) that limit their applicability, their performance does not scale to large hash tables that must be kept out-of-core in the host memory. In this paper we present Stadium Hashing (Stash) that is scalable to large hash tables and practical as it does not impose the aforementioned restrictions. To support large out-of-core hash tables, Stash uses a compact data structure named ticket-board that is separate from hash table buckets and is held inside GPU global memory. Ticket-board locally resolves significant portion of insertion and lookup operations and hence, by reducing accesses to the host memory, it accelerates the execution of these operations. Split design of the ticket-board also enables arbitrarily large keys and values. Unlike existing methods, Stash naturally supports concurrent insertions and retrievals due to its use of double hashing as the collision resolution strategy. Furthermore, we propose Stash with collaborative lanes (clStash) that enhances GPU's SIMD resource utilization for batched insertions during hash table creation. For concurrent insertion and retrieval streams, Stadium hashing can be up to 2 and 3 times faster than GPU Cuckoo hashing for in-core and out-of-core tables respectively.","PeriodicalId":385398,"journal":{"name":"2015 International Conference on Parallel Architecture and Compilation (PACT)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":"{\"title\":\"Stadium Hashing: Scalable and Flexible Hashing on GPUs\",\"authors\":\"Farzad Khorasani, M. Belviranli, Rajiv Gupta, L. Bhuyan\",\"doi\":\"10.1109/PACT.2015.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only impose restrictions (e.g., inability to concurrently execute insertion and retrieval operations, limitation on the size of key-value data pairs) that limit their applicability, their performance does not scale to large hash tables that must be kept out-of-core in the host memory. In this paper we present Stadium Hashing (Stash) that is scalable to large hash tables and practical as it does not impose the aforementioned restrictions. To support large out-of-core hash tables, Stash uses a compact data structure named ticket-board that is separate from hash table buckets and is held inside GPU global memory. Ticket-board locally resolves significant portion of insertion and lookup operations and hence, by reducing accesses to the host memory, it accelerates the execution of these operations. Split design of the ticket-board also enables arbitrarily large keys and values. Unlike existing methods, Stash naturally supports concurrent insertions and retrievals due to its use of double hashing as the collision resolution strategy. Furthermore, we propose Stash with collaborative lanes (clStash) that enhances GPU's SIMD resource utilization for batched insertions during hash table creation. For concurrent insertion and retrieval streams, Stadium hashing can be up to 2 and 3 times faster than GPU Cuckoo hashing for in-core and out-of-core tables respectively.\",\"PeriodicalId\":385398,\"journal\":{\"name\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"32\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACT.2015.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2015.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Stadium Hashing: Scalable and Flexible Hashing on GPUs
Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only impose restrictions (e.g., inability to concurrently execute insertion and retrieval operations, limitation on the size of key-value data pairs) that limit their applicability, their performance does not scale to large hash tables that must be kept out-of-core in the host memory. In this paper we present Stadium Hashing (Stash) that is scalable to large hash tables and practical as it does not impose the aforementioned restrictions. To support large out-of-core hash tables, Stash uses a compact data structure named ticket-board that is separate from hash table buckets and is held inside GPU global memory. Ticket-board locally resolves significant portion of insertion and lookup operations and hence, by reducing accesses to the host memory, it accelerates the execution of these operations. Split design of the ticket-board also enables arbitrarily large keys and values. Unlike existing methods, Stash naturally supports concurrent insertions and retrievals due to its use of double hashing as the collision resolution strategy. Furthermore, we propose Stash with collaborative lanes (clStash) that enhances GPU's SIMD resource utilization for batched insertions during hash table creation. For concurrent insertion and retrieval streams, Stadium hashing can be up to 2 and 3 times faster than GPU Cuckoo hashing for in-core and out-of-core tables respectively.