{"title":"Scan Stack: A Search-based Concurrent Stack for GPU","authors":"Noah South, B. Jang","doi":"10.1145/3564746.3587018","DOIUrl":null,"url":null,"abstract":"Concurrent data structures play a critical role in the overall performance of GPGPU applications. Stack is one of the basic data structures and finds numerous applications where data is processed in a Last In First Out (LIFO) fashion. Although concurrent stack is well researched for multi-core CPUs, there is little research pointing to the conversion of CPU stacks into a GPU-friendly form. In this paper, we propose a concurrent search-based GPU stack named Scan Stack. The proposed stack is designed to take advantage of GPU memory access patterns, memory coalescence, and thread structures (i.e., warps) to increase throughput. Our experiments on an NVIDIA RTX 3090 show that our proposed scan stack significantly improves the throughput and scalability for all benchmarks when reducing the search area. However, the greatest improvements are shown when elimination is possible, and this improvement reaches nearly 39 times what a non-optimized structure is capable of.","PeriodicalId":322431,"journal":{"name":"Proceedings of the 2023 ACM Southeast Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 ACM Southeast Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3564746.3587018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Concurrent data structures play a critical role in the overall performance of GPGPU applications. Stack is one of the basic data structures and finds numerous applications where data is processed in a Last In First Out (LIFO) fashion. Although concurrent stack is well researched for multi-core CPUs, there is little research pointing to the conversion of CPU stacks into a GPU-friendly form. In this paper, we propose a concurrent search-based GPU stack named Scan Stack. The proposed stack is designed to take advantage of GPU memory access patterns, memory coalescence, and thread structures (i.e., warps) to increase throughput. Our experiments on an NVIDIA RTX 3090 show that our proposed scan stack significantly improves the throughput and scalability for all benchmarks when reducing the search area. However, the greatest improvements are shown when elimination is possible, and this improvement reaches nearly 39 times what a non-optimized structure is capable of.