Massively Parallel ANS Decoding on GPUs

Proceedings of the 48th International Conference on Parallel Processing Pub Date : 2019-08-05 DOI:10.1145/3337821.3337888

André Weißenberger, B. Schmidt

{"title":"Massively Parallel ANS Decoding on GPUs","authors":"André Weißenberger, B. Schmidt","doi":"10.1145/3337821.3337888","DOIUrl":null,"url":null,"abstract":"In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. In order to keep control of rapidly growing amounts of data and to achieve sufficient throughput rates, compression features are a key part of many applications including popular deep learning pipelines. However, as most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based solutions for decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding, combining superior compression results with high compression and decompression speeds. Concepts for parallelizing ANS decompression on GPUs have been published recently. However, they only exhibit limited scalability in practical applications. In this paper, we present the first massively parallel, arbitrarily scalable approach to ANS decoding on GPUs, based on a novel overflow pattern. Our performance evaluation on three different CUDA-enabled GPUs (V100, TITAN V, GTX 1080) demonstrates speedups of up to 17 over 64 CPU threads, up to 31 over a high performance SIMD-based solution, and up to 39 over Zstandard's entropy codec. Our implementation is publicly available at https://github.com/weissenberger/multians.","PeriodicalId":405273,"journal":{"name":"Proceedings of the 48th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 48th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3337821.3337888","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

Abstract

In recent years, graphics processors have enabled significant advances in the fields of big data and streamed deep learning. In order to keep control of rapidly growing amounts of data and to achieve sufficient throughput rates, compression features are a key part of many applications including popular deep learning pipelines. However, as most of the respective APIs rely on CPU-based preprocessing for decoding, data decompression frequently becomes a bottleneck in accelerated compute systems. This establishes the need for efficient GPU-based solutions for decompression. Asymmetric numeral systems (ANS) represent a modern approach to entropy coding, combining superior compression results with high compression and decompression speeds. Concepts for parallelizing ANS decompression on GPUs have been published recently. However, they only exhibit limited scalability in practical applications. In this paper, we present the first massively parallel, arbitrarily scalable approach to ANS decoding on GPUs, based on a novel overflow pattern. Our performance evaluation on three different CUDA-enabled GPUs (V100, TITAN V, GTX 1080) demonstrates speedups of up to 17 over 64 CPU threads, up to 31 over a high performance SIMD-based solution, and up to 39 over Zstandard's entropy codec. Our implementation is publicly available at https://github.com/weissenberger/multians.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

gpu上的大规模并行ANS解码

近年来，图形处理器在大数据和流深度学习领域取得了重大进展。为了控制快速增长的数据量并实现足够的吞吐率，压缩功能是许多应用程序的关键部分，包括流行的深度学习管道。然而，由于大多数api都依赖于基于cpu的预处理来进行解码，因此数据解压缩经常成为加速计算系统中的瓶颈。这就需要高效的基于gpu的解压缩解决方案。非对称数字系统(ANS)代表了熵编码的一种现代方法，它将优越的压缩结果与高压缩和解压缩速度相结合。在gpu上并行化ANS解压缩的概念最近已经发表。然而，它们在实际应用中只表现出有限的可伸缩性。在本文中，我们提出了基于一种新颖的溢出模式的gpu上的第一个大规模并行，任意可扩展的ANS解码方法。我们在三种不同的支持cuda的gpu (V100, TITAN V, GTX 1080)上的性能评估表明，在64个CPU线程上的速度高达17，在高性能simd解决方案上的速度高达31，在Zstandard的熵编解码器上的速度高达39。我们的实现可以在https://github.com/weissenberger/multians上公开获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 48th International Conference on Parallel Processing

自引率

0.00%

发文量

期刊最新文献

Express Link Placement for NoC-Based Many-Core Platforms Cartesian Collective Communication Artemis A Specialized Concurrent Queue for Scheduling Irregular Workloads on GPUs diBELLA: Distributed Long Read to Long Read Alignment