FPGA-Accelerated compression of integer vectors

Proceedings of the 16th International Workshop on Data Management on New Hardware. Pub Date: 2020-06-14. DOI: 10.1145/3399666.3399932

Mahmoud Mohsen, Norman May, Christian Färber, David Broneske


Citations: 2

Abstract

Efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in limited and precious main memory. Past research focused on lightweight compression techniques that accept lower compression ratios in exchange for low-latency data access. Consequently, only a few columns in a wide table benefit from lightweight and effective compression schemes such as run-length encoding, prefix compression, or sparse encoding. Apart from bit-packing, the other columns remain uncompressed, which misses opportunities for a better compression ratio on many columns. Furthermore, compression has mainly been executed on the CPU because it involves heavy data transfer. With co-processors in particular, the data transfer overhead wipes out the performance gains of offloading. In this paper, we investigate whether we can achieve good compression ratios even for previously uncompressed columns by using binary packing and prefix suppression offloaded to an FPGA. As a streaming processor, an FPGA is the perfect candidate for offloading the compression task. Our OpenCL-based implementation saturates the available PCIe bus during compression on the FPGA while using less than a third of the FPGA's resources. Furthermore, our real-world experiments against CPU-based SAP HANA show an improvement of around a factor of 2 in compression throughput while compressing the data down to 60% of the size achieved by the best SAP HANA compression technique.
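The abstract names binary packing without describing it. As a rough illustration only, the sketch below shows block-wise binary packing as it is commonly described in the integer-compression literature: the vector is split into fixed-size blocks, and each block stores its values with just enough bits for its largest value, so leading zero bits are dropped per block. This is not the authors' FPGA design. The block size of 4, the function names, and the `(count, width, packed)` block layout are assumptions made for this example, and the values are assumed to be non-negative integers.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical block length chosen for readability

# One compressed block: (number of values, bits per value, packed bits as an int)
Block = Tuple[int, int, int]


def pack_block(values: List[int]) -> Tuple[int, int]:
    """Pack one block of non-negative ints using the minimal common bit width."""
    width = max(max(values).bit_length(), 1)  # at least 1 bit, even for all zeros
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)  # value i occupies bits [i*width, (i+1)*width)
    return width, packed


def unpack_block(count: int, width: int, packed: int) -> List[int]:
    """Recover the values of one block from its packed representation."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


def compress(vector: List[int]) -> List[Block]:
    """Split the vector into blocks and pack each block independently."""
    blocks = []
    for start in range(0, len(vector), BLOCK_SIZE):
        chunk = vector[start:start + BLOCK_SIZE]
        width, packed = pack_block(chunk)
        blocks.append((len(chunk), width, packed))
    return blocks


def decompress(blocks: List[Block]) -> List[int]:
    """Concatenate the unpacked values of all blocks."""
    out: List[int] = []
    for count, width, packed in blocks:
        out.extend(unpack_block(count, width, packed))
    return out


def payload_bits(blocks: List[Block]) -> int:
    """Payload size in bits, ignoring the per-block width and count headers."""
    return sum(count * width for count, width, _ in blocks)


if __name__ == "__main__":
    vector = [3, 1, 0, 2, 100, 7, 64, 1, 5]
    blocks = compress(vector)
    assert decompress(blocks) == vector
    print([width for _, width, _ in blocks], payload_bits(blocks))
```

Because each block picks its own width, a single large value only inflates the block it falls in. A single global bit width for the same vector would have to accommodate the largest value in every position.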

