FPGA-Accelerated compression of integer vectors

Mahmoud Mohsen, Norman May, Christian Färber, David Broneske
Proceedings of the 16th International Workshop on Data Management on New Hardware, published 2020-06-14
DOI: 10.1145/3399666.3399932 (https://doi.org/10.1145/3399666.3399932)
Citation count: 2

Abstract

Efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in the limited and precious main memory. Past research focused on lightweight compression techniques that accept lower compression ratios in exchange for low-latency data access. Consequently, only a few columns in a wide table benefit from lightweight and effective compression schemes like run-length encoding, prefix compression, or sparse encoding. Apart from bit-packing, other columns remained uncompressed, which clearly misses opportunities for a better compression ratio for many columns. Furthermore, the main executor for compression was the CPU, as compression involves heavy data transfer. Especially when co-processors are used, the data transfer overhead wipes out the performance gains from co-processor usage. In this paper, we investigate whether we can achieve good compression ratios even for previously uncompressed columns by using binary packing and prefix suppression offloaded to an FPGA. As a streaming processor, an FPGA is the perfect candidate for offloading the compression task. With our OpenCL-based implementation, we saturate the available PCIe bus during compression on the FPGA while using less than a third of the FPGA's resources. Furthermore, our real-world experiments against CPU-based SAP HANA show a performance improvement of around a factor of 2 in compression throughput while compressing the data down to 60% of the size achieved by the best SAP HANA compression technique.
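To make the two techniques named in the abstract more concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' OpenCL/FPGA implementation. It assumes non-negative integers, a hypothetical block size of 4 (chosen only for readability), and a simple per-block layout of (count, bit width, payload). Each block is stored with only as many bits per value as its largest value needs, so the leading zero bits are suppressed; all function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical block size; real systems typically use much larger blocks.
BLOCK_SIZE = 4

Block = Tuple[int, int, bytes]  # (value count, bits per value, packed payload)


def pack_block(values: List[int]) -> Block:
    """Pack one block of non-negative integers with a shared bit width."""
    # Bits needed for the largest value; use at least 1 bit so an
    # all-zero block still has a well-defined layout.
    width = max(max(values).bit_length(), 1)
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)  # place value i at bit offset i * width
    nbytes = (len(values) * width + 7) // 8
    return len(values), width, acc.to_bytes(nbytes, "little")


def unpack_block(block: Block) -> List[int]:
    """Recover the original values of one packed block."""
    count, width, payload = block
    acc = int.from_bytes(payload, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def compress(values: List[int]) -> List[Block]:
    """Split the vector into fixed-size blocks and pack each one."""
    return [
        pack_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


def decompress(blocks: List[Block]) -> List[int]:
    """Concatenate the unpacked values of all blocks."""
    out: List[int] = []
    for block in blocks:
        out.extend(unpack_block(block))
    return out


if __name__ == "__main__":
    data = [3, 1, 2, 0, 1000, 900, 1023, 512, 7]
    blocks = compress(data)
    assert decompress(blocks) == data
    print([width for _, width, _ in blocks])
    print(sum(len(payload) for _, _, payload in blocks), "payload bytes")
```

Choosing the bit width per block rather than per vector means a few large values only widen their own block. In this example the first block needs 2 bits per value while the second needs 10.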