ZFP-V: Hardware-Optimized Lossy Floating Point Compression

Gongjin Sun, S. Jun
Published in: 2019 International Conference on Field-Programmable Technology (ICFPT), 2019-12-01
DOI: 10.1109/ICFPT47387.2019.00022 (https://doi.org/10.1109/ICFPT47387.2019.00022)
Citations: 5

Abstract

Lossy floating point compression algorithms are critical to reducing the cost and improving the performance of many modern applications, including machine learning and scientific computing. Data compression is widely used to reduce data storage requirements and transfer overhead, but traditional data-oblivious lossless compression schemes are very inefficient for floating point data. In contrast, recently proposed lossy compression algorithms such as ZFP and SZ achieve very high compression ratios while keeping the error within a tolerable margin. To the best of our knowledge, no efficient hardware implementation of ZFP exists yet, partly because of the inherently serial nature of the algorithm. In this paper, we present the design and implementation of ZFP-V, which identifies the serial portion of the ZFP algorithm and modifies it for more efficient hardware implementation. ZFP-V replaces the "group testing" part of ZFP with a variable-length header, which allows our hardware implementation to achieve up to 2x performance improvement over our best-effort hardware implementation of the original algorithm while using fewer on-chip resources, at the cost of a marginal reduction in compression ratio. We evaluate an OpenCL implementation of ZFP-V on an Intel Arria 10 FPGA using a variety of real-world scientific datasets, and show single-pipeline throughput of 1 GB/s to 4 GB/s for compression and 2 GB/s to 10 GB/s for decompression. Our implementation often outperforms a 32-thread software implementation on a high-end Intel Xeon CPU, and significantly outperforms a state-of-the-art FPGA implementation of SZ.
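The abstract does not give the ZFP-V header format, so the following Python sketch is only a toy illustration of the general idea, not the authors' encoding. It assumes a hypothetical scheme in which each bit plane of a 16-coefficient block is stored as a small width field followed by that many bits verbatim. The names `encode_plane`, `decode_plane`, `BLOCK_SIZE`, and `HEADER_BITS` are invented for this example. The point it tries to show is that a decoder reading a header knows the payload length up front, instead of discovering it one data-dependent test at a time.

```python
from typing import List, Tuple

BLOCK_SIZE = 16  # coefficients in one 4x4 block (illustrative assumption)
HEADER_BITS = 5  # enough bits to store a width in the range 0..16


def encode_plane(plane: List[int]) -> List[int]:
    """Encode one bit plane as [width header][first `width` bits verbatim].

    Hypothetical format for illustration only; not the ZFP-V bitstream.
    """
    assert len(plane) == BLOCK_SIZE and all(b in (0, 1) for b in plane)
    # Width is the position just past the last set bit; 0 for an all-zero plane.
    width = max((i + 1 for i, b in enumerate(plane) if b), default=0)
    # Store the width most-significant bit first.
    header = [(width >> k) & 1 for k in reversed(range(HEADER_BITS))]
    return header + plane[:width]


def decode_plane(bits: List[int]) -> Tuple[List[int], int]:
    """Decode one plane; return (plane, number of bits consumed)."""
    width = 0
    for b in bits[:HEADER_BITS]:
        width = (width << 1) | b
    # The payload length is known before any payload bit is inspected.
    payload = bits[HEADER_BITS:HEADER_BITS + width]
    plane = payload + [0] * (BLOCK_SIZE - width)
    return plane, HEADER_BITS + width
```

In this toy format the header costs a fixed 5 bits per plane even when the plane is empty, which gives a rough intuition for why a header-based scheme might trade a little compression ratio for a simpler, less serial datapath.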