Compressing Floating-Point Number Stream for Numerical Applications

Hisanobu Tomari, M. Inaba, K. Hiraki
2010 First International Conference on Networking and Computing, published 2010-11-17
DOI: 10.1109/IC-NC.2010.24 (https://doi.org/10.1109/IC-NC.2010.24)
Citations: 8

Abstract

Clusters of commodity computers and general-purpose computers equipped with accelerators such as GPGPUs are now common platforms for computationally intensive tasks like scientific simulations. Both technologies provide high performance at relatively low cost. However, interconnect bandwidth is low relative to computing performance, which hinders efficient operation of both clusters and accelerators for the many algorithms that require heavy data transmission. For clusters, the network is one of the major performance bottlenecks; for accelerators, the bottleneck is the peripheral bus that transfers data from the host to the memory on the accelerator card. In this paper, we propose a method of accelerating floating-point-intensive algorithms by compressing the floating-point number stream. Using an efficient software encoder and a hardware decoder, the method eliminates redundancy in the exponent part of the numbers in the stream and compacts the entire array to 82.8% of its original size at the theoretical limit. The compression ratio is better than that of Gzip or Bzip2 for floating-point numbers. For programs whose processing time is largely dominated by communication performance, the reduction in communication time leads directly to a reduction in total application running time. We implemented a high-speed decoder on an FPGA that operates at over 6 GB/s. We estimated application performance using FFT on a cluster and matrix multiplication on the GRAPE-DR accelerator, and the results indicate that our approach is useful in both configurations.
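The abstract does not spell out the encoding, so the following is only a minimal Python sketch of the underlying idea, not the authors' encoder. It assumes IEEE 754 double precision (1 sign bit, 11 exponent bits, 52 fraction bits) and uses hypothetical helper names (`split_double`, `join_double`, `size_ratio_lower_bound`). Under that assumption, an array whose exponent bits cost nothing would shrink to 53/64 of its size, which happens to match the 82.8% figure quoted above.

```python
import struct

EXP_BITS = 11    # exponent width of IEEE 754 binary64 (assumed format)
TOTAL_BITS = 64  # sign (1) + exponent (11) + fraction (52)


def split_double(x):
    """Return (sign, biased_exponent, fraction) for a binary64 value."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction


def join_double(sign, exponent, fraction):
    """Inverse of split_double: reassemble the three fields into a float."""
    bits = (sign << 63) | (exponent << 52) | fraction
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def size_ratio_lower_bound():
    """Size ratio if the exponent field could be dropped entirely.

    This is an illustrative bound under the binary64 assumption,
    not a description of the paper's actual coding scheme.
    """
    return (TOTAL_BITS - EXP_BITS) / TOTAL_BITS


# Hypothetical sample data: every value lies in [1, 2), so all of them
# share a single biased exponent and the exponent stream is redundant.
values = [1.0, 1.5, 1.75, 1.999]
exponents = {split_double(v)[1] for v in values}
restored = [join_double(*split_double(v)) for v in values]
```

In this sample, `exponents` contains a single value, which is the kind of redundancy an exponent-oriented coder can exploit, and `restored` equals `values`, showing that the field split is lossless. Real data with widely varying magnitudes would compress less than the bound suggests.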