Accuracy and Performance Trade-Offs of Logarithmic Number Units in Multi-Core Clusters

Michael Schaffner, Michael Gautschi, Frank K. Gürkaynak, L. Benini
Published in: 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH)
Publication date: 2016-07-10
DOI: 10.1109/ARITH.2016.10 (https://doi.org/10.1109/ARITH.2016.10)
Citations: 5

Abstract

Compared to the traditional floating-point (FP) number representation, logarithmic number systems (LNS) perform better when evaluating complex functions, since multiplications and divisions reduce to simple additions and subtractions of the operands' logarithms. However, additions and subtractions of the values themselves become costly nonlinear operations. Efficient LNS units (LNUs) implementing ADD/SUB operations in hardware rely on interpolation techniques to save area. Even the most advanced LNUs are still larger than standard single-precision FPUs, which renders them impractical for most general-purpose processors. In this paper, we show that in a multi-core setting, when shared among several processor cores, LNUs become a very attractive solution. We present a methodology to generate LNUs with various error bounds and perform a design space exploration with different parameterizations. We show that even small precision relaxations on the order of a few units in the last place (ulp) reduce the LNU area significantly. Using examples from several signal processing domains, we demonstrate that shared approximate LNUs can outperform their standard FP counterparts on average by 2.14x in speed and 1.92x in energy efficiency, with insignificant degradation of the output quality.
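
To illustrate why multiplication and division are cheap in the logarithmic domain while addition and subtraction are not, the following is a minimal software sketch, not the LNU design from the paper. It assumes a value is stored as a sign and the base-2 logarithm of its magnitude. The function names (`to_lns`, `lns_mul`, `lns_add`, and so on) are hypothetical, zero is left unrepresented, and the nonlinear terms log2(1 ± 2^d) are evaluated directly with `math.log2` where a hardware LNU would, per the abstract, use interpolation.

```python
import math

def to_lns(x):
    """Encode a nonzero real as (sign, log2|x|). Zero handling is omitted in this sketch."""
    if x == 0:
        raise ValueError("zero needs a special encoding, omitted in this sketch")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def from_lns(v):
    """Decode (sign, exponent) back to an ordinary float."""
    sign, e = v
    return sign * 2.0 ** e

def lns_mul(a, b):
    # Multiplication: multiply the signs, add the logarithms.
    return (a[0] * b[0], a[1] + b[1])

def lns_div(a, b):
    # Division: multiply the signs, subtract the logarithms.
    return (a[0] * b[0], a[1] - b[1])

def lns_add(a, b):
    # Addition and subtraction need a nonlinear function of the exponent difference d.
    if a[1] < b[1]:
        a, b = b, a              # make a the operand with the larger magnitude
    d = b[1] - a[1]              # d <= 0
    if a[0] == b[0]:
        # Same sign: log2(|a| + |b|) = log2|a| + log2(1 + 2^d)
        return (a[0], a[1] + math.log2(1.0 + 2.0 ** d))
    if d == 0:
        raise ValueError("exact cancellation gives zero, not representable in this sketch")
    # Opposite signs: log2(|a| - |b|) = log2|a| + log2(1 - 2^d)
    return (a[0], a[1] + math.log2(1.0 - 2.0 ** d))
```

For example, `from_lns(lns_mul(to_lns(3.0), to_lns(4.0)))` costs a single addition of exponents, whereas `lns_add` must evaluate a logarithm of `1 ± 2^d`. That nonlinear term is what the abstract's "costly nonlinear operations" refers to.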