Accuracy and Performance Trade-Offs of Logarithmic Number Units in Multi-Core Clusters

2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH). Pub Date: 2016-07-10. DOI: 10.1109/ARITH.2016.10

Michael Schaffner, Michael Gautschi, Frank K. Gürkaynak, L. Benini


Citations: 5

Abstract

Compared to the traditional floating-point (FP) number representation, logarithmic number systems (LNS) offer superior performance when evaluating complex functions, since multiplications and divisions reduce to simple operations in the logarithmic domain. However, additions and subtractions become costly nonlinear operations. Efficient LNS units (LNUs) implementing ADD/SUB operations in hardware rely on interpolation techniques to save area. Even the most advanced LNUs are still larger than standard single-precision FPUs, which renders them impractical for most general-purpose processors. In this paper, we show that in a multi-core setting, when shared among several processor cores, LNUs become a very attractive solution. We present a methodology to generate LNUs with various error bounds and perform a design space exploration with different parameterizations. We show that even small precision relaxations on the order of a few units in the last place (ulp) reduce the LNU area significantly. Using examples from several signal processing domains, we demonstrate that shared approximate LNUs can outperform their standard FP counterpart on average by 2.14x in speed and 1.92x in energy efficiency, with insignificant degradation of the output quality.
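
The trade-off the abstract describes can be illustrated with a minimal Python sketch. It is not the paper's LNU design: it assumes positive values only (no sign bit or zero handling), uses base-2 logarithms, and evaluates the nonlinear ADD/SUB correction exactly with `math.log2`, whereas a hardware LNU would approximate it by interpolation. All function names here are hypothetical.

```python
import math

def to_lns(x: float) -> float:
    """Encode a positive real as its base-2 logarithm (sign handling omitted)."""
    if x <= 0.0:
        raise ValueError("this sketch only handles positive values")
    return math.log2(x)

def from_lns(e: float) -> float:
    """Decode a log-domain value back to a real number."""
    return 2.0 ** e

def lns_mul(a: float, b: float) -> float:
    """Multiplication becomes a plain addition of logarithms."""
    return a + b

def lns_div(a: float, b: float) -> float:
    """Division becomes a plain subtraction of logarithms."""
    return a - b

def lns_add(a: float, b: float) -> float:
    """Addition needs the nonlinear term log2(1 + 2^d) with d = lo - hi <= 0."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def lns_sub(a: float, b: float) -> float:
    """Subtraction needs log2(1 - 2^d); this sketch requires a > b."""
    if a <= b:
        raise ValueError("this sketch requires a > b (positive result)")
    return a + math.log2(1.0 - 2.0 ** (b - a))

if __name__ == "__main__":
    x, y = to_lns(3.0), to_lns(4.0)
    print(from_lns(lns_mul(x, y)))  # product, computed with one addition
    print(from_lns(lns_add(x, y)))  # sum, computed via the nonlinear term
```

The two correction terms in `lns_add` and `lns_sub` are the functions whose hardware approximation dominates LNU area, which is why relaxing their error bound by a few ulp can pay off.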
