2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH): Latest Papers

Correctly Rounded Arbitrary-Precision Floating-Point Summation
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.9
V. Lefèvre
We present a fast algorithm together with its low-level implementation of correctly rounded arbitrary-precision floating-point summation. The arithmetic is the one used by the GNU MPFR library: radix 2, no subnormals, each variable (each input and the output) has its own precision. We also describe how the implementation is tested.
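To make "correctly rounded summation" concrete, here is a small Python sketch. It is not the MPFR algorithm described in the paper: it works on ordinary double-precision values only, relies on the standard library's `math.fsum`, and cross-checks against an exact rational sum. The helper name `exact_sum_rounded` is hypothetical.

```python
import math
from fractions import Fraction

def exact_sum_rounded(values):
    """Sum doubles exactly as rationals, then round once to the nearest double."""
    total = sum((Fraction(v) for v in values), Fraction(0))
    return float(total)  # a single rounding at the very end

data = [1e16, 1.0, -1e16]

naive = sum(data)                    # rounds after every addition, so the 1.0 is lost
accurate = math.fsum(data)           # keeps track of the low-order bits
reference = exact_sum_rounded(data)  # exact arithmetic, one final rounding

print(naive, accurate, reference)
```

The naive loop rounds after each addition, while the other two round only once, which is the behaviour a correctly rounded sum is expected to have.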
Citations: 6
Quad Precision Floating Point on the IBM z13
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.26
C. Lichtenau, S. Carlough, S. M. Müller
When operating on rapidly growing amounts of data, business analytics applications become sensitive to rounding errors and benefit from the higher stability and faster convergence of quad-precision floating-point (FP-QP) arithmetic. The IBM z13™ supports this emerging Big Data trend with outstanding FP-QP performance. The paper details the vector and floating-point unit of the IBM z13™, with special focus on binary FP-QP. Except for divide and square root, these instructions are executed in the decimal engine. Operating such an 8-cycle decimal and quad-precision pipeline at 5 GHz required innovation in exponent handling, normalization, and rounding.
Citations: 20
Automated Design of Floating-Point Logarithm Functions on Integer Processors
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.28
G. Revy
The automated design of efficient floating-point implementations of correctly rounded elementary functions such as cos, sin, log, and exp remains a real challenge. Indeed, the variety of hardware architectures and floating-point formats makes the implementation process tedious and error-prone. This article focuses on the particular case of floating-point log_b(x) functions on integer processors. First, it proposes a unified range reduction for log_b(x) that reduces the evaluation of these functions to that of a single well-chosen polynomial. Second, it gives sufficient conditions on the approximation and evaluation errors to guarantee correct rounding. Third, it shows how to automate the implementation process on integer processors when b ∈ {2, exp(1), 10}. Finally, we illustrate how this automated approach speeds up the design of efficient implementations of log_b(x) for standard floating-point formats.
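The idea of a range reduction shared by all bases can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the reduction proposed in the paper: it uses floating-point arithmetic rather than integer-only code, `math.log2(m)` stands in for the polynomial a real implementation would evaluate, it makes no correct-rounding claim, and the function name `log_b` is chosen here for illustration.

```python
import math

def log_b(x: float, b: float) -> float:
    """log_b(x) from x = m * 2**e, so that log_b(x) = (e + log2(m)) / log2(b)."""
    if x <= 0.0 or b <= 0.0 or b == 1.0:
        raise ValueError("x must be positive and b must be a valid base")
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
    # Stand-in for the polynomial evaluation on the reduced argument m.
    reduced = math.log2(m)
    return (e + reduced) / math.log2(b)

print(log_b(1000.0, 10.0))
print(log_b(8.0, 2.0))
```

Note that `e + reduced` suffers from cancellation when x is close to 1, which is one reason real implementations choose the reduced interval more carefully.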
Citations: 4
Accuracy and Performance Trade-Offs of Logarithmic Number Units in Multi-Core Clusters
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.10
Michael Schaffner, Michael Gautschi, Frank K. Gürkaynak, L. Benini
Compared to the traditional floating-point (FP) number representation, logarithmic number systems (LNS) have superior performance when evaluating complex functions, since multiplications and divisions can be calculated with ease in the logarithmic domain. However, additions and subtractions become costly nonlinear operations. Efficient LNS units (LNUs) implementing ADD/SUB operations in hardware rely on interpolation techniques to save area. Even the most advanced LNUs are still larger than standard single-precision FPUs, which renders them impractical for most general-purpose processors. In this paper, we show that in a multi-core setting, when shared among several processor cores, LNUs become a very attractive solution. We present a methodology to generate LNUs with various error bounds and perform a design space exploration with different parameterizations. We show that even small precision relaxations, on the order of a few units in the last place (ulp), reduce the LNU area significantly. Using examples from several signal processing domains, we demonstrate that shared approximate LNUs can outperform their standard FP counterpart on average by 2.14x in speed and 1.92x in energy efficiency, with insignificant degradation of the output quality.
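The asymmetry between cheap multiplication and costly addition in the logarithmic domain can be shown with a short Python sketch. It assumes positive values only (no sign or zero handling) and computes the nonlinear term directly with `math.log2`, where a hardware LNU would use interpolation; all function names are hypothetical.

```python
import math

def to_lns(x: float) -> float:
    """Encode a positive real number as its base-2 logarithm."""
    return math.log2(x)

def from_lns(lx: float) -> float:
    """Decode an LNS value back to a real number."""
    return 2.0 ** lx

def lns_mul(la: float, lb: float) -> float:
    return la + lb  # multiplication is a plain addition of logarithms

def lns_div(la: float, lb: float) -> float:
    return la - lb  # division is a plain subtraction of logarithms

def lns_add(la: float, lb: float) -> float:
    hi, lo = max(la, lb), min(la, lb)
    # The nonlinear function log2(1 + 2**d), d <= 0, is the costly part.
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

three, five = to_lns(3.0), to_lns(5.0)
print(from_lns(lns_mul(three, five)), from_lns(lns_add(three, five)))
```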
Citations: 5
Multi-fault Attack Detection for RNS Cryptographic Architecture
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.16
J. Bajard, J. Eynard, Nabil Merkiche
Residue Number Systems (RNS) have been a topic of interest for years. Many previous works show that RNS is a good candidate for fast computations in asymmetric cryptography thanks to its intrinsic parallelization features. A recent result demonstrates that redundant RNS and modular reduction fit together efficiently, providing an efficient RNS modular reduction algorithm with single-fault detection capability. In this paper, we propose to generalize this approach by protecting the classical Cox-Rower architecture against multi-fault attacks. We prove that faults occurring at different places and at different times can be detected with a linear cost for the architecture and a constant time for the execution.
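The principle of detecting a fault with a redundant residue can be sketched in Python. This is a toy illustration of single-fault detection in a redundant RNS, not the Cox-Rower architecture or the multi-fault scheme of the paper. The moduli in `BASE`, the modulus `REDUNDANT`, and all function names are assumptions chosen for the example, and results are assumed to stay below the product of the base moduli.

```python
from math import prod

BASE = (13, 17, 19, 23)  # hypothetical pairwise-coprime base moduli
REDUNDANT = 29           # hypothetical redundant modulus: prime, larger than each base modulus

def to_rns(x):
    """Residues of x over the base, plus one redundant residue."""
    return [x % m for m in BASE], x % REDUNDANT

def rns_mul(a, b):
    """Channel-wise multiplication; every channel is independent of the others."""
    (ra, sa), (rb, sb) = a, b
    return [(x * y) % m for x, y, m in zip(ra, rb, BASE)], (sa * sb) % REDUNDANT

def crt(residues):
    """Chinese Remainder Theorem reconstruction into [0, prod(BASE))."""
    M = prod(BASE)
    total = 0
    for r, m in zip(residues, BASE):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse, Python 3.8+
    return total % M

def is_consistent(residues, redundant_residue):
    """A fault in one base residue makes the redundant channel disagree."""
    return crt(residues) % REDUNDANT == redundant_residue

res, red = rns_mul(to_rns(123), to_rns(456))
faulty = list(res)
faulty[2] = (faulty[2] + 1) % BASE[2]  # inject a fault into one channel
print(is_consistent(res, red), is_consistent(faulty, red))
```

With these parameters a single corrupted base residue shifts the reconstructed value by a nonzero multiple of the other moduli's product that 29 cannot divide, so the check fails as intended.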
Citations: 15
An Iterative Logarithmic Multiplier with Improved Precision
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.25
Syed Ershad Ahmed, Sanket V. Kadam, M. Srinivas
Recent studies have demonstrated the potential for greater area and power savings through approximate computation in error-tolerant applications involving signal and image processing. Multiplication is a major mathematical operation in these applications, and performing it in a logarithmic number system results in faster and more energy-efficient designs. In this paper, the authors present a method that combines Mitchell's approximation with a hardware truncation scheme in a novel way, resulting in an iterative multiplier with improved precision and area. Further, the proposed truncation approach and fractional predictor significantly reduce the overall hardware requirement of the multiplier. Experimental results prove the superiority of the proposed multiplier over previous designs.
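The general idea behind an iterative logarithmic multiplier can be sketched in Python. This is not the authors' design: it omits their truncation scheme and fractional predictor, and the basic block is a Mitchell-style approximation without the carry case. It rests on the exact identity n1·n2 = 2^(k1+k2) + r1·2^k2 + r2·2^k1 + r1·r2, where k is the position of the leading one and r the remainder below it. The function name and the `corrections` parameter are hypothetical.

```python
def log_multiply(n1: int, n2: int, corrections: int = 0) -> int:
    """Approximate n1 * n2 for non-negative integers with shifts and adds only."""
    product = 0
    for _ in range(corrections + 1):
        if n1 == 0 or n2 == 0:
            break
        k1, k2 = n1.bit_length() - 1, n2.bit_length() - 1
        r1, r2 = n1 - (1 << k1), n2 - (1 << k2)  # parts below the leading one
        # Keep the first three terms of the identity; r1 * r2 is deferred.
        product += (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
        # The deferred term r1 * r2 is approximated the same way next round.
        n1, n2 = r1, r2
    return product

for c in range(3):
    print(c, log_multiply(13, 11, corrections=c))
```

Each correction stage adds an approximation of the dropped term, so the result never exceeds the true product and approaches it as stages are added.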
Citations: 17
A Formulation of Fast Carry Chains Suitable for Efficient Implementation with Majority Elements
Pub Date : 2016-07-10 DOI: 10.1109/ARITH.2016.14
G. Jaberipur, B. Parhami, Dariush Abedi
Carry computation is a most important notion in computer arithmetic, because it dictates the speed of addition, which is in turn vital to high-speed computation, both as a directly used primitive and as a building block for synthesizing other operations. The theory of fast addition is well-established, but from time to time, changes in technology necessitate a reassessment of strategies for carry network implementation, even though the logical functions to be realized remain the same. We study the implications of the availability of simple, fast, and power-efficient majority gates (in technologies such as quantum-dot cellular automata, single-electron tunneling, tunneling phase logic, magnetic tunnel junction, and nanoscale bar magnets) to the design of carry networks, offering a reformulation of the carry recurrence that allows for building carry networks exclusively out of fully utilized majority elements. We compare our novel implementations based on 3-input majority elements to prior proposals based on these elements, demonstrating advantages in both speed and circuit complexity.
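The link between carries and majority gates is that the carry out of a full adder is the 3-input majority of its two operand bits and the incoming carry. The Python sketch below shows this with a plain ripple-carry chain. It is not the fast reformulation proposed in the paper, and it computes sum bits with XOR rather than with majority elements; the function names are hypothetical.

```python
def maj(a: int, b: int, c: int) -> int:
    """3-input majority of single bits."""
    return (a & b) | (a & c) | (b & c)

def add_with_majority_carries(x: int, y: int, width: int = 8) -> int:
    """Add two unsigned width-bit integers, producing carries with maj()."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= (a ^ b ^ carry) << i
        carry = maj(a, b, carry)  # carry recurrence c[i+1] = Maj(a[i], b[i], c[i])
    return result | (carry << width)

print(add_with_majority_carries(200, 100))
```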
Citations: 8
Evaluating Straight-Line Programs over Balls
Pub Date : 2016-07-01 DOI: 10.1109/ARITH.2016.12
J. Hoeven, Grégoire Lecerf
Interval arithmetic achieves numerical reliability for a wide range of applications, at the price of a performance penalty. For applications to homotopy continuation, one key ingredient is the efficient and reliable evaluation of complex polynomials represented by straight-line programs. This is best achieved using ball arithmetic, a variant of interval arithmetic. In this article, we describe strategies for reducing the performance penalty of basic operations on balls. We also show how to bound the effect of rounding errors at the global level of evaluating a straight-line program. This allows us to introduce a new and faster “transient” variant of ball arithmetic.
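A minimal Python sketch of midpoint-radius (ball) arithmetic on a straight-line program follows. It assumes real balls with exact rational midpoints and radii, so no rounding-error terms appear; floating-point implementations, as discussed in the paper, must also enlarge the radius to cover rounding. The `Ball` class and the `evaluate` program are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Ball:
    mid: Fraction  # center of the ball
    rad: Fraction  # non-negative radius

    def __add__(self, other: "Ball") -> "Ball":
        return Ball(self.mid + other.mid, self.rad + other.rad)

    def __mul__(self, other: "Ball") -> "Ball":
        # |xy - ab| <= |a| s + |b| r + r s for |x - a| <= r and |y - b| <= s
        rad = abs(self.mid) * other.rad + abs(other.mid) * self.rad + self.rad * other.rad
        return Ball(self.mid * other.mid, rad)

    def contains(self, value) -> bool:
        return abs(Fraction(value) - self.mid) <= self.rad

def evaluate(x: Ball) -> Ball:
    """Straight-line program for p(x) = x*x + x."""
    t = x * x
    return t + x

x = Ball(Fraction(2), Fraction(1, 10))
print(evaluate(x))
```

Every value of p on the input ball is guaranteed to lie in the output ball, at the cost of some overestimation of the radius.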
Citations: 4
Computing floating-point logarithms with fixed-point operations
Pub Date : 2016-07-01 DOI: 10.1109/ARITH.2016.24
Julien Le Maire, Nicolas Brunie, F. D. Dinechin, J. Muller
Elementary functions from the mathematical library input and output floating-point numbers. However, it is possible to implement them purely using integer/fixed-point arithmetic. This option was not attractive between 1985 and 2005, because mainstream processor hardware supported 64-bit floating-point, but only 32-bit integers. This has changed in recent years, in particular with the generalization of native 64-bit integer support. The purpose of this article is therefore to reevaluate the relevance of computing floating-point functions in fixed-point. For this, several variants of the double-precision logarithm function are implemented and evaluated. Formulating the problem as a fixed-point one is easy after the range has been (classically) reduced. Then, 64-bit integers provide slightly more accuracy than a 53-bit mantissa, which helps speed up the evaluation. Finally, multi-word arithmetic, critical for accurate implementations, is much faster in fixed-point, and natively supported by recent compilers. Thanks to all this, a purely integer implementation of the correctly rounded double-precision logarithm outperforms the previous state of the art, with the worst-case execution time reduced by a factor of 5. This work also introduces variants of the logarithm that input a floating-point number and output the result in fixed-point. These are shown to be both more accurate and more efficient than the traditional floating-point functions for some applications.
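To illustrate what "floating-point in, fixed-point out" can look like, here is a Python sketch that computes log2 of a double using only integer operations. It swaps in the classic bit-by-bit squaring method rather than the polynomial-based evaluation of the paper, it is neither fast nor correctly rounded, and the function name and the 40-bit output format are assumptions made for the example.

```python
import math
import struct

def log2_fixed(x: float, out_bits: int = 40) -> int:
    """Return approximately log2(x) * 2**out_bits as an integer."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    biased = (bits >> 52) & 0x7FF
    if bits >> 63 or biased == 0 or biased == 0x7FF:
        raise ValueError("expected a positive, normal, finite double")
    exponent = biased - 1023
    # Range reduction: significand in [1, 2) with 52 fractional bits.
    y = (bits & ((1 << 52) - 1)) | (1 << 52)
    result = exponent * (1 << out_bits)
    for i in range(1, out_bits + 1):
        y = (y * y) >> 52   # fixed-point square (a 128-bit product in C)
        if y >= (2 << 52):  # y >= 2 means the next fraction bit of log2 is 1
            y >>= 1
            result += 1 << (out_bits - i)
    return result

approx = log2_fixed(10.0) / 2 ** 40
print(approx, math.log2(10.0))
```

After the exponent and significand are separated, everything is integer shifts, multiplications, and comparisons, which is the setting the abstract describes.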
Citations: 15
On-line Multiplication and Division in Real and Complex Bases
Pub Date : 2016-07-01 DOI: 10.1109/ARITH.2016.13
Marta Brzicova, Christiane Frougny, E. Pelantová, Milena Svobodová
A positional numeration system is given by a base and by a set of digits. The base is a real or complex number β such that |β| > 1, and the digit set A is a finite set of real or complex digits (including 0). In this paper, we first formulate a generalized version of the on-line algorithms for multiplication and division of Trivedi and Ercegovac for the cases that β is any real or complex number, and digits are real or complex. We show that if (β, A) satisfies the so-called (OL) Property, then on-line multiplication and division are feasible by the Trivedi-Ercegovac algorithms. For a real base β and alphabet A of contiguous integers, the system (β, A) has the (OL) Property if #A > |β|. Provided that addition and subtraction are realizable in parallel in the system (β, A), our on-line algorithms for multiplication and division have linear time complexity. Three examples are presented in detail: base β = (3 + √5)/2 with alphabet A = {-1, 0, 1}; base β = 2i with alphabet A = {-2, -1, 0, 1, 2} (redundant Knuth numeration system); and base β = -3/2 + i√3/2 = -1 + ω, where ω = exp(2iπ/3), with alphabet A = {0, ±1, ±ω, ±ω²} (redundant Eisenstein numeration system).
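The notions of a complex base and a redundant digit set can be made concrete with a short Python sketch. It only evaluates digit strings and checks the stated sufficient condition #A > |β| for the real-base example; it does not implement the on-line multiplication or division algorithms. The function and variable names are hypothetical.

```python
import math

def evaluate_digits(digits, base):
    """Value of a most-significant-first digit string, by Horner's rule."""
    acc = 0
    for d in digits:
        acc = acc * base + d
    return acc

KNUTH_BASE = 2j
KNUTH_ALPHABET = (-2, -1, 0, 1, 2)

# Redundancy: two different digit strings over the alphabet denote the same number,
# because (-1) * (2i)**2 + (-2) = 4 - 2 = 2.
a = evaluate_digits([2], KNUTH_BASE)
b = evaluate_digits([-1, 0, -2], KNUTH_BASE)
print(a, b)

# Real-base example: contiguous integer alphabet with #A > |beta|.
golden_base = (3 + math.sqrt(5)) / 2
golden_alphabet = (-1, 0, 1)
print(len(golden_alphabet) > abs(golden_base))
```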
Citations: 9