2018 IEEE 25th Symposium on Computer Arithmetic (ARITH): Latest Papers
Proceedings of the 25th International Symposium on Computer Arithmetic
Pub Date : 2018-06-01 DOI: 10.1109/arith.2018.8464697
Citations: 0
A Formally-Proved Algorithm to Compute the Correct Average of Decimal Floating-Point Numbers
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464761
S. Boldo, Florian Faissole, Vincent Tourneur
Some modern processors include decimal floating-point units with a conforming implementation of the IEEE 754-2008 standard. Unfortunately, many algorithms from the computer arithmetic literature are no longer correct when computations are done in radix 10. This is the case, in particular, for the computation of the average of two floating-point numbers. Several radix-2 algorithms are available, including one that provides the correct rounding, but none hold in radix 10. This paper presents a new radix-10 algorithm that computes the correctly rounded average. To guarantee a higher level of confidence, we also provide a Coq formal proof of our theorems, which takes gradual underflow into account. Note that our formal proof was generalized to ensure this algorithm is correct when computations are done with any even radix.
Pages: 69-75
Citations: 2
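The failure mode described in this abstract can be reproduced with Python's standard `decimal` module. The sketch below is not the paper's algorithm: it only shows, under an assumed toy format of 3 significant decimal digits with round-half-even, that the naive formula (x + y) / 2 is not correctly rounded in radix 10, and compares it with a reference obtained by computing with ample precision and rounding once. The function names are illustrative.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Assumed toy format: 3 significant decimal digits, round-half-even.
WORKING = Context(prec=3, rounding=ROUND_HALF_EVEN)
# Wide context used only to build a reference result.
WIDE = Context(prec=50, rounding=ROUND_HALF_EVEN)


def naive_average(x, y, ctx=WORKING):
    """(x + y) / 2 with every operation rounded to the working precision."""
    total = ctx.add(x, y)  # this addition may already round in radix 10
    return ctx.divide(total, Decimal(2))


def reference_average(x, y, ctx=WORKING):
    """Compute the average with ample precision, then round once."""
    exact = WIDE.divide(WIDE.add(x, y), Decimal(2))
    return ctx.plus(exact)  # unary plus rounds to the context precision


x = Decimal("9.97")
y = Decimal("9.97")
print(naive_average(x, y))      # 9.95, which is below both inputs
print(reference_average(x, y))  # 9.97
```

In radix 2, halving a rounded sum only changes the exponent, so (barring overflow and underflow) the result is still the rounding of the exact average; in radix 10 that argument no longer applies, as the example shows.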
Karatsuba with Rectangular Multipliers for FPGAs
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464809
M. Kumm, O. Gustafsson, F. D. Dinechin, Johannes Kappauf, P. Zipf
This work presents an extension of Karatsuba's method to efficiently use rectangular multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work are the embedded 18 × 25-bit signed multipliers found in the DSP blocks of recent Xilinx FPGAs; the traditional Karatsuba approach must under-use them as square 18 × 18 ones. This work shows that rectangular multipliers can be efficiently exploited in a modified Karatsuba method if their input word sizes have a large greatest common divisor. In the Xilinx FPGA case, this can be obtained by using the embedded multipliers as 16 × 24 unsigned and as 17 × 25 signed ones. The obtained architectures are implemented with due attention to architectural features such as the pre-adders and post-adders available in Xilinx DSP blocks. They are synthesized and compared with traditional Karatsuba, and also with (non-Karatsuba) state-of-the-art tiling techniques that make use of the full rectangular multipliers. The proposed technique improves resource consumption and performance for multipliers of numbers larger than 64 bits.
Pages: 13-20
Citations: 10
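For readers unfamiliar with the baseline, the following is a minimal sketch of the traditional (square) Karatsuba method that the abstract takes as its starting point. It does not reproduce the paper's rectangular extension. It works on non-negative Python integers, and the `base_bits` threshold (standing in for the width of a hardware multiplier) is an assumption chosen for illustration.

```python
def karatsuba(x, y, base_bits=16):
    """Multiply non-negative integers using three half-size products per level."""
    # Small operands go to the "hardware" multiplier (here: Python's *).
    if x < (1 << base_bits) or y < (1 << base_bits):
        return x * y

    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    high = karatsuba(x_hi, y_hi, base_bits)
    low = karatsuba(x_lo, y_lo, base_bits)
    # One extra product replaces the two cross products x_hi*y_lo + x_lo*y_hi.
    cross = karatsuba(x_hi + x_lo, y_hi + y_lo, base_bits) - high - low

    return (high << (2 * half)) + (cross << half) + low


a = 0x123456789ABCDEF0123456789ABCDEF
b = 0xFEDCBA9876543210FEDCBA98
print(karatsuba(a, b) == a * b)  # True
```

Both operands are split at the same bit position, which is why a square base multiplier is the natural fit for this form of the method.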
Digit Elision for Arbitrary-accuracy Iterative Computation
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464691
He Li, James J. Davis, John Wickerson, G. Constantinides
We recently proposed the first hardware architecture enabling the iterative solution of systems of linear equations to accuracies limited only by the amount of available memory. This technique, named ARCHITECT, achieves exact numeric computation by using online arithmetic to allow the refinement of results from earlier iterations over time, eschewing rounding error. ARCHITECT has a key drawback, however: often, many more digits than strictly necessary are generated, and this problem worsens as more accurate solutions are sought. In this paper, we infer the locations of these superfluous digits within stationary iterative calculations by exploiting online arithmetic's digit dependencies and using forward error analysis. We demonstrate that omitting their computation is guaranteed not to affect the ability to reach a solution of any accuracy. Versus ARCHITECT, our illustrative hardware implementation achieves a geometric mean 20.1× speedup in the solution of a set of representative linear systems through the avoidance of redundant digit calculation. For the computation of high-precision results, we also obtain an up to 22.4× reduction in memory requirements over the same baseline. Finally, we demonstrate that solvers implemented following our proposals can show superiority over conventional arithmetic implementations by virtue of their runtime-tunable precisions.
Pages: 107-114
Citations: 3
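As background for the "stationary iterative calculations" mentioned in this abstract, here is a small software sketch of one such method, the Jacobi iteration, evaluated in exact rational arithmetic so that accuracy is limited only by how long it runs. It is not ARCHITECT, uses no online arithmetic, and performs no digit elision; the 2 × 2 system is an assumed example.

```python
from fractions import Fraction


def jacobi(a, b, iterations):
    """Run Jacobi iterations on a x = b using exact rationals, starting from 0."""
    n = len(b)
    x = [Fraction(0)] * n
    for _ in range(iterations):
        # Every component of the new iterate uses only the previous iterate.
        x = [
            (Fraction(b[i]) - sum(Fraction(a[i][j]) * x[j] for j in range(n) if j != i))
            / Fraction(a[i][i])
            for i in range(n)
        ]
    return x


# Assumed diagonally dominant example; the exact solution is (1/6, 1/3).
A = [[4, 1], [2, 5]]
B = [1, 2]
for steps in (10, 20, 40):
    approx = jacobi(A, B, steps)
    print(steps, float(abs(approx[0] - Fraction(1, 6))))
```

For this particular system the error shrinks by a factor of 10 every two iterations, so later iterations mostly refine low-order digits while the leading digits stay fixed.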
VeriTracer: Context-enriched tracer for floating-point arithmetic analysis
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464687
Yohan Chatelain, P. D. O. Castro, E. Petit, D. Defour, J. Bieder, M. Torrent
VeriTracer automatically instruments a code and traces the accuracy of floating-point variables over time. VeriTracer enriches the visual traces with contextual information such as the call site path in which a value was modified. Contextual information is important to understand how the floating-point errors propagate in complex codes. VeriTracer is implemented as an LLVM compiler tool on top of Verificarlo. We demonstrate how VeriTracer can detect accuracy loss and quantify the impact of using a compensated algorithm on ABINIT, an industrial HPC application for Ab Initio quantum computation.
Pages: 61-68
Citations: 8
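The abstract refers to measuring the effect of a compensated algorithm. As a generic illustration of what such an algorithm is, the sketch below compares naive summation with Kahan's compensated summation in binary64. It does not use VeriTracer or Verificarlo, and the input data are an assumption chosen to make the accuracy loss visible.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry the rounding error of each addition."""
    total = 0.0
    compensation = 0.0
    for v in values:
        corrected = v - compensation
        new_total = total + corrected
        # Recover what was lost when `corrected` was added to `total`.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


# Assumed data: one large term followed by many terms below half an ulp of it.
data = [1.0] + [1e-16] * 1000
reference = math.fsum(data)
print(abs(naive_sum(data) - reference))  # about 1e-13: every small term is lost
print(abs(kahan_sum(data) - reference))  # at most a few ulps
```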
High Density and Performance Multiplication for FPGA
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464695
M. Langhammer, Gregg Baeckler
Arithmetic-based applications are one of the most common use cases for modern FPGAs. Currently, machine learning is emerging as the fastest growth area for FPGAs, renewing an interest in low-precision multiplication. There is now a new focus on multiplication in the soft fabric: very high-density systems, consisting of many thousands of operations, are the current norm. In this paper we introduce multiplier regularization, which restructures common multiplier algorithms into smaller and more efficient architectures. The multiplier structure is parameterizable, and results are given for a continuous range of input sizes, although the algorithm is most efficient for small input precisions. The multiplier is particularly effective for typical machine learning inferencing uses, and the presented cores can be used for the dot products required by these applications. Although the examples presented here are optimized for Intel Stratix 10 devices, the concept of regularized arithmetic structures is applicable to generic FPGA LUT architectures. Results are compared to Intel Megafunction IP as well as contrasted with normalized representations of recently published results for Xilinx devices. We report a 10% to 35% smaller area, and a more significant latency reduction, in the range of 25% to 50%, for typical inferencing use cases.
Pages: 5-12
Citations: 17
Flexpoint: Predictive Numerics for Deep Learning
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464801
Valentina Popescu, M. Nassar, Xin Wang, E. Tumer, T. Webb
Deep learning has been undergoing rapid growth in recent years thanks to its state-of-the-art performance across a wide range of real-world applications. Traditionally, neural networks were trained in IEEE-754 binary64 or binary32 format, a common practice in general scientific computing. However, the unique computational requirements of deep neural network training workloads allow for much more efficient and inexpensive alternatives, unleashing a new wave of numerical innovations powering specialized computing hardware. We previously presented Flexpoint, a blocked fixed-point data type combined with a novel predictive exponent management algorithm designed to support training of deep networks without modifications, aiming at a seamless replacement of the binary32 format widely used in practice today. We showed that Flexpoint with 16-bit mantissa and 5-bit shared exponent (flex16+5) achieved numerical parity with binary32 in training a number of convolutional neural networks. In the current paper we review the continuing trend of predictive numerics enhancing deep neural network training in specialized computing devices such as the Intel® Nervana™ Neural Network Processor.
Pages: 1-4
Citations: 20
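The idea of a blocked fixed-point type, in which a whole tensor shares one exponent and each element stores only an integer mantissa, can be sketched in a few lines. The code below is an illustration under assumptions: it picks the shared exponent from the current maximum magnitude, whereas the abstract describes a predictive exponent management algorithm that is not reproduced here, and it does not model the 5-bit range of the exponent field. The names `encode_block` and `decode_block` are hypothetical.

```python
import math


def encode_block(values, mantissa_bits=16):
    """Quantize floats to signed integer mantissas that share one exponent."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 0

    hi = (1 << (mantissa_bits - 1)) - 1  # largest signed mantissa
    lo = -hi - 1                         # smallest signed mantissa
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1, so scaling by
    # 2**(mantissa_bits - 1 - exp) brings the largest value just under 2**(bits - 1).
    _, exp = math.frexp(max_abs)
    shared_exp = exp - (mantissa_bits - 1)
    mantissas = [
        min(hi, max(lo, round(math.ldexp(v, -shared_exp)))) for v in values
    ]
    return mantissas, shared_exp


def decode_block(mantissas, shared_exp):
    """Recover approximate floats: value = mantissa * 2**shared_exp."""
    return [math.ldexp(m, shared_exp) for m in mantissas]


block = [1.0, 0.5, -0.25, 3.0]
mantissas, shared_exp = encode_block(block)
print(mantissas, shared_exp)
print(decode_block(mantissas, shared_exp))
```

Because the exponent is stored once per block, multiply-accumulate operations on the elements reduce to integer arithmetic, which is the source of the hardware savings this kind of format targets.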
Combining Restoring Array and Logarithmic Dividers into an Approximate Hybrid Design
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464807
Weiqiang Liu, Jing Li, Tao Xu, Chenghua Wang, P. Montuschi, F. Lombardi
This paper proposes a new design of an approximate hybrid divider (AXHD), which combines the restoring array and the logarithmic dividers to achieve an excellent tradeoff between accuracy and hardware performance. Exact restoring divider cells (EXDCrs) are used to generate the MSBs of the quotient to attain high accuracy; the other quotient digits are processed by a logarithmic divider as an inexact scheme to improve figures of merit such as power consumption, area and delay. The proposed AXHD is evaluated and analyzed using error and hardware metrics. The proposed design is also compared with the exact restoring divider (EXDr) and previous approximate restoring dividers (AXDrs). The results show that the proposed design achieves very good performance in terms of accuracy and hardware; case studies for image processing also show the validity of the proposed designs.
Pages: 92-98
Citations: 18
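The exact half of the hybrid design is based on restoring division, which can be sketched in software as follows. This is a bit-serial model of the textbook algorithm for unsigned operands, not the paper's array of divider cells, and the logarithmic (approximate) part of AXHD is not modelled. The 8-bit default width is an assumption.

```python
def restoring_divide(dividend, divisor, width=8):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift in the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial      # subtraction succeeded: quotient bit is 1
            quotient |= 1 << i
        # Otherwise the partial remainder is "restored" (kept unchanged).
    return quotient, remainder


print(restoring_divide(200, 7))  # (28, 4)
```

Each loop iteration yields one quotient bit starting from the MSB, which is why an exact stage of this kind can be used for the leading quotient bits only.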
Approximate Fixed-Point Elementary Function Accelerator for the SpiNNaker-2 Neuromorphic Chip
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464785
M. Mikaitis, D. Lester, D. Shang, S. Furber, Gengting Liu, J. Garside, Stefan Scholze, S. Höppner, Andreas Dixius
Neuromorphic chips are used to model biologically inspired spiking neural networks (SNNs), where most models are based on differential equations. Equations for most SNN algorithms usually contain variables with one or more $e^{x}$ components. SpiNNaker is a digital neuromorphic chip that has so far been using pre-calculated look-up tables for the exponential function. However, this approach is limited because the memory requirements grow as more complex neural models are developed. To save already limited memory resources in the next-generation SpiNNaker chip, we are including a fast exponential function in the silicon. In this paper we analyse iterative algorithms for elementary functions and show how to build a single hardware accelerator for exp and natural log, for a neuromorphic chip prototype, to be manufactured in a 22 nm FDSOI process. We present an accelerator that has algorithmic-level approximation control, allowing it to trade precision for latency and energy efficiency. In addition to the neuromorphic chip application, we provide an analysis of a parameterized elementary function unit that can be tailored for other systems with different power, area, accuracy and latency constraints.
Pages: 37-44
Citations: 13
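One classic family of iterative algorithms for the exponential is the shift-and-add method, which needs only shifts, additions and a small table of constants. The sketch below is a generic fixed-point version and is not claimed to be the accelerator's algorithm; the 24 fractional bits, the input range [0, ln 2] and the function name are assumptions.

```python
import math

FRAC_BITS = 24               # assumed fixed-point format: 24 fractional bits
ONE = 1 << FRAC_BITS
# Table of ln(1 + 2^-i) in fixed point for i = 1 .. FRAC_BITS.
LN_TABLE = [round(math.log(1 + 2.0 ** -i) * ONE) for i in range(1, FRAC_BITS + 1)]


def exp_shift_add(x):
    """Approximate e**x for 0 <= x <= ln 2 with shifts, adds and table look-ups."""
    if not 0.0 <= x <= math.log(2):
        raise ValueError("x must lie in [0, ln 2]")

    residual = int(x * ONE)  # what is left of the exponent, in fixed point
    y = ONE                  # running product, starts at 1.0
    for i, ln_term in enumerate(LN_TABLE, start=1):
        if residual >= ln_term:
            residual -= ln_term
            y += y >> i      # multiply y by (1 + 2^-i)
    return y / ONE


for value in (0.0, 0.1, 0.5, 0.69):
    print(value, exp_shift_add(value), math.exp(value))
```

Stopping the loop early gives a coarser result in fewer steps, which is one simple way an iterative scheme of this kind can trade precision for latency.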
New Area Record for the AES Combined S-Box/Inverse S-Box
Pub Date : 2018-06-01 DOI: 10.1109/ARITH.2018.8464780
A. Reyhani-Masoleh, Mostafa M. I. Taha, Doaa Ashmawy
The AES combined S-box/inverse S-box is a single construction that is shared between the encryption and decryption data paths of the AES. The most compact implementation of the AES combined S-box/inverse S-box to date is Canright's design, introduced in 2005. Since then, the research community has introduced several optimizations for the S-box alone; however, the combined S-box/inverse S-box has received little attention. In this paper, we propose a new AES combined S-box/inverse S-box design that is both smaller and faster than Canright's design. We achieve this goal by proposing to use a new tower field and optimizing every block inside the combined architecture for this field. Our complexity analysis and ASIC implementation results in the CMOS STM 65nm and NanGate 15nm technologies show that our design outperforms its counterparts in terms of area and speed.
Pages: 145-152
Citations: 12
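What makes a combined S-box/inverse S-box possible is that both directions use the same inversion in GF(2^8) and differ only in an affine map applied after it (encryption) or before it (decryption). The sketch below shows that structure in plain software, working directly in GF(2^8) with the AES polynomial; it does not use tower fields and is not the paper's design.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Field inverse as a^254; maps 0 to 0, as the AES S-box requires."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x):
    """Forward S-box: shared inversion, then the AES affine map."""
    a = gf_inv(x)
    return a ^ rotl8(a, 1) ^ rotl8(a, 2) ^ rotl8(a, 3) ^ rotl8(a, 4) ^ 0x63


def inv_sbox(y):
    """Inverse S-box: inverse affine map, then the same shared inversion."""
    a = rotl8(y, 1) ^ rotl8(y, 3) ^ rotl8(y, 6) ^ 0x05
    return gf_inv(a)


print(hex(sbox(0x00)), hex(sbox(0x53)), hex(inv_sbox(0x63)))
```

In hardware, the inverter is the expensive block, so sharing it between both data paths and selecting only the affine maps is what the combined construction exploits.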