The performance of many cryptographic primitives relies on efficient algorithms and implementation techniques for arithmetic in binary fields. While dedicated hardware support for such arithmetic is an emerging trend, software-only implementation techniques remain important for legacy or non-equipped processors. One such technique is software-based bit-slicing. In the context of binary fields this is an interesting option: there is extensive previous work on bit-oriented designs for arithmetic in hardware, and such designs are intuitively well suited to bit-slicing in software. In this paper we harness that previous work to investigate bit-sliced, software-only implementations of binary field arithmetic over a range of practical field sizes, using a normal basis representation. We apply our results to demonstrate significant performance improvements for a stream cipher, and over the frequently employed Ning-Yin approach to normal basis implementation in software.
{"title":"Bit-Sliced Binary Normal Basis Multiplication","authors":"B. Brumley, D. Page","doi":"10.1109/ARITH.2011.36","DOIUrl":"https://doi.org/10.1109/ARITH.2011.36","url":null,"abstract":"The performance of many cryptographic primitives is reliant on efficient algorithms and implementation techniques for arithmetic in binary fields. While dedicated hardware support for said arithmetic is an emerging trend, the study of software-only implementation techniques remains important for legacy or non-equipped processors. One such technique is that of software-based bit-slicing. In the context of binary fields, this is an interesting option since there is extensive previous work on bit-oriented designs for arithmetic in hardware, such designs are intuitively well suited to bit-slicing in software. In this paper we harness previous work, using it to investigate bit-sliced, software-only implementation arithmetic for binary fields, over a range of practical field sizes and using a normal basis representation. We apply our results to demonstrate significant performance improvements for a stream cipher, and over the frequently employed Ning-Yin approach to normal basis implementation in software.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133670094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The hardware implementation of modular exponentiation for very large integers is a well-known topic in digital arithmetic. An effective approach for obtaining parallel and carry-free implementations consists in using the Montgomery exponentiation algorithm and executing the necessary operations in RNS. Two efficient methods for performing the RNS Montgomery exponentiation have been proposed by Kawamura et al. and by Bajard and Imbert; these approaches differ mainly in the algorithm used for implementing the base extension. This paper presents a modified RNS Montgomery exponentiation algorithm in which several multiplications are moved outside the main execution loop and replaced by an effective pre-processing stage, producing a significant saving in the overall delay with respect to state-of-the-art approaches. Since the proposed modification can be applied to both of the above algorithms, two versions are specifically discussed.
{"title":"A General Approach for Improving RNS Montgomery Exponentiation Using Pre-processing","authors":"F. Gandino, F. Lamberti, P. Montuschi, J. Bajard","doi":"10.1109/ARITH.2011.35","DOIUrl":"https://doi.org/10.1109/ARITH.2011.35","url":null,"abstract":"The hardware implementation of modular exponentiation for very large integers is a well-known topic in digital arithmetic. An effective approach for obtaining parallel and carry-free implementations consists in using the Montgomery exponentiation algorithm and executing the necessary operations in RNS. Two efficient methods for performing the RNS Montgomery exponentiation have been proposed by Kawamura et al. and by Bajard and Imbert. The above approaches mainly differ in the algorithm used for implementing the base extension. This paper presents a modified RNS Montgomery exponentiation algorithm, where several multiplications are moved outside the main execution loop and replaced by an effective pre-processing stage producing a significant saving on the overall delay with respect to state-of-the-art approaches. Since the proposed modification should be applied to both of the above algorithms, two versions are specifically discussed.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134339839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an IEEE 754-2008 and ARM compliant floating-point microarchitecture that preserves the higher performance of separate multiply and add units while decreasing the effective latency of fused multiply-adds (FMAs). The multiplier supports subnormals in a novel and faster manner, shifting the partial products so that injection rounding can be used. The early-normalizing adder retains the low latency of a split-path near/far adder, but does so in a unified path with less area. The adder also allows rounding on effective subtractions involving one input that is twice the normal width, a necessary feature for handling FMAs. The resulting floating-point unit has about twice the instructions-per-cycle (IPC) performance of the best previous ARM design, and can be clocked at a higher speed despite the wider paths required by FMAs.
{"title":"Fused Multiply-Add Microarchitecture Comprising Separate Early-Normalizing Multiply and Add Pipelines","authors":"D. Lutz","doi":"10.1109/ARITH.2011.25","DOIUrl":"https://doi.org/10.1109/ARITH.2011.25","url":null,"abstract":"We present an IEEE 754-2008 and ARM compliant floating-point micro architecture that preserves the higher performance of separate multiply and add units while decreasing the effective latency of fused multiply-adds (FMAs). The multiplier supports subnormals in a novel and faster manner, shifting the partial products so that injection rounding can be used. The early-normalizing adder retains the low latency of a split path near/far adder, but does so in a unified path with less area. The adder also allows rounding on effective subtractions involving one input that is twice the normal width, a necessary feature for handling FMAs. The resulting floating-point unit has about twice the (IPC) performance of the best previous ARM design, and can be clocked at a higher speed despite the wider paths required by FMAs.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115768454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digit-by-rounding algorithms enable efficient hardware implementations of algebraic functions such as the reciprocal, square root, or reciprocal square root, but certifying the correctness of such algorithms is a nontrivial endeavor. Traditionally, sufficient conditions for correctness are derived as closed-form formulae relating key design parameters. These sufficient conditions, however, often prove stricter than necessary, excluding correct and efficient designs. In this paper, we present a rigorous, computer-aided method for correctness certification that better approximates the necessary conditions, lowering the risk of rejecting correct designs. We also present two specific applications of this method. First, when applied to a conventional digit-by-rounding reciprocal square root design, our method enabled a fourfold reduction in lookup table size relative to the minimum dictated by a standard sufficient condition. Second, our method certified the correctness of a novel reciprocal square root design that we developed to parallelize two computational steps whose sequential execution lies on the critical path of conventional designs. The difficulty in deriving closed-form sufficient conditions to ascertain this design's correctness provided the original motivation for development of the new certification method.
{"title":"Tight Certification Techniques for Digit-by-Rounding Algorithms with Application to a New 1/sqrt(x) Design","authors":"P. T. P. Tang, J. A. Butts, R. Dror, D. Shaw","doi":"10.1109/ARITH.2011.29","DOIUrl":"https://doi.org/10.1109/ARITH.2011.29","url":null,"abstract":"Digit-by-rounding algorithms enable efficient hardware implementations of algebraic functions such as the reciprocal, square root, or reciprocal square root, but certifying the correctness of such algorithms is a nontrivial endeavor. Traditionally, sufficient conditions for correctness are derived as closed-form formulae relating key design parameters. These sufficient conditions, however, often prove stricter than necessary, excluding correct and efficient designs. In this paper, we present a rigorous, computer-aided method for correctness certification that better approximates the necessary conditions, lowering the risk of rejecting correct designs. We also present two specific applications of this method. First, when applied to a conventional digit-by-rounding reciprocal square root design, our method enabled a fourfold reduction in lookup table size relative to the minimum dictated by a standard sufficient condition. Second, our method certified the correctness of a novel reciprocal square root design that we developed to parallelize two computational steps whose sequential execution lies on the critical path of conventional designs. The difficulty in deriving closed-form sufficient conditions to ascertain this design's correctness provided the original motivation for development of the new certification method.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127817948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Signed digit (SD) number systems allow for high-performance carry-free adders. Maximally redundant SD (MRSD) alternatives provide maximal encoding efficiency among radix-2^h SD number systems, where the value of h tunes the area-time trade-off. A straightforward implementation of the conventional carry-free addition algorithm requires three O(log h) addition-like operations in sequence. However, there are several MRSD implementations with only one such operation. Some of them are delay optimized but suffer from extensive hardware redundancy, while other, equally fast adders show lower power/area consumption. A careful study of the latter cases hints at a variety of improvement options; building on these and on a new transfer computation technique, we develop a family of faster MRSD adders that consume less power/area than all previous relevant works. They also fit efficiently within the redundant-digit floating-point addition scheme. However, like their ancestor designs, they suffer from an inherent difficulty of MRSD adders: handling hidden leading zero-digits. To remedy this problem, we use less redundant SD representations, where our transfer extraction method applies efficiently and leads to far less complex leading zero-digit detection. All the presented designs are supported by exhaustive correctness tests and by performance evaluation via synthesis for a 0.13 micrometer CMOS technology.
{"title":"A Family of High Radix Signed Digit Adders","authors":"S. Gorgin, G. Jaberipur","doi":"10.1109/ARITH.2011.24","DOIUrl":"https://doi.org/10.1109/ARITH.2011.24","url":null,"abstract":"Signed digit (SD) number systems allow for high performance carry-free adders. Maximally redundant SD (MRSD) alternatives provide maximal encoding efficiency among Radix-2^h SD number systems, whereby value of h tunes the area-time trade-off. Straightforward implementation of the conventional carry-free addition algorithm requires three O(log h) addition-like operations in sequence. However, there are several MRSD implementations with only one such operation. Some of them are delay optimized, but suffer from extensive hardware redundancy, while some other equally fast adders show less power/area consumption. A careful study of the latter cases hints on variety of improvement options, based on which and a new transfer computation technique, we develop a family of faster MRSD adders that consume less power/area than all the previous relevant works. They also fit efficiently within the redundant digit floating point addition scheme. However, similar to their relevant ancestor designs, suffer from an inherent property of MRSD adders, i.e., difficulty of handling hidden leading zero-digits. To remedy this problem, we use less redundant SD representations, where our transfer extraction method applies efficiently and leads to far less complex leading zero-digit detection. All the presented designs are supported by exhaustive correctness tests and performance evaluation via 0.13 micrometer CMOS technology synthesis.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"1125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131425695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper discusses High Performance Computing, including the introduction of the fused multiply-add dataflow and innovations in vector computing and multiprocessing. These developments have ushered in a new era of high performance computing that brings aspects of human intelligence to computers.
{"title":"High Intelligence Computing: The New Era of High Performance Computing","authors":"Ralf Fischer","doi":"10.1109/ARITH.2011.42","DOIUrl":"https://doi.org/10.1109/ARITH.2011.42","url":null,"abstract":"This paper discusses about High Performance Computing including the introduction of the fused multiply-add dataflow, and innovations in vector computing and multi processing. This has led to a new era in high performance that has created human intelligence in computers.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132101159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Define an "augmented precision" algorithm as an algorithm that returns, in precision-p floating-point arithmetic, its result as the unevaluated sum of two floating-point numbers, with a relative error of the order of 2^(-2p). Assuming an FMA instruction is available, we perform a tight error analysis of an augmented precision algorithm for the square root, and introduce two slightly different augmented precision algorithms for the 2D-norm sqrt(x^2+y^2). Then we give tight lower bounds on the minimum distance (in ulps) between sqrt(x^2+y^2) and a midpoint when sqrt(x^2+y^2) is not itself a midpoint. This allows us to determine cases when our algorithms make it possible to return correctly-rounded 2D-norms.
{"title":"Augmented Precision Square Roots and 2-D Norms, and Discussion on Correctly Rounding sqrt(x^2+y^2)","authors":"N. Brisebarre, Mioara Joldes, Peter Kornerup, Érik Martin-Dorel, J. Muller","doi":"10.1109/ARITH.2011.13","DOIUrl":"https://doi.org/10.1109/ARITH.2011.13","url":null,"abstract":"Define an \"augmented precision\" algorithm as an algorithm that returns, in precision-p floating-point arithmetic, its result as the unevaluated sum of two floating-point numbers, with a relative error of the order of 2^(-2p). Assuming an FMA instruction is available, we perform a tight error analysis of an augmented precision algorithm for the square root, and introduce two slightly different augmented precision algorithms for the 2D-norm sqrt(x^2+y^2). Then we give tight lower bounds on the minimum distance (in ulps) between sqrt(x^2+y^2) and a midpoint when sqrt(x^2+y^2) is not itself a midpoint. This allows us to determine cases when our algorithms make it possible to return correctly-rounded 2D-norms.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134250296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The logarithmic number system has been proposed as an alternative to floating-point arithmetic. Multiplication, division and square-root operations are accomplished with fixed-point methods, but addition and subtraction are considerably more challenging. Recent work has demonstrated that these operations too can be done with similar speed and accuracy to their FP equivalents, but the necessary circuitry is complex. In particular, it is dominated by the need for large ROM tables for the storage of non-linear functions. This paper describes two algorithms, a new co-transformation procedure and an improvement to an existing interpolation method, that reduce these tables to an extent that allows their easy synthesis in logic. An implementation shows substantial reductions in area and delay from the previous best 32-bit realisation, with equivalent accuracy.
{"title":"ROM-less LNS","authors":"Rizalafande Che Ismail, J. N. Coleman","doi":"10.1109/ARITH.2011.15","DOIUrl":"https://doi.org/10.1109/ARITH.2011.15","url":null,"abstract":"The logarithmic number system has been proposed as an alternative to floating-point arithmetic. Multiplication, division and square-root operations are accomplished with fixed-point methods, but addition and subtraction are considerably more challenging. Recent work has demonstrated that these operations too can be done with similar speed and accuracy to their FP equivalents, but the necessary circuitry is complex. In particular, it is dominated by the need for large ROM tables for the storage of non-linear functions. This paper describes two algorithms, a new co-transformation procedure and an improvement to an existing interpolation method, that reduce these tables to an extent that allows their easy synthesis in logic. An implementation shows substantial reductions in area and delay from the previous best 32-bit realisation, with equivalent accuracy.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128713033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a number of new high-radix ripple-carry adder designs based on Ling's addition technique and a recently published expansion thereof. The proposed adders all have one inverting CMOS cell per stage along the carry-in to carry-out critical path and, at 16-b word lengths, the fastest of them matches the speed of a 16-b prefix adder for only 63% of the area. These adders will be of use in VLSI circuits implementing modern wireless DSP algorithms and in floating-point unit exponent logic, both of which typically use short word length arithmetic.
{"title":"Fast Ripple-Carry Adders in Standard-Cell CMOS VLSI","authors":"N. Burgess","doi":"10.1109/ARITH.2011.23","DOIUrl":"https://doi.org/10.1109/ARITH.2011.23","url":null,"abstract":"This paper presents a number of new high-radix ripple-carry adder designs based on Ling's addition technique and a recently-published expansion thereof. The proposed adders all have one inverting CMOS cell per stage along the carry-in to carry-out critical path and, at 16-b word lengths, the fastest of them matches the speed of a 16-b prefix adder for only 63% of the area. These adders will be of use in VLSI circuits implementing modern wireless DSP algorithms and in Floating-Point Unit exponent logic, both of which typically use short word length arithmetic.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"214 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123693534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of computing IEEE floating-point squares by means of integer arithmetic. We show how to exploit the specific properties of squaring in order to design and implement algorithms that have much lower latency than those for general multiplication, while still guaranteeing correct rounding. Our algorithms are parameterized by the floating-point format, aim at high instruction-level parallelism (ILP) exposure, and cover all rounding modes. We show further that their C implementation for the binary32 format yields efficient codes for targets like the ST231 VLIW integer processor from ST Microelectronics, with a latency at least 1.75x smaller than that of general multiplication in the same context.
{"title":"How to Square Floats Accurately and Efficiently on the ST231 Integer Processor","authors":"C. Jeannerod, Jingyan Jourdan-Lu, Christophe Monat, G. Revy","doi":"10.1109/ARITH.2011.19","DOIUrl":"https://doi.org/10.1109/ARITH.2011.19","url":null,"abstract":"We consider the problem of computing IEEE floating-point squares by means of integer arithmetic. We show how to exploit the specific properties of squaring in order to design and implement algorithms that have much lower latency than those for general multiplication, while still guaranteeing correct rounding. Our algorithms are parameterized by the floating-point format, aim at high instruction-level parallelism (ILP) exposure, and cover all rounding modes. We show further that their C implementation for the binary32 format yields efficient codes for targets like the ST231 VLIW integer processor from ST Microelectronics, with a latency at least 1.75x smaller than that of general multiplication in the same context.","PeriodicalId":272151,"journal":{"name":"2011 IEEE 20th Symposium on Computer Arithmetic","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127875096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}