[1991] Proceedings 10th IEEE Symposium on Computer Arithmetic: latest papers

The redundant cell adder
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145553
Thomas W. Lynch, E. Swartzlander
The design of the 56-b significand adder for the Advanced Micro Devices Am29050 microprocessor is described. This is a 1-μm design-rule CMOS realization of a high-performance RISC (reduced instruction set computer) microprocessor that implements IEEE Standard 754 floating-point arithmetic. To achieve an add time of under 4 ns for the 56-b significand and to avoid multistage pipelines, which significantly impair compiler efficiency, a redundant cell adder has been developed. This redundant cell adder design combines carry lookahead adders realized with Manchester carry chains and the carry select adder concept to achieve approximately twice the speed of the traditional carry lookahead adder. This adder achieves a 3.2-ns measured add time for 56-bit operands and is of reasonable size.
Citations: 17
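The carry select concept named in the abstract above can be illustrated with a small bit-level model. This is only a sketch of the general idea, not the Am29050 circuit: the function names, the 8-bit block size, and the use of a ripple adder inside each block (where the paper uses carry lookahead adders with Manchester carry chains) are assumptions made for illustration.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Add two little-endian bit lists; return (sum_bits, carry_out)."""
    out = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(x, y, width=56, block=8):
    """Carry-select addition of two unsigned `width`-bit integers.

    The block size of 8 is an illustrative assumption.
    Returns (sum modulo 2**width, carry_out).
    """
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    result_bits = []
    carry = 0
    for lo in range(0, width, block):
        a, b = xb[lo:lo + block], yb[lo:lo + block]
        # Both candidates depend only on the operands, so hardware can
        # compute them while the lower blocks are still being resolved.
        sum0, c0 = ripple_add(a, b, 0)
        sum1, c1 = ripple_add(a, b, 1)
        # The incoming carry only steers a multiplexer.
        chosen, carry = (sum1, c1) if carry else (sum0, c0)
        result_bits.extend(chosen)
    total = sum(bit << i for i, bit in enumerate(result_bits))
    return total, carry
```

The point of the structure is that the carry arriving at a block selects between two precomputed results instead of starting a new addition, which shortens the critical path at the cost of duplicated block hardware.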
Small moduli replications in the MRRNS
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145539
N. Wigley, G. Jullien, Daniel Reaume, W. Miller
The authors describe mapping, scaling, and conversion processes using a new mapping strategy for the modulus replication residue number system (MRRNS). The strategy allows direct mapping of bits of either a purely real or multiplexed bit coded complex number to a set of independent rings, defined by moduli 3, 5, and 7. The MRRNS technique is superior to a large QRNS system operating with a computational dynamic range of over 27 b. A classical radix-4 implementation of a 1024 FFT is used for the comparison. The scaling and conversion procedure is shown to be a set of finite ring calculations followed by an array of ordinary binary adders. The VLSI implementation of the most complex finite ring circuit required (a Mod 7 multiplier) is shown to be easily implemented using the switching tree approach, and mask-extracted simulations at 50 MHz demonstrate the embedding of the switching trees in a dynamic pipeline/evaluate circuit with restoring latch.
Citations: 3
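As background for the entry above, the following sketch shows ordinary residue arithmetic over the moduli 3, 5, and 7 that the abstract names. It is a plain residue number system, not the modulus replication scheme itself, and all function names are hypothetical. It illustrates why such systems are attractive: each ring is computed independently, with no carries between channels.

```python
MODULI = (3, 5, 7)        # the moduli named in the abstract
RANGE = 3 * 5 * 7         # 105 distinct values are representable


def to_residues(x):
    """Map an integer to its residue in each independent ring."""
    return tuple(x % m for m in MODULI)


def add(a, b):
    """Channel-wise addition; no information crosses between moduli."""
    return tuple((u + v) % m for u, v, m in zip(a, b, MODULI))


def mul(a, b):
    """Channel-wise multiplication."""
    return tuple((u * v) % m for u, v, m in zip(a, b, MODULI))


def from_residues(r):
    """Reconstruct the integer in [0, RANGE) with the Chinese remainder theorem."""
    total = 0
    for residue, m in zip(r, MODULI):
        partial = RANGE // m
        inverse = pow(partial, -1, m)   # modular inverse, Python 3.8+
        total += residue * partial * inverse
    return total % RANGE
```

With only these three moduli the dynamic range is 105, far below the range quoted in the abstract; how the MRRNS extends the range while keeping the moduli small is covered in the paper and is not modeled here.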
SVD by constant-factor-redundant-CORDIC
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145570
Jeong-A Lee, T. Lang
A constant-factor-redundant-CORDIC (CFR-CORDIC) scheme is developed where the scale factor is forced to be constant while computing angles for SVD (singular value decomposition). Based on the scheme, a fixed-point implementation of SVD is presented with the following additional features: (1) the final scaling operation is done by shifting; (2) the number of iterations in the CORDIC rotation unit is reduced by about 25% by expressing the direction of the rotation in radix-2 and radix-4; and (3) the conventional number representation of rotated output is obtained on-the-fly, not from a carry-propagate adder. The authors compare this scheme with previously proposed ones and show that it provides an execution time similar to that of redundant CORDIC with variable scaling factor, with significant saving in area.
Citations: 17
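To see why the scale factor matters in the entry above, it helps to look at conventional (non-redundant) CORDIC rotation. The sketch below is that textbook scheme in floating point, not the CFR-CORDIC design; the function names and the iteration count of 32 are assumptions. Because every step rotates by plus or minus atan(2^-i) and never skips a step, the accumulated gain is the same for every input.

```python
import math


def cordic_gain(iterations=32):
    """Gain accumulated by `iterations` micro-rotations (about 1.6468)."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 4.0 ** -i)   # 4**-i == (2**-i)**2
    return gain


def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians, for |angle| below about 1.74.

    Each step multiplies only by a power of two, which is a shift in
    fixed-point hardware; floats are used here for readability.
    """
    z = angle
    for i in range(iterations):
        sigma = 1.0 if z >= 0 else -1.0      # direction is always +1 or -1
        t = 2.0 ** -i
        x, y = x - sigma * y * t, y + sigma * x * t
        z -= sigma * math.atan(t)
    gain = cordic_gain(iterations)           # independent of the sigma choices
    return x / gain, y / gain
```

In a redundant scheme the direction digit may also take the value 0, which skips a micro-rotation and makes the gain depend on the data; that is the variable scaling factor the abstract refers to, and forcing it to stay constant is what the CFR-CORDIC scheme is described as doing.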
Implementation and analysis of extended SLI operations
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145547
P. Turner
Extended arithmetic operations, such as forming scalar products, in symmetric level index (SLI) arithmetic are considered. Schemes for the implementation of such algorithms are described and analyzed in terms of comparative timings for these operations and their floating-point counterparts and in terms of the control of errors in the computation. With sufficient parallelism available in the SLI processor, the computation can be as fast as for floating-point operations. The SLI operation can be modified to produce just a single rounding error from extended operations very economically. The implementation details suggest that any time-penalty associated with the use of SLI arithmetic can be kept to a very small factor on highly parallel computers, perhaps on the order of just two or three for typical scientific computing programs.
Citations: 13
The CORDIC Householder algorithm
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145569
Shen-Fu Hsiao, J. Delosme
A novel n-dimensional (n-D) CORDIC algorithm for Euclidean and pseudo-Euclidean rotations is proposed. This algorithm is closely related to Householder transformations. It is shown to converge faster than CORDIC algorithms developed earlier for n=3 and 4. Processor architectures for the algorithm are presented. The area and time performance of n-D CORDIC processors are evaluated. For a comparable time performance, the processors require significantly less area than parallel Householder processors. Furthermore, arrays of n-D Euclidean CORDIC processors are shown to speed up the QR decomposition of rectangular matrices by a factor of n-1 in comparison with a 2-D CORDIC processor array.
Citations: 20
Optimal purely systolic addition
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145555
L. Kuhnel
The author introduces a purely systolic hardware algorithm for addition which is based on a mesh-connected arrangement of cells. The proposed FASTA algorithm is well suited for realization in integrated technologies. Its area, computation time, and period satisfy A(n)=O(n), T(n)=O(√n), and P(n)=O(√n), respectively, where n denotes the operand length. Therefore, this adder is T-, APT-, and AT²-optimal in the linear model for signal propagation delays. In the class of Θ(√n) time adders it is optimal with respect to A, P, T, AT, APT, AP², and AT². The suggested algorithm essentially is a solution to the general problem of parallel prefix computation. Therefore, it can serve as a paradigm for the design of optimal purely systolic hardware algorithms in a wide range of application domains.
Citations: 7
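The abstract above notes that addition is an instance of parallel prefix computation. A minimal sketch of that connection follows, using a logarithmic-depth scan over generate and propagate signals. This is a generic scan of the Kogge-Stone kind, chosen here for brevity; it is not the mesh-connected FASTA algorithm, and the function names are hypothetical.

```python
def prefix_carries(x, y, width):
    """Return carries[i] = carry into bit i, with carries[width] the carry-out."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(width)]   # bit i generates a carry
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(width)]   # bit i propagates a carry
    G, P = g[:], p[:]
    step = 1
    while step < width:
        # Merge each span with the span ending `step` positions below it.
        # All positions are independent within a pass, so hardware can
        # evaluate a whole pass in one gate level.
        new_G, new_P = G[:], P[:]
        for i in range(step, width):
            new_G[i] = G[i] | (P[i] & G[i - step])
            new_P[i] = P[i] & P[i - step]
        G, P = new_G, new_P
        step *= 2
    # G[i] now says whether bits 0..i together produce a carry out of bit i.
    return [0] + G


def prefix_add(x, y, width):
    """Add two unsigned `width`-bit integers; return (sum mod 2**width, carry_out)."""
    carries = prefix_carries(x, y, width)
    total = 0
    for i in range(width):
        bit = ((x >> i) ^ (y >> i) ^ carries[i]) & 1
        total |= bit << i
    return total, carries[width]
```

The combining rule on (generate, propagate) pairs is associative, which is what lets the carries be computed by any prefix network; the paper's contribution concerns a systolic mesh layout with the stated area and time bounds, which this sketch does not attempt to reproduce.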
Constant time arbitrary length synchronous binary counters
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145556
J. Vuillemin
The author introduces a synchronous binary counter which can be operated under a high clock frequency, independent of the counter's length n: all signals traverse at most two three-input logic gates during each clock phase. The proposed design is simple enough to have practical implications, as illustrated by a CMOS programmable gate array implementation which has counted up to 2⁴⁰ with a 40-MHz clock. The area required for laying out this design is no larger than that of the (much slower) carry-ripple counter.
Citations: 36
Table-lookup algorithms for elementary functions and their error analysis
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145565
P. T. P. Tang
Table-lookup algorithms for calculating elementary functions offer superior speed and accuracy when compared with more traditional algorithms. It is shown that, with careful design, it is feasible to implement table-lookup algorithms in hardware. A uniform approach for carrying out a tight error analysis for such implementations is presented. The advantages of table-lookup algorithms over CORDIC and ordinary (without table-lookup) polynomial algorithms are described.
Citations: 218
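A table-lookup algorithm of the kind the abstract above discusses typically has three stages: reduce the argument to a small interval around a table point, approximate the function on that interval with a short polynomial, and reconstruct the result from the table entry. The sketch below applies that pattern to exp. The 32-entry table, the Taylor polynomial, and the name `table_exp` are illustrative assumptions; a production design would use carefully rounded table entries, a more accurate argument reduction, and a minimax polynomial.

```python
import math

TABLE_SIZE = 32                                    # assumed table size
TABLE = [2.0 ** (j / TABLE_SIZE) for j in range(TABLE_SIZE)]
LN2_OVER_SIZE = math.log(2.0) / TABLE_SIZE


def table_exp(x):
    """Approximate exp(x) as 2**m * 2**(j/32) * exp(r) with |r| <= ln(2)/64."""
    # Reduction: x = n * ln(2)/32 + r, with n the nearest integer.
    n = round(x / LN2_OVER_SIZE)
    r = x - n * LN2_OVER_SIZE
    m, j = divmod(n, TABLE_SIZE)                   # n = 32*m + j, 0 <= j < 32
    # Approximation: degree-6 Taylor polynomial for exp(r) - 1 in nested form.
    poly = r * (1 + r / 2 * (1 + r / 3 * (1 + r / 4 * (1 + r / 5 * (1 + r / 6)))))
    # Reconstruction: table entry times (1 + poly), scaled by 2**m.
    return math.ldexp(TABLE[j] * (1.0 + poly), m)
```

Because the reduced argument is so small, a low-degree polynomial suffices, which is the speed advantage over polynomial methods without a table; the error of each stage (reduction, polynomial, table entry, final rounding) can then be bounded separately, which is the style of analysis the abstract describes.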
High-speed multiplier design using multi-input counter and compressor circuits
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145532
Mayur Mehta, Vijay Parmar, E. Swartzlander
The design of a fast multiplier implemented using either
Citations: 92
A high-radix hardware algorithm for calculating the exponential M^E modulo N
Pub Date : 1991-06-26 DOI: 10.1109/ARITH.1991.145533
Holger Orup, Peter Kornerup
In a class of cryptosystems, fast computation of modulo exponentials is essential. The authors present a parallel version of a well-known exponentiation algorithm that halves the worst-case computing time. It is described how a high radix modulo multiplication can be implemented by interleaving a serial-parallel multiplication scheme with an SRT division scheme. The problems associated with high radices are efficiently solved by the use of a redundant representation of intermediate operands. It is shown how the algorithms can be realized as a highly regular VLSI circuit. Simulations indicate that a radix 32 implementation of the algorithms is capable of computing 512-b operand exponentials in 3.2 ms. This is more than five times faster than other known implementations.
Citations: 39
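The abstract above mentions a parallel version of a well-known exponentiation algorithm without naming it. One plausible illustration, offered as an assumption rather than as the authors' method, is the right-to-left binary method: in each step the running product and the next square are computed from the same inputs and do not depend on each other, so two multipliers can work at once. The function name is hypothetical.

```python
def modexp_right_to_left(m, e, n):
    """Compute m**e mod n, scanning the exponent from its least significant bit."""
    result = 1 % n            # 1 % n handles the n == 1 case
    square = m % n
    while e > 0:
        # These two products read only the old `result` and `square`, so
        # they are independent and could run on two multipliers in parallel.
        next_result = (result * square) % n if e & 1 else result
        next_square = (square * square) % n
        result, square = next_result, next_square
        e >>= 1
    return result
```

With a k-bit exponent, a sequential square-and-multiply needs up to about 2k modular multiplications in the worst case, while the arrangement above needs k steps when the two products overlap. The high-radix multiplication and redundant operand representation described in the abstract are not modeled; Python's built-in integers stand in for them.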