Exact rounding of certain elementary functions
M. Schulte, E. Swartzlander
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378099
Proceedings of IEEE 11th Symposium on Computer Arithmetic
An algorithm is described which produces exactly rounded results for the reciprocal, square root, 2^x, and log2(x) functions. Hardware designs based on this algorithm are presented for floating-point numbers with 16- and 24-b significands. These designs use a polynomial approximation whose coefficients are initially selected from a Chebyshev series approximation and are then adjusted to ensure exactly rounded results for all inputs. To reduce the number of terms in the approximation, the input interval is divided into subintervals of equal size, with different coefficients for each subinterval. For floating-point numbers with 16-b significands, the exactly rounded value of a function can be computed in 51 ns on a 20-mm^2 chip; for 24-b significands, in 80 ns on a 98-mm^2 chip.
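The coefficient-adjustment step is the paper's contribution; the underlying piecewise approximation it starts from can be sketched as follows. This is a minimal illustration using degree-2 interpolation of 1/x at Chebyshev nodes on 64 equal subintervals (the segment count, degree, and node choice are illustrative, not the paper's adjusted minimax coefficients):

```python
import math

def chebyshev_nodes(a, b, n):
    """n Chebyshev nodes on [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * i + 1) * math.pi / (2 * n))
            for i in range(n)]

def lagrange_eval(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        w = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                w *= (x - xj) / (xi - xj)
        total += yi * w
    return total

def make_tables(f, a, b, segments, degree):
    """Per-subinterval node/value tables (the stored 'coefficients')."""
    h = (b - a) / segments
    tables = []
    for s in range(segments):
        lo = a + s * h
        xs = chebyshev_nodes(lo, lo + h, degree + 1)
        tables.append((xs, [f(x) for x in xs]))
    return tables

def approx(tables, a, b, x):
    """Select the subinterval for x and evaluate its polynomial."""
    segments = len(tables)
    s = min(int((x - a) / ((b - a) / segments)), segments - 1)
    xs, ys = tables[s]
    return lagrange_eval(xs, ys, x)
```

With 64 segments and degree 2, the worst-case error for 1/x on [1, 2) is on the order of 1e-7, which shows why few terms per subinterval suffice once the interval is split.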
An accurate LNS arithmetic unit using interleaved memory function interpolator
D. Lewis
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378115
A logarithmic number system (LNS) arithmetic unit using a new method for polynomial interpolation in hardware is described. The use of an interleaved memory reduces storage requirements by allowing each stored function value to be used in interpolation across several segments. This strategy always uses fewer words of memory than an optimized polynomial with stored polynomial coefficients. Many accuracy requirements for the LNS arithmetic unit are possible, but round-to-nearest cannot easily be achieved. The goal suggested here is to ensure that the worst-case LNS relative error is smaller than the worst-case FP relative error. Using the interleaved memory interpolator, the detailed design of an LNS arithmetic unit is carried out with a second-order polynomial interpolator containing approximately 91K bits of ROM.
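As a rough model of the approach (not Lewis's interleaved addressing scheme), the sketch below stores samples of the LNS addition function sb(d) = log2(1 + 2^d) at fixed spacing and forms a second-order interpolation in which each stored value is shared by the stencils of neighbouring segments. The spacing and cutoff are illustrative choices:

```python
import math

H = 1.0 / 64.0   # sample spacing (illustrative)
DMIN = -32.0     # below this, sb(d) is negligibly small

def sb(d):
    """sb(d) = log2(1 + 2^d), the LNS addition function, d <= 0."""
    return math.log2(1.0 + 2.0 ** d)

# Stored function values; each value is reused by the interpolation
# stencils of adjacent segments, in the spirit of the interleaved memory.
TABLE = [sb(DMIN + i * H) for i in range(int(-DMIN / H) + 2)]

def sb_interp(d):
    """Second-order (3-point) interpolation of sb from stored values."""
    if d <= DMIN:
        return 0.0
    t = (d - DMIN) / H
    i = min(int(t), len(TABLE) - 3)
    u = t - i
    y0, y1, y2 = TABLE[i], TABLE[i + 1], TABLE[i + 2]
    # Newton forward quadratic through three consecutive stored samples
    return y0 + u * (y1 - y0) + 0.5 * u * (u - 1.0) * (y2 - 2.0 * y1 + y0)

def lns_add(a, b):
    """Given a = log2(x), b = log2(y), return approx log2(x + y)."""
    hi, lo = max(a, b), min(a, b)
    return hi + sb_interp(lo - hi)
```

The quadratic stencil keeps the interpolation error far below the sample spacing, which is what makes sharing stored values across segments cheaper than storing per-segment polynomial coefficients.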
Fast implementations of RSA cryptography
M. Shand, J. Vuillemin
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378085
The authors detail and analyze the critical techniques that may be combined in the design of fast hardware for RSA cryptography: Chinese remainders, star chains, Hensel's odd division (also known as Montgomery modular reduction), carry-save representation, quotient pipelining, and asynchronous carry-completion adders. A fully operational PAM (programmable active memory) implementation of RSA combining all of these techniques delivers a secret decryption rate of over 600 kb/s for 512-b keys and 165 kb/s for 1-kb keys. This is an order of magnitude faster than any previously reported running implementation. While the implementation makes full use of the PAM's reconfigurability, it is possible to derive from the multiple PAM designs a single gate-array specification with an estimated size under 100K gates and a speed over 1 Mb/s for 512-b RSA keys. Matching gains in software performance are also analyzed.
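Of the listed techniques, Hensel's odd division (Montgomery reduction) is the most self-contained. A word-level sketch with Python big integers, assuming R = 2^64 and an odd modulus (constants and names are illustrative, not the paper's hardware formulation):

```python
def montgomery_setup(n, width):
    """Precompute constants for R = 2**width; n must be odd."""
    r = 1 << width
    n_prime = (-pow(n, -1, r)) % r   # n' = -n^{-1} mod R
    return r, n_prime

def redc(t, n, r, n_prime, width):
    """Montgomery reduction: return t * R^{-1} mod n, for 0 <= t < n*R."""
    m = ((t & (r - 1)) * n_prime) & (r - 1)  # make t + m*n divisible by R
    u = (t + m * n) >> width                 # exact division by R
    return u - n if u >= n else u            # single conditional subtraction

def mont_mul(a, b, n, r, n_prime, width):
    """Modular multiply via Montgomery form: returns a*b mod n."""
    a_bar = (a * r) % n                      # into Montgomery form
    b_bar = (b * r) % n
    prod = redc(a_bar * b_bar, n, r, n_prime, width)  # = a*b*R mod n
    return redc(prod, n, r, n_prime, width)           # out of Montgomery form
```

The point of the transform is that `redc` replaces a trial division by n with a multiply, an add, and a shift, which is what makes it attractive for carry-save, pipelined hardware.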
BKM: A new hardware algorithm for complex elementary functions
J. Bajard, Sylvanus Kla, J. Muller
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378098
An algorithm for computing complex logarithms and exponentials is proposed. The algorithm is based on shift-and-add elementary steps, and it generalizes the CORDIC algorithm. It can compute the usual real elementary functions. The algorithm is more suitable than CORDIC for computation in a redundant number system, since it requires no scaling factor for the trigonometric functions.
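BKM itself operates on complex values with a redundant digit set; the restoring, real-valued exponential mode below shows only the shift-and-add skeleton it generalizes. Each iteration multiplies by (1 + 2^-n), which in hardware is a shift plus an add (iteration count and domain are illustrative):

```python
import math

# Precomputed shift-and-add constants ln(1 + 2^-n)
LN_TERMS = [math.log(1.0 + 2.0 ** -n) for n in range(60)]

def shift_add_exp(x):
    """Compute e^x for 0 <= x < sum(LN_TERMS) (about 1.56) by a greedy
    shift-and-add recurrence: accumulate ln(1 + 2^-n) terms into t while
    multiplying y by the matching (1 + 2^-n) factors."""
    t, y = 0.0, 1.0
    for n, ln_term in enumerate(LN_TERMS):
        if t + ln_term <= x:      # take this term if it still fits
            t += ln_term
            y *= 1.0 + 2.0 ** -n  # a shift and an add in hardware
    return y
```

Because ln(1 + 2^-n) is always smaller than the sum of the remaining terms, the greedy selection converges, and y tracks e^t with t converging to x.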
A modular multiplication algorithm with triangle additions
N. Takagi
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378083
An algorithm for multiple-precision modular multiplication is proposed. In the algorithm, the upper-half triangle of the partial products is first added up and the residue of the sum is calculated; the sum of the lower-half triangle of the partial products is then added to this residue, and the residue of the total is calculated. An efficient residue-calculation procedure that accelerates the algorithm is also proposed. Since the algorithm is both fast and uses little main memory, it is efficient to implement on small computers, such as card computers, and is useful for bringing public-key cryptosystems to such machines.
Measuring the accuracy of ROM reciprocal tables
Debjit Das Sarma, D. Matula
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378104
It is proved that a conventional ROM reciprocal table construction algorithm generates tables that minimize the relative error. The worst-case relative errors realized by such optimally computed k-bits-in, m-bits-out ROM reciprocal tables are then determined for all table sizes 3 ≤ k, m ≤ 12. It is further proved that the table construction algorithm always generates a k-bits-in, k-bits-out table whose relative error never exceeds (3/4)2^-k for any k and, more generally with g guard bits, that for (k+g)-bits-out the relative error never exceeds 2^-(k+1)(1 + 1/2^(g+1)). To allow test data to be determined without prior construction of a full ROM reciprocal table, a procedure is described that generates and searches only a small portion of such a table to find the input regions yielding the worst-case relative errors.
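A conventional construction of this kind rounds the reciprocal of each input interval's midpoint onto the output grid. The sketch below uses exact rationals and its own bit conventions (k fraction bits in, k+1 fraction bits out), which are illustrative rather than the paper's, so it checks a loose 2^-k bound rather than the paper's tight (3/4)2^-k result:

```python
from fractions import Fraction

def reciprocal_table(k):
    """Entry i holds the reciprocal of the midpoint of the input interval
    [1 + i/2^k, 1 + (i+1)/2^k), rounded to the nearest multiple of
    2^-(k+1) (round-to-nearest on the output grid)."""
    grid = Fraction(1, 2 ** (k + 1))
    table = []
    for i in range(2 ** k):
        mid = 1 + Fraction(2 * i + 1, 2 ** (k + 1))
        table.append(round((1 / mid) / grid) * grid)
    return table

def worst_relative_error(k):
    """Exact worst-case relative error.  |r*x - 1| is linear in x, so the
    maximum over each interval occurs at an interval endpoint."""
    table = reciprocal_table(k)
    worst = Fraction(0)
    for i, r in enumerate(table):
        for x in (1 + Fraction(i, 2 ** k), 1 + Fraction(i + 1, 2 ** k)):
            worst = max(worst, abs(r * x - 1))
    return worst
```

The linearity observation is also the germ of the paper's search procedure: worst cases sit at predictable interval endpoints, so only a small region of the table need ever be generated.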
Multi-parallel convolvers
L. Dadda, V. Piuri, R. Stefanelli
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378107
A scheme for a convolver design, called a multiparallel convolver, is presented; it is based on concurrent processing of p adjacent samples that are input simultaneously to the p-parallel convolver. The scheme uses p units, called p-phase subconvolvers, each of which receives the input samples and produces one convolution every p samples. The detailed design of the p-phase subconvolvers and of the whole p-parallel convolver is presented and discussed. The scheme can be used with either bit-parallel or bit-serial presentation of each input sample. The input sample rate of the p-parallel convolver is p times that of a standard (1-parallel) convolver implemented in the same integration technology, and the number of components required is approximately p times that of a standard convolver.
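The phase decomposition can be shown functionally: p subconvolver loops, each producing every p-th output from the shared input stream, so together they emit p results per p input samples. This is a behavioural sketch of the partitioning only, not the bit-level hardware design:

```python
def direct_convolution(h, x):
    """Reference FIR convolution: y[n] = sum_k h[k] * x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                acc += hk * x[n - k]
        y.append(acc)
    return y

def p_parallel_convolution(h, x, p):
    """p 'p-phase subconvolvers': unit `phase` computes the outputs with
    index congruent to `phase` mod p, all reading the same input stream."""
    y = [0] * len(x)
    for phase in range(p):
        for n in range(phase, len(x), p):
            acc = 0
            for k, hk in enumerate(h):
                if 0 <= n - k < len(x):
                    acc += hk * x[n - k]
            y[n] = acc
    return y
```

Since each unit runs at the base rate but only owns one output phase, aggregate throughput scales by p, matching the roughly p-fold component count.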
Efficient multiprecision floating point multiplication with optimal directional rounding
W. Krandick, Jeremy R. Johnson
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378088
An algorithm is described for multiplying multiprecision floating-point numbers. The algorithm can produce either the smallest floating-point number greater than or equal to the true product or the greatest floating-point number smaller than or equal to the true product. Software implementations of multiprecision floating-point multiplication can halve the computation time by not computing the low-order digits of the product of the two mantissas; however, such algorithms do not necessarily provide optimally rounded results. The algorithm described here is guaranteed to produce optimally rounded results and typically obtains the same savings.
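The directed-rounding targets themselves are easy to state on integer mantissas: round down is a truncation, round up adds one whenever any discarded bit is set. The sketch below shows those targets with a full product and a sticky bit; the paper's contribution, not reproduced here, is reaching the same results while skipping most of the low-order digit computation:

```python
def round_product_directed(ma, mb, p, up):
    """Round the product of two p-bit integer mantissas to p fixed-point
    digits.  up=True yields the smallest representable value >= the true
    product; up=False the greatest representable value <= it."""
    prod = ma * mb                        # exact (up to 2p-bit) product
    high = prod >> p                      # keep the top digits
    sticky = prod & ((1 << p) - 1)        # is any discarded bit nonzero?
    if up and sticky:
        high += 1                         # directed round toward +infinity
    return high
```

Directed rounding only needs to know whether the discarded part is nonzero, which is exactly why an algorithm can often avoid computing its digits exactly.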
Complex SLI arithmetic: Representation, algorithms and analysis
P. Turner
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378113
The extension of the SLI (symmetric level index) system to complex numbers and arithmetic is discussed. The natural representation of complex quantities in SLI is the modulus-argument form, which can be sensibly packed into a single 64-b word as the equivalent of the 32-b real SLI representation. The arithmetic algorithms prove to be only very slightly more complicated than those for real SLI arithmetic. The representation, the arithmetic algorithms, and the control of errors within these algorithms are described.
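In the complex extension the modulus is the part carried in level-index form. The real scaffolding is simple to show: the level counts how many logarithms bring the value below 1, and the index is what remains. A minimal encode/decode sketch for the magnitude (not the packed 64-b format or the symmetric extension to values below 1):

```python
import math

def to_sli(x):
    """Level-index form of x >= 1: apply ln until the value drops below 1;
    the count is the level, the remainder is the index f in [0, 1)."""
    assert x >= 1.0
    level = 0
    while x >= 1.0:
        x = math.log(x)
        level += 1
    return level, x

def from_sli(level, f):
    """Invert the representation: apply exp 'level' times to the index."""
    for _ in range(level):
        f = math.exp(f)
    return f
```

The iterated exponential is what gives SLI its enormous, overflow-free range, and it is the modulus of a complex value, never the argument, that needs that range.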
The Gauss machine: A Galois-enhanced quadratic residue number system systolic array
J. Mellott, Jermy C. Smith, F. Taylor
Pub Date: 1993-06-29. DOI: 10.1109/ARITH.1993.378097
The Gauss machine is a SIMD systolic-array architecture that takes advantage of the Galois-enhanced quadratic residue number system (GEQRNS) to form reduced-complexity arithmetic elements. The machine is targeted at front-end signal and image processing applications. A discrete prototype has been constructed that achieves a peak rating of 320 million complex arithmetic operations per second while operating at 10 MHz. A VLSI implementation of the Gauss machine's processor cell has also been created; it is implemented in 2.0-µm CMOS and achieves greater than 20-MHz performance in less than 2.0 mm^2 of die area. It is shown that techniques for defect tolerance in RNS systolic arrays can result in substantial yield enhancement, thereby making larger-than-conventional (ULSI) systems possible.
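The "Galois-enhanced" complexity reduction rests on a standard field fact: nonzero residues mod a prime p form a cyclic group, so multiplication becomes addition of generator indices (discrete logs), turning each modular multiplier into an adder plus small lookup tables. A small-prime sketch of that index arithmetic (the prime and table layout are illustrative):

```python
def find_generator(p):
    """Smallest primitive root of the prime field GF(p) (brute force)."""
    for g in range(2, p):
        seen, v = set(), 1
        for _ in range(p - 1):
            v = v * g % p
            seen.add(v)
        if len(seen) == p - 1:      # g's powers cover all nonzero residues
            return g

def build_index_tables(p):
    """Log and antilog ROMs: residue <-> exponent of the generator."""
    g = find_generator(p)
    log_t, alog_t, v = {}, {}, 1
    for e in range(p - 1):
        log_t[v] = e
        alog_t[e] = v
        v = v * g % p
    return log_t, alog_t

def index_mul(a, b, p, log_t, alog_t):
    """Multiply nonzero residues by adding indices mod p-1; zero is the
    usual special case handled outside the index domain."""
    if a == 0 or b == 0:
        return 0
    return alog_t[(log_t[a] + log_t[b]) % (p - 1)]
```

In a QRNS setting the same trick applies channel-wise, which is how the processor cells shed full modular multipliers.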