The residue logarithmic number system (RLNS) represents real values as quantized logarithms which, in turn, are represented using the residue number system (RNS). Compared with the conventional logarithmic number system (LNS), in which quantized logarithms are represented as binary integers, RLNS offers faster multiplication and division. Both RLNS and LNS use a table lookup involving all bits for addition. The width, dynamic range, precision, and naive table size of RLNS (with careful moduli selection) are as good as those of conventional LNS. Conventional LNS addition, however, can be more efficient than a naive table lookup. First, commutativity allows the arguments to be interchanged. Second, the addition function is often essentially zero and does not have to be tabulated. Exploiting these optimizations requires comparison: in binary, comparisons are easy; in residue representation, they are slow. Although RLNS inherently demands comparison, this paper shows a novel way in which comparisons can be performed in parallel with the lookup from a small table. The paper also describes a novel tool that generates synthesizable Verilog, making RLNS viable in practical applications that can benefit from shorter multiply and divide times.
{"title":"The residue logarithmic number system: theory and implementation","authors":"M. Arnold","doi":"10.1109/ARITH.2005.44","DOIUrl":"https://doi.org/10.1109/ARITH.2005.44","url":null,"abstract":"The residue logarithmic number system (RLNS) represents real values as quantized logarithms which, in turn, are represented using the residue number system (RNS). Compared to the conventional logarithmic number system (LNS) in which quantized logarithms are represented as binary integers, RLNS offers faster multiplication and division times. RLNS and LNS use a table lookup involving all bits for addition. The width, dynamic range, precision and naive table size of RLNS (with careful moduli selection) is as good as those for conventional LNS. Conventional LNS can be more efficient than naive addition lookup. First, commutativity allows interchanging arguments. Second, the addition function is often essentially zero, and does not have to be tabulated. In binary, comparisons are easy. In residue, comparisons are slow. Although RLNS inherently demands comparison, this paper shows a novel way comparisons can be performed in parallel to the lookup from a small table. This paper also describes a novel tool that generates synthesizable Verilog, making RLNS viable in practical applications that can benefit from shorter multiply and divide times.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124141776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent research has demonstrated the vulnerability of certain smart card architectures to power and electromagnetic analysis when multiplier operations are insufficiently shielded from external monitoring. In this paper, several standard multipliers are investigated in more detail in order to provide a foundation for understanding potential weaknesses and enabling the subsequent successful repair of those systems. A model is built which accurately predicts power use as a function of the Hamming weights of the inputs, without the combinatorial explosion of exhaustive simulation. This confirms that power use is indeed data dependent, at least for these multipliers. Laboratory experiments confirm that electromagnetic radiation (EMR) also corresponds closely to these power predictions over a wide range of frequencies.
{"title":"Data dependent power use in multipliers","authors":"C. D. Walter, David Samyde","doi":"10.1109/ARITH.2005.14","DOIUrl":"https://doi.org/10.1109/ARITH.2005.14","url":null,"abstract":"Recent research has demonstrated the vulnerability of certain smart card architectures to power and electromagnetic analysis when multiplier operations are insufficiently shielded from external monitoring. In this paper several standard multipliers are investigated in more detail in order to provide the foundation for understanding potential weaknesses and enabling the subsequent successful repair of those systems. A model is built which accurately predicts power use as a function of the Hamming weights of inputs without the combinatorial explosion of exhaustive simulation. This confirms that power use is indeed data dependent at least for those multipliers. Laboratory experiments confirm that EMR also corresponds closely to these power predictions over a wide range of frequencies.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128284055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper presents a one-shot batch process that generates a wide range of designs for a family of parallel prefix adders. Each prefix adder is represented by two two-dimensional matrices and two vectors. This matrix representation makes it possible to compose two functions for gate sizing, which calculate the delay and the total transistor width of the adder's carry-propagation graph. After gate sizing, the critical-path netlists of the carry-propagation graph are generated from the matrix representation for SPICE delay calculation. The process is illustrated by generating sets of delay and total-transistor-width pairs for 32-bit and 64-bit adders.
{"title":"Parallel prefix adder design with matrix representation","authors":"Youngmoon Choi, E. Swartzlander","doi":"10.1109/ARITH.2005.35","DOIUrl":"https://doi.org/10.1109/ARITH.2005.35","url":null,"abstract":"The paper presents a one-shot batch process that generates a wide range of designs for a group of parallel prefix adders. The prefix adders are represented by two two-dimensional matrices and two vectors. This matrix representation makes it possible to compose two functions for gate sizing which calculate the delay and the total transistor width of the carry propagation graph of adders. After gate sizing, the critical path net-lists of the carry propagation graph are generated from the matrix representation for spice delay calculation. The process is illustrated by generating sets of delay and total transistor width pairs for 32-bit and 64-bit cases.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131452149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pipelined CORDIC with a linear approximation to rotation has been proposed to achieve reductions in delay, power, and area; however, the schemes for rotation (multiplication) and vectoring (division) complicate implementation in a single unit. In this work, we improve the linear approximation scheme, leading to a unified implementation for rotation and vectoring in which fully parallel tree multipliers are used instead of the second half of the CORDIC iterations. We also combine the linear approximation to rotation with scale-factor compensation, so that the compensation is performed concurrently with the rotation process. A comparison with other designs is also provided.
{"title":"Low latency pipelined circular CORDIC","authors":"E. Antelo, J. Villalba","doi":"10.1109/ARITH.2005.30","DOIUrl":"https://doi.org/10.1109/ARITH.2005.30","url":null,"abstract":"The pipelined CORDIC with linear approximation to rotation has been proposed to achieve reductions in delay, power and area; however, the schemes for rotation (multiplication) and vectoring (division) complicate implementation in a single unit. In this work, we improve the linear approximation scheme, leading to a unified implementation for rotation and vectoring where fully parallel tree multipliers are used instead of the second half of CORDIC iterations. We also combine the linear approximation to rotation with the scale factor compensation so that the compensation is performed concurrently with the rotation process. Comparison with other designs is also provided.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128154748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient adder design requires proper selection of a recurrence algorithm and its realization. Each of the algorithms of Weinberger, Ling, and Doran was analyzed for its flexibility of representation and its suitability for realization in CMOS. We describe general techniques for developing efficient realizations of Ling's algorithm under CMOS technology constraints. From these techniques we propose two high-performance realizations that achieve a 1 FO4 delay improvement at the same energy, and a 50% energy reduction at the same delay, compared with existing Ling and Weinberger designs.
{"title":"Efficient mapping of addition recurrence algorithms in CMOS","authors":"B. Zeydel, Ties Kluter, V. Oklobdzija","doi":"10.1109/ARITH.2005.19","DOIUrl":"https://doi.org/10.1109/ARITH.2005.19","url":null,"abstract":"Efficient adder design requires proper selection of a recurrence algorithm and its realization. Each of the algorithms: Weinberger's, Ling's and Doran's were analyzed for its flexibility in representation and suitability for realization in CMOS. We describe general techniques for developing efficient realizations based on CMOS technology constraints when using Ling's algorithm. From these techniques we propose two high-performance realizations that achieve 1 FO4 delay improvement at the same energy and 50% energy reduction at the same delay than existing Ling and Weinberger designs.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126848237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes an improved version of the Tenca-Koc unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koc multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2^n), up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported.
{"title":"An improved unified scalable radix-2 Montgomery multiplier","authors":"D. Harris, R. Krishnamurthy, M. Anders, S. Mathew, S. Hsu","doi":"10.1109/ARITH.2005.9","DOIUrl":"https://doi.org/10.1109/ARITH.2005.9","url":null,"abstract":"This paper describes an improved version of the Tenca-Koc unified scalable radix-2 Montgomery multiplier with half the latency for small and moderate precision operands and half the queue memory requirement. Like the Tenca-Koc multiplier, this design is reconfigurable to accept any input precision in either GF(p) or GF(2/sup n/) up to the size of the on-chip memory. An FPGA implementation can perform 1024-bit modular exponentiation in 16 ms using 5598 4-input lookup tables, making it the fastest unified scalable design yet reported.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129724388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel error-free (infinite-precision) architecture for the fast implementation of both the 8x8 2-D discrete cosine transform (DCT) and the inverse DCT (IDCT). The architecture uses a new algebraic integer quantization of a 1-D radix-8 DCT that allows the separable computation of an 8x8 2-D DCT without any intermediate number-representation conversions. This is a considerable improvement over previously introduced algebraic integer encoding techniques for computing the DCT and IDCT: it eliminates the need to approximate the transformation matrix elements by obtaining their exact representations, and hence maps the transcendental functions without any error. Using this encoding scheme, an entire 8x8 1-D DCT-SQ (scalar quantization) algorithm can be implemented with only 24 adders. Besides being multiplication-free, the new mapping scheme fits the algorithm, eliminating any computational or quantization errors and resulting in a short word length and a high-speed design.
{"title":"Error-free computation of 8/spl times/8 2D DCT and IDCT using two-dimensional algebraic integer quantization","authors":"K. Wahid, V. Dimitrov, G. Jullien","doi":"10.1109/ARITH.2005.20","DOIUrl":"https://doi.org/10.1109/ARITH.2005.20","url":null,"abstract":"This paper presents a novel error-free (infinite-precision) architecture for the fast implementation of both 8/spl times/8 2D discrete cosine transform and inverse DCT. The architecture uses a new algebraic integer quantization of a 1D radix-8 DCT that allows the separable computation of a 2D 8/spl times/8 DCT without any intermediate number representation conversions. This is a considerable improvement on previously introduced algebraic integer encoding techniques to compute both DCT and IDCT which eliminates the requirements to approximate the transformation matrix elements by obtaining their exact representations and hence mapping the transcendental functions without any errors. Using this encoding scheme, an entire 8/spl times/8 1D DCT-SQ (scalar quantization) algorithm can be implemented with only 24 adders. Apart from the multiplication-free nature, this new mapping scheme fits to this algorithm, eliminating any computational or quantization errors and resulting short-word-length and high-speed-design.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132186608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modular reduction is a fundamental operation in cryptographic systems. Most well-known modular reduction methods, including Barrett's and Montgomery's algorithms, leverage precomputations to avoid divisions, so that the main cost of these methods lies in a sequence of two long multiplications. For large wordlengths, a multiplication, which is tantamount to a linear convolution, is performed via the fast Fourier transform (FFT) or other transform-based techniques, as in the Schönhage-Strassen multiplication algorithm. We show a fundamental property (the separation principle): in a modular reduction based on long multiplications, the linear convolution required by one of the two long multiplications can be replaced by a cyclic convolution, and the halves can be separated using other information available due to the intrinsic redundancy of the operations. This reduces the number of operations by about 25%. We demonstrate that both Barrett's and Montgomery's methods can be sped up using this principle. A direct application of the algorithm to modular exponentiation (with either Barrett's or Montgomery's method) can be expected to yield about a 17% speedup.
{"title":"Fast modular reduction for large wordlengths via one linear and one cyclic convolution","authors":"D. Phatak, T. Goff","doi":"10.1109/ARITH.2005.21","DOIUrl":"https://doi.org/10.1109/ARITH.2005.21","url":null,"abstract":"Modular reduction is a fundamental operation in cryptographic systems. Most well known modular reduction methods including Barrett's and Montgomery's algorithms leverage some-pre computations to avoid divisions so that the main complexity of these methods lies in a sequence of two long multiplications. For large wordlengths a multiplication which is tantamount to a linear convolution is performed via the fast Fourier transform (FFT) or other transform-based techniques as in the Schonhage-Strassen multiplication algorithm. We show a fundamental property (the separation principle): in a modular reduction based on long multiplications, the linear convolution required by one of the two long multiplications can be replaced by a cyclic convolution, and the halves can be separated using other information available due to the intrinsic redundancy of the operations. This reduces the number of operations by about 25%. We demonstrate that both Barrett's and Montgomery's methods can be sped up by using the aforementioned fundamental principle. It is shown that a direct application of this algorithm to modular exponentiation (either using Barrett's or Montgomery's methods) can be expected to yield about about 17% speedup.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133151336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New bit-serial squarers for long numbers in LSB-first form are presented in this paper. The first scheme is a 50% operationally efficient squarer that has half the number of cells of traditional squarers. The second scheme is a 100% operationally efficient squarer; here the number of cells remains unchanged compared to other proposed schemes, but the number of required registers is reduced significantly. Both schemes are presented in non-systolic and systolic forms and are compared with other squarers in the literature in terms of hardware complexity.
{"title":"Long number bit-serial squarers","authors":"E. Chaniotakis, P. Kalivas, K. Pekmestzi","doi":"10.1109/ARITH.2005.28","DOIUrl":"https://doi.org/10.1109/ARITH.2005.28","url":null,"abstract":"New bit serial squarers for long numbers in LSB first form, are presented in this paper. The first presented scheme is a 50% operational efficient squarer than has the half number of cells compared to the traditional squarers. The second scheme is a 100% operational efficient squarer. In this scheme, the number of the cells remain unchanged compared to other proposed schemes but the number of the required registers is reduced significantly. Both schemes are presented in non-systolic and systolic form and are compared against other squarers presented in the bibliography from the aspect of hardware complexity.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123576701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we propose an architecture for the double-precision floating-point multiply-add fused (MAF) operation A + (B x C) that allows the floating-point addition to be computed with lower latency than floating-point multiplication and MAF. Whereas previous MAF architectures compute the three operations with the same latency, the proposed architecture allows the first pipeline stages, those related to the multiplication B x C, to be skipped in the case of an addition. For instance, for a MAF unit pipelined into three or five stages, the latency of floating-point addition is reduced to two or three cycles, respectively. To achieve this latency reduction, the alignment shifter, which in previous organizations operates in parallel with the multiplication, is moved so that the multiplication can be bypassed. To prevent this modification from lengthening the critical path, a double-datapath organization is used, in which alignment and normalization are placed in separate paths. Moreover, we use previously developed techniques that combine the addition with the rounding and perform the normalization before the addition.
{"title":"Floating-point fused multiply-add: reduced latency for floating-point addition","authors":"J. Bruguera, T. Lang","doi":"10.1109/ARITH.2005.22","DOIUrl":"https://doi.org/10.1109/ARITH.2005.22","url":null,"abstract":"In this paper we propose an architecture for the computation of the double-precision floating-point multiply-add fused (MAF) operation A+(B/spl times/C) that permits to compute the floating-point addition with lower latency than floating-point multiplication and MAF. While previous MAF architectures compute the three operations with the same latency, the proposed architecture permits to skip the first pipeline stages, those related with the multiplication B/spl times/C, in case of an addition. For instance, for a MAF unit pipelined into three or five stages, the latency of the floating-point addition is reduced to two or three cycles, respectively. To achieve the latency reduction for floating-point addition, the alignment shifter, which in previous organizations is in parallel with the multiplication, is moved so that the multiplication can be bypassed. To avoid that this modification increases the critical path, a double-datapath organization is used, in which the alignment and normalization are in separate paths. Moreover, we use the techniques developed previously of combining the addition and the rounding and of performing the normalization before the addition.","PeriodicalId":194902,"journal":{"name":"17th IEEE Symposium on Computer Arithmetic (ARITH'05)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116941025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}