Hardware design and arithmetic algorithms for a variable-precision, interval arithmetic coprocessor
M. Schulte, E. Swartzlander
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465354
This paper presents the hardware design and arithmetic algorithms for a coprocessor that performs variable-precision, interval arithmetic. The coprocessor gives the programmer the ability to specify the precision of the computation, determine the accuracy of the result, and recompute inaccurate results with higher precision. Direct hardware support and efficient algorithms for variable-precision, interval arithmetic greatly improve the speed, accuracy, and reliability of numerical computations. Performance estimates indicate that the coprocessor is 200 to 1,000 times faster than a software package for variable-precision, interval arithmetic. The coprocessor can be implemented on a single chip with a cycle time comparable to that of IEEE double-precision floating-point coprocessors.
An area/performance comparison of subtractive and multiplicative divide/square root implementations
Peter Soderquist, M. Leeser
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465366
The implementations of division and square root in the FPUs of current microprocessors are based on one of two categories of algorithms. Multiplicative techniques, exemplified by the Newton-Raphson method and Goldschmidt's algorithm, share functionality with the floating-point multiplier. Subtractive methods, such as the many variations of radix-4 SRT, generally use dedicated, parallel hardware. These different approaches give rise to the distinct area and performance characteristics explored in this paper. Area comparisons are derived from measurements of commercial and academic hardware implementations. Representative divide/square root implementations are paired with typical add-multiply structures and simulated, using data from current microprocessor and arithmetic coprocessor designs, to obtain performance estimates. The results suggest that subtractive implementations offer a superior balance of area and performance and, because of their parallel operation, stand to benefit most decisively from improvements in technology and growing transistor budgets. Multiplicative methods lend themselves best to situations where hardware reuse is mandated by area or architectural constraints.
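The multiplicative approach above can be illustrated with the Newton-Raphson reciprocal iteration, x' = x(2 − d·x), which doubles the number of correct bits per step and maps division onto the existing multiplier (an illustrative sketch of the numerics, not the hardware datapath; real FPUs seed from a lookup table rather than the linear approximation used here):

```python
def nr_divide(a, d, iterations=5):
    # Newton-Raphson division a/d via the reciprocal of d.
    # Requires d normalized to [0.5, 1); a crude linear seed stands in
    # for the small lookup table a hardware unit would use.
    assert 0.5 <= d < 1.0
    x = 2.9142 - 2.0 * d            # seed: within ~9% of 1/d on [0.5, 1)
    for _ in range(iterations):
        x = x * (2.0 - d * x)       # quadratic convergence: bits double each pass
    return a * x
```

Five iterations take the ~9% seed error down past double precision, which is why a modest table plus two or three iterations suffices in practice.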
Application of fast layout synthesis environment to dividers evaluation
A. Houelle, H. Mehrez, N. Vaucher, L. Montalvo, A. Guyot
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465375
Experience has shown that generator programs are quite often written by VLSI designers, as they hold the empirical knowledge better than anyone. However, their expertise does not necessarily include programming and debugging skills: these designers have to focus on the problem at hand, not on the tools or the language they use to solve it. GenOptim has been created to quickly design efficient IEEE 754 floating-point macro-cell generators that do not rely on particular target technologies. Whereas the design of fast and efficient adders, multipliers, and shifters is well understood, division and square root remain a serious design challenge. GenOptim was used to quickly evaluate new divider architectures.
An ε-arithmetic for removing degeneracies
D. Michelucci
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465353
Symbolic perturbation by infinitely small values removes degeneracies in geometric algorithms and enables programmers to handle only generic cases: there are only a few such cases, whereas degenerate cases are overwhelmingly numerous. Current perturbation schemes have limitations. To overcome them, the paper proposes an ε-arithmetic, i.e. representing infinitely small numbers explicitly and defining the arithmetic operations (+, -, *, /, <, =) on them.
It takes six ones to reach a flaw [Pentium processor]
T. Coe, P. T. P. Tang
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465365
The initial release of the Pentium processor has a flaw in its radix-4 SRT division implementation. It is widely known that five entries were missing from the lookup table, occasionally yielding reduced-precision quotients. In this paper, we use mathematical techniques to analyze the divisors that can possibly cause failures. In particular, we show that bits 5 through 10 (where bit 0 is the MSB) of such divisors must be all ones. This result is useful in compiler-level software patches for systems with unreplaced chips, and we believe that the techniques used here are applicable to analyzing SRT division as well as other hardware algorithms for floating-point arithmetic.
167 MHz radix-4 floating point multiplier
R. Yu, G. Zyner
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465364
An IEEE floating point multiplier with partial support for subnormal operands and results is presented. Radix-4 (modified Booth) encoding and a binary tree of 4:2 compressors are used to generate the 53×53 double-precision product. Delay matching techniques were used in the binary tree stage and in the final addition stage to reduce cycle time. New techniques in rounding and sticky-bit generation were also used to reduce area and timing. The overall multiplier has a latency of 3 cycles, a throughput of 1 cycle, and a cycle time of 6.0 ns. This multiplier has been implemented in a 0.5 µm static CMOS technology in the UltraSPARC RISC microprocessor.
High-speed double precision computation of nonlinear functions
V. Jain, L. Lin
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465370
High-speed coprocessors for computing nonlinear functions are important for advanced scientific computing as well as real-time image processing. In this paper we develop an efficient interpolative approach to such coprocessors. Performed on suitable subintervals of the range of interest, our interpolation, which uses a third-degree polynomial, is adequate for many elementary functions of interest with double-precision mantissas. Our method requires only one major multiplication, two minor multiplications, and a few additions. The minor multiplications are for the second- and third-degree terms, and their significant bits are much fewer than those of the first-degree term. This leads to a very fast and efficient VLSI architecture for such coprocessors. Polynomial-based interpolation can yield considerable benefits over previously used approaches when execution time and silicon area are considered. Further, it is possible to combine the computation of multiple functions on a single chip, with most of the resources shared among several functions.
The SNAP project: towards sub-nanosecond arithmetic
M. Flynn, K. Nowka, G. Bewick, E. Schwarz, Nhon T. Quach
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465374
SNAP, the Stanford subnanosecond arithmetic processor, is an interdisciplinary effort to develop theory, tools, and technology for realizing an arithmetic processor with execution rates under 1 ns. Specific improvements in clocking methods, floating-point addition algorithms, floating-point multiplication algorithms, division and higher-level function algorithms, design tools, and packaging technology were studied. These improvements have been demonstrated in the implementation of several VLSI designs.
Faithful bipartite ROM reciprocal tables
Debjit Das Sarma, D. Matula
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465381
We describe bipartite reciprocal tables that employ separate table lookup of the positive and negative portions of a borrow-save reciprocal value. The fusion of the parts includes a rounding, so the output reciprocals are guaranteed correct to a unit in the last place and typically provide a round-to-nearest reciprocal for over 90% of input arguments. The output rounding can be accomplished in conjunction with multiplier recoding, at practically no cost in logic complexity or time. We demonstrate these tables to be 2 to 4 times smaller than conventional reciprocal tables. For 10- to 16-bit reciprocal table lookup the compression grows from a factor of 4 to over 16, making possible the use of larger seed reciprocals than previously considered cost effective.
High speed DCT/IDCT using a pipelined CORDIC algorithm
Feng Zhou, Peter Kornerup
Pub Date: 1995-07-19 | DOI: 10.1109/ARITH.1995.465361
This paper describes DCT (IDCT) computation using the CORDIC algorithm. By rewriting the DCT, a 1×8 DCT needs only 6 CORDIC computations, whereas a 1×16 DCT requires 22. These can all be pipelined through a single CORDIC unit, so a 16×16 DCT becomes feasible for HDTV compression. Only simple adders, registers, and a more complicated carry look-ahead adder are needed, and the computing speed can be very high. Limited only by the delay of a carry look-ahead adder, the delay time of the pipelined structure is 2-10 ns, and the data rate is 100-500 MHz for an 8×8 DCT/IDCT and 72.2-366.6 MHz for a 16×16 DCT/IDCT when using two units.