Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145548
T. E. Hull, M. S. Cohen, C. Hall
The authors have been developing a programming system intended to be especially convenient for scientific computing. Its main features are variable precision (decimal) floating-point arithmetic and convenient exception handling. The software implementation of the system has evolved over a number of years, and a partial hardware implementation of the arithmetic itself was constructed and used during the early stages of the project. Based on this experience, the authors have developed a set of specifications for an arithmetic coprocessor to support such a system. These specifications are described. An outline of the language features and how they can be used is also provided, to help justify the particular choice of coprocessor specifications. The authors also indicate what other hardware features would be most helpful to the systems programmer, especially for implementation of the exception handling.
Title: Specifications for a variable-precision arithmetic coprocessor. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
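The two features highlighted in this abstract, run-time selectable decimal precision and convenient exception handling, can be illustrated with Python's standard `decimal` module. This is only an analogy under stated assumptions: it is not the authors' programming system or coprocessor, and the helper names `third` and `safe_divide` are hypothetical.

```python
from decimal import Decimal, DivisionByZero, localcontext

def third(precision):
    """Return 1/3 rounded to `precision` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = precision               # precision is chosen at run time
        return Decimal(1) / Decimal(3)

def safe_divide(a, b, fallback):
    """Divide a by b; return `fallback` if division by zero is signalled."""
    with localcontext() as ctx:
        ctx.traps[DivisionByZero] = True   # raise instead of returning Infinity
        try:
            return Decimal(a) / Decimal(b)
        except DivisionByZero:
            return fallback

print(third(5))                        # 5 significant digits
print(third(30))                       # 30 significant digits
print(safe_divide(1, 0, Decimal(0)))   # the handler supplies the fallback
```

The precision lives in a context object rather than in the number type, so the same expression can be re-evaluated at a higher precision without changing the code that computes it.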
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145530
M. Paterson, Uri Zwick
Y. Ofman (1963), C.S. Wallace (1964), and others used carry save adders to design multiplication circuits whose total delay is proportional to the logarithm of the length of the two numbers being multiplied. An extension of their work is presented: a general theory describing the optimal way in which given carry save adders can be combined into carry save networks. Two new designs of basic carry save adders are described. Using these building blocks and the general theory, the shallowest known theoretical circuits for multiplication are obtained.
Title: Shallow multiplication circuits. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
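The carry save idea behind this abstract can be sketched in software. The block below is a minimal sketch assuming the classic 3:2 carry save adder and a simple Wallace-style grouping in threes; it is not the optimized network or the new adder designs of the paper, and the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 carry save adder on non-negative ints: returns (s, k) with s + k == a + b + c."""
    partial_sum = a ^ b ^ c                        # bitwise sum, carries ignored
    carries = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, moved to their weight
    return partial_sum, carries

def csa_reduce(operands):
    """Reduce operands to two using layers of 3:2 adders; return (pair, layer count)."""
    operands = list(operands)
    layers = 0
    while len(operands) > 2:
        next_level = []
        full = len(operands) - len(operands) % 3
        for i in range(0, full, 3):                # each group of three becomes two
            next_level.extend(carry_save_add(*operands[i:i + 3]))
        next_level.extend(operands[full:])         # leftovers pass through unchanged
        operands = next_level
        layers += 1
    return operands, layers

def multiply(x, y, width=16):
    """Multiply x by a `width`-bit y by summing partial products in a carry save tree."""
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    (s, k), layers = csa_reduce(partial_products)
    return s + k, layers                           # one final carry-propagate addition

product, depth = multiply(1234, 5678)
print(product, depth)
```

Each layer has constant delay because no carry propagates inside it, and the operand count shrinks by roughly a factor of 3/2 per layer, which is where the logarithmic depth comes from.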
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145568
P. Tu, M. Ercegovac
A scheme for the singular value decomposition (SVD) problem, based on online arithmetic, is discussed. The design, using radix-2 floating-point online operations, implemented in the LSI HCMOS gate-array technology, is compared with a compatible conventional arithmetic implementation. The preliminary results indicate that the proposed online approach achieves a speedup of 2.4-3.2 with respect to the conventional solutions, with 1.3-5.5 more gates and more than 6 times fewer interconnections.
Title: Application of on-line arithmetic algorithms to the SVD computation: preliminary results. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145571
A. D. Lange, E. Deprettere
The authors describe the design and implementation of an algorithm and a processor which can be used to accelerate computations that involve large numbers of rotations (circular as well as hyperbolic). The processor is a low-cost high-throughput VLSI implementation of the algorithm. With 10^7 rotations per second, many real-time and interaction-time applications in scientific computation become feasible. The required storage and/or silicon area is low and the execution time is independent of the particular operation performed. Another feature of this CORDIC design is its pipelined architecture and floating point extension. It is angle-pipelinable at the bit-level and has an execution time which is independent of any possible operation that can be executed.
Title: Design and implementation of a floating-point quasi-systolic general purpose CORDIC rotator for high-rate parallel data and signal processing. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
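For readers unfamiliar with CORDIC, the circular rotation mode can be sketched as follows. This is a textbook floating-point emulation under stated assumptions, not the processor described in the abstract: the hyperbolic mode, the pipelining, and the floating point extension are not modelled, and `cordic_rotate` is a hypothetical name.

```python
import math

def cordic_rotate(x, y, angle, iterations=32):
    """Rotate (x, y) by `angle` radians (|angle| below about 1.74) with CORDIC micro-rotations."""
    # Elementary angles atan(2^-i) and the accumulated gain of the micro-rotations.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle                                   # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0             # rotate toward zero residual
        # In hardware the multiplications by 2^-i are plain shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x / gain, y / gain                   # compensate the constant gain

cx, sy = cordic_rotate(1.0, 0.0, math.pi / 6)
print(cx, sy)   # close to cos(pi/6) and sin(pi/6)
```

Every rotation performs the same fixed sequence of shift-and-add steps whatever the angle, which is consistent with the data-independent execution time the abstract emphasizes.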
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145566
W. Ferguson, T. Brightman
A technique for computing monotonicity-preserving approximations F_a(x) of a function F(x) is presented. This technique involves computing an extra-precise approximation of F(x) that is rounded to produce the value of F_a(x). For example, only a few extra bits of precision are used to make the accurate transcendental functions found on the Cyrix FasMath line of 80387-compatible math coprocessors monotonic.
Title: Accurate and monotone approximations of some transcendental functions. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145559
D. Wong, M. Flynn
A class of iterative integer division algorithms is presented based on lookup table Taylor-series approximations to the reciprocal. The algorithm iterates by using the reciprocal to find an approximate quotient and then subtracting the quotient multiplied by the divisor from the dividend to find a remaining dividend. Fast implementations can produce an average of either 14 or 27 bits per iteration, depending on whether the basic or advanced version of this method is implemented. Detailed analyses are presented to support the claimed accuracy per iteration. Speed estimates using state-of-the-art ECL (emitter-coupled logic) components show that this method is faster than the Newton-Raphson technique and can produce 53-bit quotients of 53-bit numbers in about 28 or 22 ns for the basic and advanced versions.
Title: Fast division using accurate quotient approximations to reduce the number of iterations. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
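The iteration described in this abstract (estimate a quotient from an approximate reciprocal, subtract quotient times divisor, repeat on the remaining dividend) can be sketched with Python integers. This is a rough sketch under stated assumptions: the reciprocal is obtained with `//` as a stand-in for the paper's lookup table and Taylor-series refinement, and `divide` and `recip_bits` are hypothetical names.

```python
def divide(dividend, divisor, recip_bits=16):
    """Divide non-negative `dividend` by positive `divisor` via an approximate reciprocal.

    Returns (quotient, remainder, iterations).
    """
    # Reciprocal scaled by 2^shift and rounded down, so estimates never overshoot.
    shift = divisor.bit_length() + recip_bits
    recip = (1 << shift) // divisor           # stand-in for table lookup + Taylor series
    quotient, remainder, iterations = 0, dividend, 0
    while remainder >= divisor:
        q_est = (remainder * recip) >> shift  # approximate quotient of what is left
        if q_est == 0:
            q_est = 1                         # remainder >= divisor, so one more fits
        quotient += q_est
        remainder -= q_est * divisor          # the remaining dividend
        iterations += 1
    return quotient, remainder, iterations

q, r, n = divide(10**15 + 7, 12345)
print(q, r, n)
```

Because the scaled reciprocal is rounded down, each estimate is at most the true quotient of the remaining dividend, so the remainder never goes negative; a more accurate reciprocal retires more quotient bits per pass and lowers the iteration count.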
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145564
A. Guyot
Various algorithms for finding the greatest common divisor (GCD) and extended GCD of very large integers are explored. In particular, the tradeoff between computation time and area is examined. Two of the algorithms, from which the method for deriving variants is straightforward, are detailed. Then the architecture of a VLSI processor dedicated to GCD as well as multiply, divide, square root, etc. of very large numbers (>600 decimal digits), using an internal radix-2 redundant representation and supporting multiple precision, is described.
Title: OCAPI: architecture of a VLSI coprocessor for the GCD and the extended GCD of large numbers. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145563
S. N. Parikh, D. Matula
An efficient implementation of the Euclidean GCD (greatest common divisor) algorithm employing the redundant binary number system is described. The time complexity is O(n), utilizing O(n) 4-2 signed 1-b adders to determine the GCD of two n-b integers. The process is similar to that used in SRT division. The efficiency of the algorithm is competitive, to within a small factor, with floating point division in terms of the number of shift and add/subtract operations. The novelty of the algorithm is based on properties derived from the proposed scheme of normalization of signed bit fractions. The implementation is well suited for systolic hardware design.
Title: A redundant binary Euclidean GCD algorithm. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145528
Christiane Frougny
Numeration systems, the bases of which are defined by a linear recurrence with integer coefficients, are considered. Conditions on the recurrence are given under which the function of normalization, which transforms any representation of an integer into the normal one (obtained by the usual algorithm), can be realized by a finite automaton. Addition is a particular case of normalization. The same questions are discussed for the representation of real numbers in base θ, where θ is a real number >1. In particular it is shown that, if θ is a Pisot number, then the normalization and the addition in base θ are computable by a finite automaton.
Title: Representation of numbers in nonclassical numeration systems. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.
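A concrete instance may help make normalization tangible. The sketch below assumes the Fibonacci numeration system (weights 1, 2, 3, 5, 8, ... from the recurrence u(n+2) = u(n+1) + u(n)), which is chosen here only as a familiar example and is not named in the abstract. It normalizes by evaluating the digits and re-running the greedy algorithm, so it does not construct the finite automaton the paper is concerned with; all function names are hypothetical.

```python
def fib_weights(n):
    """First n weights of the Fibonacci numeration system: 1, 2, 3, 5, 8, ..."""
    weights = [1, 2]
    while len(weights) < n:
        weights.append(weights[-1] + weights[-2])
    return weights[:n]

def value(digits):
    """Integer value of a digit list (most significant digit first)."""
    weights = fib_weights(len(digits))
    return sum(d * w for d, w in zip(reversed(digits), weights))

def greedy(n):
    """Normal representation of n >= 0, produced by the greedy algorithm."""
    weights = [1, 2]
    while weights[-1] <= n:
        weights.append(weights[-1] + weights[-2])
    digits = []
    for w in reversed(weights):
        if w <= n:
            digits.append(1)
            n -= w
        else:
            digits.append(0)
    while len(digits) > 1 and digits[0] == 0:   # drop leading zeros, keep one digit
        digits.pop(0)
    return digits

def normalize(digits):
    """Map any representation (digits may exceed 1) to the normal one."""
    return greedy(value(digits))

def add(a, b):
    """Addition as normalization: add digit-wise, then normalize."""
    width = max(len(a), len(b))
    a = [0] * (width - len(a)) + a
    b = [0] * (width - len(b)) + b
    return normalize([x + y for x, y in zip(a, b)])

print(normalize([0, 1, 1]))      # 011 and 100 both denote 3
print(add([1, 0, 1], [1, 0, 1])) # 4 + 4 = 8
```

The `add` function shows why addition is a particular case of normalization: the digit-wise sum is already a valid, non-normal representation over the larger digit set {0, 1, 2}.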
Pub Date: 1991-06-26; DOI: 10.1109/ARITH.1991.145561
T. Williams, M. Horowitz
A full-custom VLSI chip demonstrates an arithmetic implementation for computing the mantissa of a 54-b (floating-point double-precision) division operation in 45 ns to 160 ns, depending on the data. The design uses self-timing to avoid the need to partition logic into clock cycles and the need for high-speed clocks. Self-timing allows the circuits to iterate with no overhead over the pure combinational logic delays. It also allows a more efficient, symmetrically overlapped execution of the SRT stages because of dynamic path ordering. The design has several other performance enhancements, and their effects on the performance are discussed. Taken together, the architectural improvements provide a factor of two increase in performance over a straightforward SRT approach.
Title: A 160 ns 54 bit CMOS division implementation using self-timing and symmetrically overlapped SRT stages. Published in: Proceedings 10th IEEE Symposium on Computer Arithmetic, 1991.