Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465377
Chung Nan Lyu, D. Matula
Title: Redundant binary Booth recoding. In: Proceedings of the 12th Symposium on Computer Arithmetic.
We investigate the efficiencies attainable by pursuing Booth recoding directly from redundant binary input with limited carry propagation. Treating this as a digit conversion problem, we extend the important result that each radix-4 Booth recoded digit can be determined from 5 consecutive input signed bits, obtaining that each radix-2^k Booth recoded digit can be determined from 2k+1 consecutive input signed bits, and we prove this to be the minimum possible for any k ≥ 2. Analysis of alternative bit-pair encodings of signed bits yields the improved result that each radix-2^k Booth recoded digit can be determined from only 2k encoded bit pairs when sign-and-magnitude bit encoding is employed, a result which does not extend to conventional borrow-save or carry-save redundant binary digit encodings. Radix-4 and radix-8 gate-level designs are illustrated for alternative encodings, with our signed-bit design shown to yield smaller depth and fewer gates than existing redundant binary Booth recoding circuits from the literature.
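The digit conversion problem described in this abstract can be illustrated with a small reference model. The sketch below is an assumption-laden illustration, not the authors' circuit: it recodes signed bits into radix-2^k digits in the range [-2^(k-1), 2^(k-1)] using a sequential transfer digit, so it does not reproduce the limited-carry (2k+1 input bits per digit) property. The names `booth_recode` and `digits_value` are hypothetical.

```python
def booth_recode(signed_bits, k=2):
    """Recode LSB-first signed bits (-1, 0, 1) into radix-2**k Booth digits.

    Reference model only: a transfer digit ripples through all groups,
    unlike the limited-carry recoders discussed in the abstract.
    """
    radix = 1 << k
    half = radix >> 1
    bits = list(signed_bits)
    bits += [0] * (-len(bits) % k)          # pad to a whole number of groups
    digits, carry = [], 0
    for i in range(0, len(bits), k):
        group = bits[i:i + k]
        t = carry + sum(b * (1 << j) for j, b in enumerate(group))
        if t > half:                        # too large: borrow from next group
            t, carry = t - radix, 1
        elif t < -half:                     # too small: lend to next group
            t, carry = t + radix, -1
        else:
            carry = 0
        digits.append(t)
    digits.append(carry)                    # final transfer digit (may be 0)
    return digits


def digits_value(digits, radix):
    """Value of an LSB-first digit list in the given radix."""
    return sum(d * radix ** i for i, d in enumerate(digits))
```

A model like this is mainly useful as a checker: any limited-carry recoder design must produce digits in the same range and preserve the same value.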
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465362
E. Antelo, J. Bruguera, J. Villalba, E. Zapata
Title: Redundant Cordic rotator based on parallel prediction. In: Proceedings of the 12th Symposium on Computer Arithmetic.
We present a Cordic rotator, using carry-save arithmetic, based on the prediction of all the coefficients into which the rotation angle is decomposed. The prediction algorithm is based on the use of radix-2 microrotations with multiple shifts in the first iterations and the use of a redundant radix-2 and radix-4 representation for the coefficients in the rest of the microrotations. The use of multiple shifts facilitates the prediction of the coefficients for microrotations where i ≤ n/4, where n is the precision of the algorithm, and the use of radix-4 microrotations helps to reduce the total number of iterations. The prediction is carried out using the redundant representation of the z coordinate, without any need for conversion to a non-redundant representation. Finally, we present a VLSI architecture based on this algorithm. As the production of the coefficients is very fast, and they are known before starting each microrotation, the resulting architecture can be highly pipelined and is consequently appropriate for applications where high speeds are required.
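For readers unfamiliar with the baseline that this paper accelerates, the following is a minimal sketch of conventional radix-2 CORDIC rotation in floating point. It is not the predictive, carry-save scheme of the abstract: each coefficient here is chosen sequentially from the sign of z. The function name `cordic_rotate` and the default iteration count are assumptions made for illustration.

```python
import math


def cordic_rotate(x, y, angle, n=32):
    """Rotate (x, y) by `angle` radians with n radix-2 microrotations.

    Converges for |angle| up to roughly 1.74 rad (the sum of the
    elementary angles atan(2**-i)).
    """
    # Constant scale factor that compensates the growth of each microrotation.
    gain = 1.0
    for i in range(n):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle
    for i in range(n):
        sigma = 1.0 if z >= 0 else -1.0     # coefficient of microrotation i
        shift = 2.0 ** -i
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)
    return x * gain, y * gain
```

In this baseline the sign test on z sits on the critical path of every iteration, which is the dependency that coefficient prediction is meant to remove.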
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465367
T. Lang, P. Montuschi
Title: Very-high radix combined division and square root with prescaling and selection by rounding. In: Proceedings of the 12th Symposium on Computer Arithmetic.
An algorithm for square root with prescaling is developed and combined with a similar scheme for division. An implementation is described, evaluated and compared with other combined div/sqrt implementations.
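The abstract gives no algorithmic detail, so the sketch below only illustrates the two ideas named in the title for the division case, in their generic textbook form: the divisor is prescaled so that it is close to 1, and each quotient digit is then selected by rounding the shifted residual. The function name `prescaled_divide`, the radix, and the precision of the scaling factor are assumptions, and exact rational arithmetic stands in for hardware datapaths.

```python
from fractions import Fraction


def prescaled_divide(x, d, radix=16, steps=8, scale_bits=10):
    """Digit-recurrence division x/d with prescaling and selection by rounding.

    Returns (q, w, z): the accumulated quotient, the final residual, and the
    prescaled divisor. Exactly, x/d == q + w / (radix**steps * z).
    """
    x, d = Fraction(x), Fraction(d)
    # Prescaling factor: 1/d rounded to `scale_bits` fractional bits.
    m = Fraction(round((1 / d) * 2 ** scale_bits), 2 ** scale_bits)
    z = m * d                   # prescaled divisor, close to 1
    w = m * x                   # prescaled dividend is the initial residual
    q = Fraction(0)
    for j in range(1, steps + 1):
        digit = round(radix * w)            # selection by rounding
        w = radix * w - digit * z           # residual recurrence
        q += Fraction(digit, radix ** j)
    return q, w, z
```

Because z is close to 1, rounding the shifted residual yields a digit that keeps the next residual small, which is what makes a very high radix practical in this style of algorithm.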
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465368
Thomas W. Lynch, Ashraf Ahmed, M. Schulte, T. K. Callaway, Robert Tisdale
Title: The K5 transcendental functions. In: Proceedings of the 12th Symposium on Computer Arithmetic.
This paper describes the development of the transcendental instructions for the K5, AMD's recently completed x86-compatible superscalar microprocessor. A multi-level development cycle, with testing between levels, facilitated the early detection of errors and limited their effect on the design schedule. The algorithms for the transcendental functions use table-driven reductions followed by polynomial approximations. Multiprecision arithmetic operations are used when necessary to maintain sufficient accuracy and to ensure that the transcendental functions have a maximum error of one unit in the last place.
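The phrase "table-driven reductions followed by polynomial approximations" can be made concrete with a generic example. The sketch below computes exp(x) that way; it is an illustration of the general technique, not the K5 microcode, and the table size, polynomial degree, and the name `exp_table` are assumptions. It uses plain double precision and makes no attempt at the one-ulp bound mentioned in the abstract.

```python
import math

TABLE_BITS = 5
N = 1 << TABLE_BITS
# Precomputed table of 2**(j/N) for j = 0 .. N-1.
TABLE = [2.0 ** (j / N) for j in range(N)]
LN2_OVER_N = math.log(2.0) / N


def exp_table(x):
    """exp(x) = 2**k * 2**(j/N) * exp(r), with |r| <= ln(2) / (2N)."""
    n = round(x / LN2_OVER_N)       # nearest multiple of ln(2)/N
    k, j = divmod(n, N)             # n = k*N + j with 0 <= j < N
    r = x - n * LN2_OVER_N          # small reduced argument
    # Degree-4 Taylor polynomial for exp(r), evaluated by Horner's rule.
    p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r / 24.0)))
    return math.ldexp(TABLE[j] * p, k)
```

The table absorbs most of the argument range so that a short polynomial suffices. A production design would also need extra precision in the reduction step, which is where the multiprecision operations mentioned in the abstract would come in.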
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465371
G. Matsubara, N. Ide, H. Tago, Seigo Suzuki, N. Goto
Title: 30-ns 55-b shared radix 2 division and square root using a self-timed circuit. In: Proceedings of the 12th Symposium on Computer Arithmetic.
A shared radix-2 division and square root implementation using a self-timed circuit is presented. The same execution time for division and square root is achieved by using on-the-fly digit decoding and a root multiple generation technique. Most of the hardware is shared, and only a few multiplexers are required to switch between a divisor multiple and a root multiple. Moreover, the quotient selection logic is accelerated by a new algorithm using a 3-b carry propagation adder. The implementation of the shared division and square root unit assumes 0.3 μm CMOS technology, and the wiring capacitance and other parasitic parameters are taken into account. The execution time of floating-point 55-b full mantissa division and square root is expected to be less than 30 ns for the worst-case input vector, as determined by intensive circuit simulation.
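As background for the division half of this design, the following is a minimal sketch of a conventional radix-2 digit recurrence with quotient digits in {-1, 0, 1} and on-the-fly conversion of those digits to an ordinary binary quotient. It is a textbook-style model under stated assumptions (operands normalised to d in [1/2, 1) and x in [0, 1), exact rational arithmetic, full-precision digit selection), not the paper's self-timed circuit or its accelerated selection logic. The name `srt2_divide` is hypothetical.

```python
from fractions import Fraction


def srt2_divide(x, d, steps=24):
    """Radix-2 signed-digit division with on-the-fly conversion.

    Returns (quotient, w) with x/d == quotient + w / (2**(steps-1) * d)
    and |w| <= d.
    """
    x, d = Fraction(x), Fraction(d)
    assert Fraction(1, 2) <= d < 1 and 0 <= x < 1
    half = Fraction(1, 2)
    w = x / 2                   # initial residual, |w| <= d
    q, qm = 0, -1               # Q and Q-1, kept as integers
    for _ in range(steps):
        w2 = 2 * w
        if w2 >= half:          # digit +1: append 1 to Q
            digit = 1
            q, qm = 2 * q + 1, 2 * q
        elif w2 < -half:        # digit -1: append 1 to Q-1
            digit = -1
            q, qm = 2 * qm + 1, 2 * qm
        else:                   # digit 0: append 0 to Q
            digit = 0
            q, qm = 2 * q, 2 * qm + 1
        w = w2 - digit * d      # residual recurrence
    return Fraction(q, 2 ** (steps - 1)), w
```

Keeping both Q and Q-1 means a negative digit never triggers a borrow ripple: the new quotient is formed by appending one bit to whichever register applies.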
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465360
T. Hamano, N. Takagi, S. Yajima, F. Preparata
Title: O(n)-depth circuit algorithm for modular exponentiation. In: Proceedings of the 12th Symposium on Computer Arithmetic.
An O(n)-depth polynomial-size combinational circuit algorithm is proposed for n-bit modular exponentiation, i.e., for the computation of x^y mod m for arbitrary integers x, y and m represented as n-bit binary integers within the bounds 2^(n-1) ≤ m < 2^n and 0 ≤ x, y < m. The algorithm is a generalization of the square-and-multiply method. An obvious implementation of the square-and-multiply method yields a circuit of depth O(n log n) and size O(n^3). In the proposed algorithm, the terms x^(2^i) mod m for all i ∈ …
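The square-and-multiply method that the abstract takes as its starting point can be sketched in a few lines. This is the ordinary sequential software form, shown only to fix ideas; it is not the O(n)-depth circuit algorithm of the paper, and the function name `mod_exp` is an assumption.

```python
def mod_exp(x, y, m):
    """Compute x**y mod m by right-to-left square-and-multiply."""
    result = 1 % m              # handles m == 1
    base = x % m
    while y > 0:
        if y & 1:               # current exponent bit is 1: multiply
            result = result * base % m
        base = base * base % m  # square for the next exponent bit
        y >>= 1
    return result
```

For an n-bit exponent this performs about n squarings and up to n multiplications, each followed by a modular reduction, which is why a direct circuit realisation has depth that grows faster than linearly in n.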
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465357
R. V. Drunen, L. Spaanenburg, P. Lucassen, J. Nijhuis, J. T. Udding
Title: Arithmetic for relative accuracy. In: Proceedings of the 12th Symposium on Computer Arithmetic.
Of the three factors named in Moore's first Law that drive the advance of computational systems, circuit design receives relatively little mention. We introduce here a circuit variety that allows accuracy considerations to be included. It is shown that accuracy-drive can be effectively realised and leads to a 60% speed improvement. Details are given of a floating-point unit with full hardware support for complex calculations, specifically tailored to speed up MD simulations on the GROMACS scientific parallel computer.
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465358
J. Muller, A. Tisserand, Alexandre Scherbyna
Title: Semi-logarithmic number systems. In: Proceedings of the 12th Symposium on Computer Arithmetic.
We present a new class of number systems, called semi-logarithmic number systems, that constitute a family of various compromises between floating-point and logarithmic number systems. We propose arithmetic algorithms for the semi-logarithmic number systems, and we compare these number systems to the classical floating-point and logarithmic number systems.
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465372
S. Cui, N. Burgess, M. Liebelt, K. Eshraghian
Title: A GaAs IEEE floating point standard single precision multiplier. In: Proceedings of the 12th Symposium on Computer Arithmetic.
This paper presents a GaAs IEEE floating point standard single precision multiplier. A modified carry-save array is used in conjunction with Booth's algorithm to reduce the partial product addition and interconnection. A special rounding technique called Trailing-1's Predictor is used to speed up the final addition and rounding. The combination of the fast arithmetic architecture and compact layout style achieves a 4 ns multiplication time with 3.5 W power dissipation at 75 °C, giving 14 mW/MHz. The area is 2.43 mm by 3.77 mm (excluding pads) and uses 28,000 transistors, giving a density of 3056 transistors/mm^2 for 0.8-μm GaAs technology.
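The role of carry-save addition in partial product accumulation can be shown with a small integer model. The sketch below is a simplified illustration under explicit assumptions: it handles non-negative integers only, generates plain (not Booth-recoded) partial products, and models the array as a sequence of 3:2 compressions followed by one carry-propagate addition. The names `carry_save_add` and `multiply_carry_save` are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: returns (s, carry) with s + carry == a + b + c."""
    s = a ^ b ^ c                                   # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1      # majority bits, shifted left
    return s, carry


def multiply_carry_save(a, b):
    """Multiply non-negative integers by carry-save accumulation of partial products."""
    s, c = 0, 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            # Fold the next partial product in without propagating carries.
            s, c = carry_save_add(s, c, a << shift)
        shift += 1
    return s + c                                    # single carry-propagate addition
```

Only the last step involves a full carry chain, which is why designs of this kind concentrate their effort on the final addition and rounding stage.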
Pub Date : 1995-07-19 DOI: 10.1109/ARITH.1995.465356
C. Baumhof
Title: A new VLSI vector arithmetic coprocessor for the PC. In: Proceedings of the 12th Symposium on Computer Arithmetic.
A new vector arithmetic coprocessor, the MIM XPA3233, with integrated PCI bus interface has been developed in CMOS VLSI technology. The chip performs dot products of vectors with components in the IEEE DOUBLE data format to full accuracy, or with only one final rounding. Details on the realisation of the multiplication, accumulation and carry resolution processes are discussed. Performance data and some details about the actual VLSI realisation are presented. Software support for the coprocessor is available in the programming languages PASCAL-XSC and C-XSC or from a special C subroutine library. Programming examples are shown using PASCAL-XSC and C.
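The idea of a dot product computed "to full accuracy or with only one final rounding" can be modelled in software. The sketch below is an illustration only and does not reflect the coprocessor's interface or its PASCAL-XSC, C-XSC, or C library bindings: it accumulates the exact products as rational numbers and rounds once at the end. The function name `exact_dot` is an assumption.

```python
from fractions import Fraction


def exact_dot(xs, ys):
    """Dot product of two float vectors with a single final rounding."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        # Fraction(float) is exact, so each product is accumulated without error.
        acc += Fraction(x) * Fraction(y)
    return float(acc)           # the only rounding step
```

For example, `exact_dot([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns `1.0`, whereas accumulating the same products left to right in double precision can lose the middle term entirely.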