Pub Date: 2018-02-08 DOI: 10.1109/ARITH.1987.6158694
J. Demmel
Recently, Clenshaw/Olver and Iri/Matsui proposed new floating point arithmetics which seek to eliminate overflows and underflows from most computations. Their common approach is to redistribute the available numbers to spread out the largest and smallest numbers much more thinly than in standard floating point, thus achieving a larger range at the cost of lower precision at the ends of the range. The goal of these arithmetics is to eliminate much of the effort needed to write code which is reliable despite overflow and underflow. In this paper we argue that for many codes this eliminated effort will reappear in the error analyses needed to ascertain or guarantee the accuracy of the computed solution. Thus reliability with respect to overflow and underflow has been traded for reliability with respect to roundoff. We also propose a hardware flag, analogous to the “sticky flags” of the IEEE binary floating point standard, to do some of this extra error analysis automatically.
Title: On error analysis in arithmetic with varying relative precision. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
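The trade-off between range and precision can be illustrated with a small sketch of the level-index idea associated with Clenshaw and Olver, in which a number is stored as X = level + index and decoded by a generalized exponential. The function names, the spacing `h`, and the use of ordinary double precision are assumptions made for illustration only; this is not the arithmetic or the hardware flag proposed in the paper.

```python
import math


def phi(big_x):
    """Generalized exponential: phi(X) = X on [0, 1), phi(X) = exp(phi(X - 1)) for X >= 1."""
    level = int(big_x)          # integer part: how many times exp is applied
    value = big_x - level       # fractional part: the index
    for _ in range(level):
        value = math.exp(value)
    return value


def relative_gap(big_x, h=2.0 ** -20):
    """Relative distance between the values decoded from X and X + h.

    h is a hypothetical fixed absolute spacing of the stored index.
    """
    lo, hi = phi(big_x), phi(big_x + h)
    return (hi - lo) / lo


# Relative spacing at the midpoint of levels 1 through 4.
gaps = [relative_gap(level + 0.5) for level in range(1, 5)]
```

With a fixed spacing in X, each entry of `gaps` is larger than the previous one, so the relative precision drops as the magnitude grows. That varying precision is what makes a uniform roundoff analysis harder.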
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158700
B. Hochet, P. Quinton, Y. Robert
We propose two systolic architectures for the Gaussian triangularization and the Gauss-Jordan diagonalization of large dense n×n matrices over GF(p), where p is a prime number. The solution of large dense linear systems over GF(p) is the major computational step in various algorithms arising from arithmetic number theory and computer algebra. The two proposed architectures implement the elimination with partial pivoting, although the operation of the array remains purely systolic. The last section is devoted to the design and layout of a CMOS 8 by 8 Gauss-Jordan diagonalization systolic chip over GF(2).
Title: Systolic solution of linear systems over GF(p) with partial pivoting. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
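As a purely sequential illustration of the computation that such arrays perform, the following is a minimal software sketch of Gauss-Jordan elimination over GF(p). The function name `solve_mod_p` and its interface are hypothetical, and nothing here models the systolic data flow. Over a finite field there is no notion of magnitude, so the pivot search below simply takes the first nonzero entry in the column.

```python
def solve_mod_p(a, b, p):
    """Solve a * x = b over GF(p), p prime, by Gauss-Jordan elimination.

    Returns the solution as a list, or None if the matrix is singular mod p.
    """
    n = len(a)
    # Augmented matrix with every entry reduced modulo p.
    m = [[v % p for v in row] + [rhs % p] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pivoting: any nonzero entry in the column is an acceptable pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)  # modular inverse (Python 3.8+)
        m[col] = [(v * inv) % p for v in m[col]]
        # Clear the column in every other row (diagonalization).
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [(x - f * y) % p for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]
```

For example, `solve_mod_p([[0, 1], [1, 1]], [3, 5], 7)` needs a row exchange in the first column and returns `[2, 3]`.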
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158720
H. Umeo
We present an optimum bit-parallel/word-sequential systolic convolver. Our design is the best among the many previous convolvers in the sense that it attains optimal time and space performance simultaneously without adding any global control, broadcasting, preloading, or multiple sequential or parallel I/O ports, which were allowed in most of the previous designs. As an application of our convolver we give a systolic polynomial divider which computes the polynomial division in exactly n + O(1) steps on [min(n−m, m)/2] + O(1) systolic cells, for the division of any degree-n polynomial by any degree-m polynomial (n ≥ m).
Title: A design of time-optimum and register-number-minimum systolic convolver. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
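The two operations named in the abstract, convolution and polynomial division, can be sketched sequentially as follows. This is a plain reference computation under assumed conventions (coefficients listed highest degree first, exact rational arithmetic). The names `convolve` and `poly_divmod` are hypothetical, and the sketch says nothing about the cell count or step count of the systolic design.

```python
from fractions import Fraction


def convolve(x, w):
    """Full linear convolution of two coefficient sequences."""
    y = [0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            y[i + j] += xi * wj
    return y


def poly_divmod(num, den):
    """Divide a degree-n polynomial by a degree-m polynomial (n >= m).

    Coefficients are given highest degree first and den[0] must be nonzero.
    Returns (quotient, remainder) as lists of Fractions.
    """
    rem = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    for i in range(len(rem) - len(den) + 1):
        coef = rem[i] / den[0]
        quotient.append(coef)
        # Subtract coef * den, aligned at position i.
        for j, d in enumerate(den):
            rem[i + j] -= coef * d
    return quotient, rem[len(quotient):]


# (x^3 - 2x^2 - 4) / (x - 3) gives quotient x^2 + x + 3 and remainder 5.
q, r = poly_divmod([1, -2, 0, -4], [1, -3])
```

Multiplying the quotient back with `convolve(q, [1, -3])` and adding the remainder to the constant term reproduces the dividend, which is the link between the convolver and the divider.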
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158705
Peter Kornerup, D. Matula
We describe a binary implementation of an algorithm of Gosper to compute the sum, difference, product, quotient and certain rational functions of two rational operands, applicable to integrated approximate and exact rational computation. The arithmetic unit we propose is an eight-register computation cell with bit-serial input and output employing the binary lexicographic continued fraction (LCF) representation of the rational operands. The operands and results are processed in a most-significant-bit-first, on-line fashion with bit-level logic, leading to less delay in the computation cell than operating on the full partial quotients of the standard continued fraction representation. Minimization of delay is investigated with the aim of supporting greater throughput in cascaded parallel computation with such computation cells.
Title: A bit-serial arithmetic unit for rational arithmetic. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
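For readers unfamiliar with the standard continued fraction representation that the abstract compares against, the sketch below expands a rational number into its partial quotients and rebuilds it. It does not implement the binary LCF encoding or Gosper's algorithm; the function names are hypothetical and the input is assumed to be a non-negative rational.

```python
from fractions import Fraction


def partial_quotients(x):
    """Partial quotients [a0; a1, a2, ...] of a non-negative rational x."""
    x = Fraction(x)
    out = []
    while True:
        a = x.numerator // x.denominator  # integer part
        out.append(a)
        x -= a
        if x == 0:
            return out
        x = 1 / x  # continue with the reciprocal of the fractional part


def from_partial_quotients(qs):
    """Rebuild the rational value from its partial quotients."""
    value = Fraction(qs[-1])
    for a in reversed(qs[:-1]):
        value = a + 1 / value
    return value


cf = partial_quotients(Fraction(415, 93))  # [4, 2, 6, 7]
```

Each partial quotient is a whole integer of unbounded size, so a unit that consumes them one at a time must wait for a full quotient; this is the delay that a bit-level encoding aims to reduce.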
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158691
M. Cosnard, A. Guyot, B. Hochet, J. Muller, H. Ouaouicha, P. Paul, E. Zysman
We describe a general VLSI architecture for the computation of arithmetic expressions including floating-point transcendental functions. This architecture is divided into three parts: a communication machine, the control part of a computation machine, and the operative part of this computation machine. In order to compute the most usual transcendental functions, we introduced some general algorithms, presented briefly here, which include the CORDIC scheme as a particular case. Our major architecture goals were regularity, parametrization and automatic design. The final chip is designed in a 2-µm CMOS technology, and its name is FELIN (“Fonctions ELémentaires INtégrées”, French for “integrated elementary functions”). This work was supported in part by the GRECO C3 and the GCIS of the French CNRS.
Title: The FELIN arithmetic coprocessor chip. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
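The CORDIC scheme mentioned above can be sketched in its classical circular rotation mode, which yields sine and cosine from shift-and-add style steps. This is the textbook scheme written with floating-point multiplications by powers of two standing in for hardware shifts. It is not the more general family of algorithms used in FELIN; the function name and iteration count are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=32):
    """Return (sin(theta), cos(theta)) for |theta| <= pi/2 via CORDIC rotations."""
    # Elementary rotation angles atan(2^-i) and the accumulated scale factor.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start with a pre-scaled vector so the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward a zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.7)
```

Each iteration contributes roughly one more bit of accuracy, so the iteration count plays the role of the target word length.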
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158684
B. K. Bose, Li-fan Pei, G. Taylor, D. Patterson
This paper presents the design of a fast and area-efficient multiply-divide unit used in building a VLSI floating-point processor (FPU), conforming to the IEEE standard 754. Details of the algorithms, implementation techniques and design tradeoffs are presented. The multiplier and divider are implemented in 2 micron CMOS technology with two layers of metal, and occupy 23 square mm (23% of the entire FPU). We expect to perform extended-precision multiplication and division in 1.1 and 2.8 microseconds, respectively.
Title: Fast multiply and divide for a VLSI floating-point unit. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158711
V. Peng, S. Samudrala, M. Gavrielov
Several options for the implementation of combinatorial shifters, multipliers, and dividers for a VLSI floating point unit are presented and compared. The comparisons are made in the context of a single chip implementation in light of the constraints imposed by currently available MOS technology.
Title: On the implementation of shifters, multipliers, and dividers in VLSI floating point units. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158719
P. Tu, M. Ercegovac
We present an on-line algorithm for radix-4 floating point division. The divisor is first transformed into a range such that the quotient digits are computed as a function of the scaled partial remainder only.
Title: A radix-4 on-line division algorithm. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
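The idea of selecting quotient digits from the partial remainder alone, once the divisor lies in a narrow range, can be sketched with a conventional radix-4 digit recurrence using the signed digit set {-2, ..., 2}. This is not the on-line algorithm of the paper: operands are available in full rather than digit by digit, and the divisor range [15/16, 17/16] and dividend bound 5/8 are assumptions chosen so that plain rounding is a valid selection rule. The function name is hypothetical.

```python
from fractions import Fraction


def radix4_divide(x, d, digits=16):
    """Radix-4 digit recurrence with digits in {-2, ..., 2}.

    Assumes a pre-scaled divisor 15/16 <= d <= 17/16 and |x| <= 5/8.
    Returns (digit list, quotient approximation, final partial remainder).
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(15, 16) <= d <= Fraction(17, 16)) or abs(x) > Fraction(5, 8):
        raise ValueError("operands outside the assumed range")
    w = x                      # partial remainder
    q = Fraction(0)
    digit_list = []
    for j in range(1, digits + 1):
        # Digit selection depends on the partial remainder only, not on d.
        digit = max(-2, min(2, round(4 * w)))
        w = 4 * w - digit * d
        digit_list.append(digit)
        q += Fraction(digit, 4 ** j)
    return digit_list, q, w


digs, quot, rem = radix4_divide(Fraction(1, 2), Fraction(33, 32), digits=12)
```

Under these assumptions the partial remainder stays within 5/8 in magnitude, and the identity x = d·q + w·4^(-n) holds after n digits, so the quotient error is at most (2/3)·4^(-n).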
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158718
Steven G. Smith, P. Denyer
A methodology is presented for the synthesis of area-efficient, high-performance VLSI modules for vector and matrix multiplication. Three fundamental computational elements are employed in the composition of these architectures: memory register, multiplexer (1-from-2 data selector), and carry-save add-shift computer. Two's complement serial/parallel carry-save accumulation provides performance, while the use of symmetric-coded distributed arithmetic eliminates redundant computation to effect area savings.
Title: Synthesis of area-efficient VLSI architectures for vector and matrix multiplication. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
Pub Date: 1987-05-18 DOI: 10.1109/ARITH.1987.6158714
J. E. Robertson
The properties of Hamming codes for error detection and correction can be extended from the binary parity check to addition, modulo 2r. Malfunctions in hardware during addition, modulo 2r, can be detected and corrected. Since carry-save and signed-digit addition, radix r, are included in addition, modulo 2r, this extension of Hamming codes makes possible new techniques for detection and correction of hardware malfunctions during signed-digit and carry-save addition.
Title: Error detection and correction for addition and subtraction, through use of higher radix extensions of Hamming codes. Published in: 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).
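As background for the binary parity-check case that the paper generalizes, the following sketch implements the classical Hamming(7,4) code with single-error correction. It covers only the binary base case, not the higher-radix extension to addition described in the abstract; the function names and bit ordering are assumptions.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(word):
    """Correct at most one flipped bit.

    Returns (corrected word, 1-based error position, or 0 if none was found).
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome spells the error position
    if position:
        c[position - 1] ^= 1
    return c, position
```

Each parity check here is an addition modulo 2, which is the operation the abstract proposes to replace with addition under a larger modulus.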