n × n carry-save multipliers without final addition
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378109
P. Montuschi, L. Ciminiera
Carry-save multipliers require an adder at the last step to convert the carry-sum representation of the most significant half of the result into an irredundant form. A multiplication scheme whereby this conversion is performed with a circuit operating in parallel with the carry-save array is presented. The resulting implementation, when a radix-2 adder array is used, produces a 2n-bit result with a delay comparable to that of the multiplier proposed by M.D. Ercegovac and T. Lang (1990). When a radix-4 array is used, the proposed unit is almost twice as fast as units proposed previously.
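To make the "final addition" concrete, here is a minimal Python sketch of a conventional carry-save accumulation followed by the carry-propagate step that the abstract aims to remove. It is not the authors' scheme; the function names `carry_save_add` and `carry_save_multiply` are hypothetical, and operands are assumed to be unsigned n-bit integers.

```python
def carry_save_add(a, b, c):
    """Word-level 3:2 compression: returns (s, carry) with s + carry == a + b + c."""
    s = a ^ b ^ c                                # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # carries move one position left
    return s, carry


def carry_save_multiply(x, y, n):
    """Accumulate the n partial products of x * y in carry-save (redundant) form."""
    s, c = 0, 0
    for i in range(n):
        partial = (x << i) if (y >> i) & 1 else 0
        s, c = carry_save_add(s, c, partial)     # no carry propagation here
    return s, c                                  # product == s + c


s, c = carry_save_multiply(13, 11, 4)
product = s + c   # the final carry-propagate addition discussed in the abstract
```

The pair `(s, c)` is the redundant representation; only the last line needs a full-width adder, which is why it dominates the delay in a conventional design.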
Integer mapping architectures for the polynomial ring engine
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378110
S. Bizzan, G. Jullien, N. Wigley, W. Miller
A finite polynomial ring structure for mapping inner product computations to parallel independent ring computations over 3-bit moduli has been introduced by N.M. Wigley et al. (1992). The main algorithmic computation architecture can be implemented using well-established systolic array mapping principles, and a project to construct a Polynomial Ring Engine (PRE) is underway to exploit the VLSI implementation properties of such computations. A semi-systolic architecture for the input and output conversion mappings that are required in the engine is introduced here. It is shown that the entire mapping procedure can be carried out with pipelined six-input logic blocks and small, fast binary adders. CMOS implementation techniques for the pipelined blocks are discussed, and the design procedure is illustrated with results from a recently completed module generator.
A lazy exact arithmetic
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378086
Mohand Ourabah Benouamer, P. Jaillon, D. Michelucci, J. Moreau
Systems based on exact arithmetic are very slow. In practical situations, very few computations need be performed exactly, as approximating the results is very often sufficient. Unfortunately, it is impossible to know at the time when the computation is called for whether an exact evaluation will be necessary or not. The arithmetic library presented here achieves laziness by postponing any exact computation until it is proved to be indispensable. This yields very substantial gains in performance while allowing exact decisions. The lazy arithmetic techniques are presented in the context of rational computations, using the field of computational geometry as a background.
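The idea of postponing exact work can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than the library described in the abstract: the class name `Lazy`, its methods, and the choice of a floating-point interval widened with `math.nextafter` (Python 3.9+) are all hypothetical. Each value carries an enclosing interval and a deferred exact rational computation, and the exact part runs only when the interval cannot decide a sign.

```python
import math
from fractions import Fraction


def _down(v):
    """Step one float toward -infinity so the bound stays a valid lower bound."""
    return math.nextafter(v, -math.inf)


def _up(v):
    """Step one float toward +infinity so the bound stays a valid upper bound."""
    return math.nextafter(v, math.inf)


class Lazy:
    """A number known by a float interval [lo, hi] plus a deferred exact value."""

    def __init__(self, lo, hi, thunk):
        self.lo, self.hi = lo, hi
        self._thunk = thunk      # computes the exact Fraction on demand
        self._exact = None       # stays None until an exact result is needed

    @classmethod
    def of(cls, value):
        q = Fraction(value)
        f = float(q)             # nearest float; widen by one step each way
        return cls(_down(f), _up(f), lambda: q)

    def exact(self):
        if self._exact is None:
            self._exact = self._thunk()
        return self._exact

    def __add__(self, other):
        return Lazy(_down(self.lo + other.lo), _up(self.hi + other.hi),
                    lambda: self.exact() + other.exact())

    def __sub__(self, other):
        return Lazy(_down(self.lo - other.hi), _up(self.hi - other.lo),
                    lambda: self.exact() - other.exact())

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Lazy(_down(min(p)), _up(max(p)),
                    lambda: self.exact() * other.exact())

    def sign(self):
        """Return -1, 0, or 1; falls back to exact arithmetic only if needed."""
        if self.lo > 0:
            return 1
        if self.hi < 0:
            return -1
        e = self.exact()         # interval straddles zero: decide exactly
        return (e > 0) - (e < 0)


easy = Lazy.of(2) * Lazy.of(3) - Lazy.of(5)                  # clearly positive
hard = Lazy.of(Fraction(1, 3)) - Lazy.of(Fraction(1, 3))     # exactly zero
```

Calling `easy.sign()` is settled by the interval alone, while `hard.sign()` forces the rational evaluation. In a geometric predicate such as an orientation test, most inputs behave like `easy`, which is where the performance gain described above would come from.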
Algorithms and multi-valued circuits for the multioperand addition in the binary stored-carry number system
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378092
D. Etiemble, K. Navi
Algorithms for the sum of two (three and four) digits in the binary stored-carry number system, using the smallest set of values for the positional sum, are presented. The corresponding adders, which use multivalued current-mode circuits, are also presented. The implementation of multioperand additions using these adders is compared with the usual binary implementation.
Comparing several GCD algorithms
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378094
T. Jebelean
The execution times of several algorithms for computing the GCD of arbitrary precision integers are compared. These algorithms are the known ones (Euclidean, binary, plus-minus), and the improved variants of these for multidigit computation (Lehmer and similar), as well as new algorithms introduced by the author: an improved Lehmer algorithm using two digits in partial cosequence computation, and a generalization of the binary algorithm using a new concept of modular conjugates. The last two algorithms prove to be the fastest of all, giving a speedup of six to eight times over the classical Euclidean scheme, and two times over the best currently known algorithms. Also, the generalized binary algorithm is suitable for systolic parallelization in a least-significant-digits-first pipelined manner.
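For reference, two of the classical baselines named in the abstract (Euclidean and binary) can be sketched in a few lines of Python. This sketch covers only those textbook forms; it does not attempt the Lehmer variants or the author's new algorithms. The function names are hypothetical and inputs are assumed to be non-negative integers.

```python
def gcd_euclid(a, b):
    """Classical Euclidean algorithm: repeated remainder."""
    while b:
        a, b = b, a % b
    return a


def gcd_binary(a, b):
    """Binary GCD: uses only shifts, subtraction, and parity tests."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while ((a | b) & 1) == 0:      # strip common factors of two
        a >>= 1
        b >>= 1
        shift += 1
    while (a & 1) == 0:            # make a odd
        a >>= 1
    while b:
        while (b & 1) == 0:        # remaining twos in b are not common factors
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a                     # odd minus odd is even, so b shrinks again
    return a << shift              # restore the common power of two


print(gcd_euclid(1071, 462), gcd_binary(1071, 462))
```

The binary form avoids division entirely, which is the property that makes digit-serial and systolic variants attractive in hardware.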
On digit-recurrence division implementations for field programmable gate arrays
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378091
M. E. Louie, M. Ercegovac
The flexibility of field programmable gate arrays (FPGAs) can provide arithmetic-intensive programs with the benefits of custom hardware but without the high cost of custom silicon implementations. Efficient mappings are key to fast arithmetic implementations on FPGAs. A process for developing such mappings with lookup-table-based FPGAs is explored. The development process is illustrated with SRT division and the Xilinx XC4010 FPGA. With this mapping process, a linear sequential array design that avoids the common problem of large fanout delay in the critical path is created. This approach has a cycle time that is independent of precision, yet it requires approximately the same number of logic blocks as a conventional implementation.
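As background on the recurrence being mapped, the following Python sketch shows textbook radix-2 SRT division with the signed digit set {-1, 0, 1}. It says nothing about the FPGA mapping itself. The function name `srt_radix2` is hypothetical, exact `Fraction` values stand in for hardware registers, and the divisor is assumed to be normalized to [1/2, 1) with |x| <= d.

```python
from fractions import Fraction


def srt_radix2(x, d, n):
    """Run n steps of w <- 2*w - q*d with q chosen from {-1, 0, 1}.

    Returns (digits, quotient, w) where x == quotient * d + w / 2**n.
    """
    half = Fraction(1, 2)
    assert half <= d < 1 and abs(x) <= d
    w = Fraction(x)                 # partial remainder
    digits = []
    for _ in range(n):
        shifted = 2 * w
        # Digit selection compares against constants, never against d itself.
        if shifted >= half:
            q = 1
        elif shifted < -half:
            q = -1
        else:
            q = 0
        w = shifted - q * d
        digits.append(q)
    quotient = sum(Fraction(q, 2 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient, w


digits, quotient, w = srt_radix2(Fraction(3, 8), Fraction(5, 8), 16)
```

Because the selection thresholds are constants, a hardware version can pick each digit from a few leading bits of a redundant remainder, which is what keeps the step delay independent of operand width.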
A 17 × 69 bit multiply and add unit with redundant binary feedback and single cycle latency
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378096
W. S. Briggs, D. Matula
The authors describe a numeric processor with a kernel that is a tree of redundant binary adders and effects either a 17 × 69-bit multiply-and-add or a 19 × 69-bit multiply with exact redundant binary output and single cycle latency. Feedback paths selectively allow a high-order or low-order part of the adder tree output to be fed back in redundant binary form to the multiplicand and/or addend inputs to the adder tree. The authors describe algorithms iteratively using this adder tree kernel for IEEE double extended multiplication, division, and square root; conversions between 18-digit BCD integers and 64-bit binary integers; and transcendental function evaluation. The multiplier design described was implemented in the Cyrix 83D87 numeric coprocessor (typically 33 MHz). Results for this coprocessor as compared with competitive x87 units are included.
Adaptive beamforming using RNS arithmetic
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378111
B. Kirsch, P. Turner
The adaptive beamforming problem is solved using an algorithm-architecture-arithmetic combination suitable for small platforms such as those found on aircraft or sonobuoys. The arithmetic used is the residue number system (RNS), implemented on an array of processors that can be reassigned as the algorithm proceeds. The underlying algorithm is a modified Gaussian elimination. The (non-RNS) division operations are eliminated in favor of some scaling and the adaptive use of the processor array to accommodate the growth in dynamic range.
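A short Python sketch may help show why RNS suits a processor array and why division is awkward in it. The moduli set and all function names below are hypothetical choices for illustration, not taken from the paper. Addition and multiplication act independently on each residue, whereas recovering the integer requires a Chinese Remainder Theorem reconstruction across all of them.

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range is 7*11*13*15 = 15015.
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as its residues."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Residue-wise addition; each position could run on its own processor."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Residue-wise multiplication, with no carries between positions."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction (needs Python 3.8+ for pow(x, -1, m))."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the modular inverse
    return total % M


a, b = to_rns(123), to_rns(45)
result = from_rns(rns_add(rns_mul(a, b), a))   # 123 * 45 + 123
```

The result is only meaningful while the true value stays below the product of the moduli, which is why growth in dynamic range during elimination has to be managed, as the abstract notes.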
Exploiting trivial and redundant computation
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378089
Stephen Richardson
The notion of trivial computation, in which the appearance of simple operands renders potentially complex operations simple, is discussed. An example of a trivial operation is integer division, where the divisor is two; the division becomes a simple shift operation. The concept of redundant computation, in which some operation repeatedly does the same function because it repeatedly sees the same operands, is also discussed. Experiments on two separate benchmark suites, the SPEC benchmarks and the Perfect Club, find a surprising amount of trivial and redundant operation. Various architectural means of exploiting this knowledge to improve computational efficiency include detection of trivial operands and the result cache. Further experimentation shows significant speedup from these techniques, as measured on three different styles of machine architecture.
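The two mechanisms named in the abstract, trivial-operand detection and a result cache, can be mimicked in software with a small Python sketch. Everything here is a hypothetical model rather than the paper's hardware design: the class name `DivideUnit`, the cache size, the oldest-first eviction policy, and the restriction to a non-negative dividend and positive divisor are all assumptions.

```python
class DivideUnit:
    """Toy model of an integer divider with a trivial-operand check and a result cache."""

    def __init__(self, cache_size=16):
        self.cache = {}                  # (a, b) -> quotient, in insertion order
        self.cache_size = cache_size
        self.stats = {"trivial": 0, "cached": 0, "full": 0}

    def divide(self, a, b):
        """Integer quotient of a // b for a >= 0 and b > 0."""
        # Trivial operands: the answer needs no real division.
        if a == 0 or b == 1:
            self.stats["trivial"] += 1
            return a
        if (b & (b - 1)) == 0:           # divisor is a power of two: shift instead
            self.stats["trivial"] += 1
            return a >> (b.bit_length() - 1)
        # Redundant computation: same operands seen before.
        key = (a, b)
        if key in self.cache:
            self.stats["cached"] += 1
            return self.cache[key]
        # Otherwise pay for the full operation and remember the result.
        self.stats["full"] += 1
        result = a // b
        if len(self.cache) >= self.cache_size:
            self.cache.pop(next(iter(self.cache)))   # evict the oldest entry
        self.cache[key] = result
        return result


unit = DivideUnit()
answers = [unit.divide(40, 8), unit.divide(100, 7), unit.divide(100, 7)]
```

After these three calls, `unit.stats` records one trivial, one full, and one cached operation, which is the kind of breakdown the benchmark experiments in the abstract measure.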
Combined system-level redundancy and modular arithmetic for fault tolerant digital signal processing
Published in: Proceedings of IEEE 11th Symposium on Computer Arithmetic
Pub Date: 1993-06-29 DOI: 10.1109/ARITH.1993.378112
W. Jenkins, B. Schnaufer, A. Mansen
This paper proposes combining system-level modular redundancy with the arithmetic modularity of residue number system (RNS) arithmetic to achieve fault tolerance in high-speed digital signal processing (DSP) systems. Double, triple, and quadruple modular redundancy are combined with RNS modularity for realizing important DSP computational kernels. The discussion includes the development of the serial-by-modulus (SBM) RNS architecture, in which residue digits are processed sequentially in circuits that handle only one modular operation at a given time, thereby sacrificing speed for circuit simplicity. As a potential application of the SBM concept, a variable-word-length sum-of-products signal processing kernel is developed based on a serial-by-modulus RNS architecture. Because the RNS is not a weighted number representation, if the instantaneous dynamic range requirement can be estimated, it may be possible to perform the computation with only enough residue digits to provide the necessary dynamic range.