Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465363
J. Prabhu, G. Zyner
UltraSPARC's IEEE-754 compliant floating point divide and square root implementation is presented. Three overlapping stages of SRT radix-2 quotient selection logic enable an effective radix-8 calculation at 167 MHz while only a single radix-2 quotient selection logic delay is seen in the critical path. Speculative partial remainder and quotient calculation in the main datapath also improves cycle time. The quotient selection logic is slightly modified to prevent the formation of a negative partial remainder for exact results. This saves latency and hardware as the partial remainder no longer needs to be restored before calculating the sticky bit for rounding.
Title: 167 MHz radix-8 divide and square root using overlapped radix-2 stages. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
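The radix-2 SRT recurrence that the abstract builds on can be sketched as follows. This is a minimal software model under stated assumptions: the function name `srt_radix2_divide`, the use of exact `Fraction` arithmetic, and the full-precision comparison against ±1/2 are illustrative choices, not the UltraSPARC selection logic (which overlaps three such steps per cycle and, per the abstract, modifies the selection for exact results).

```python
from fractions import Fraction

def srt_radix2_divide(x, d, n_digits):
    """Radix-2 SRT division with quotient digits in {-1, 0, 1}.

    Assumes 1/2 <= d < 1 and |x| <= d (hypothetical normalization).
    Returns (quotient, final_partial_remainder, digits).
    """
    x, d = Fraction(x), Fraction(d)
    half = Fraction(1, 2)
    assert half <= d < 1 and abs(x) <= d
    w = x                       # partial remainder w_0
    q = Fraction(0)
    digits = []
    for j in range(1, n_digits + 1):
        shifted = 2 * w
        # Quotient selection on the shifted partial remainder.
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        w = shifted - digit * d   # w_j = 2*w_{j-1} - q_j*d
        q += Fraction(digit, 2 ** j)
        digits.append(digit)
    return q, w, digits
```

After `n` steps the invariant `x - q*d == w / 2**n` holds, so the quotient error is bounded by `2**-n`. Applying three of these steps back to back retires three quotient bits, which is the sense in which three radix-2 stages give an effective radix-8 step.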
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465382
Hannes Hassler, N. Takagi
We describe a general approach for decomposing a function into a sum of functions, each with a smaller input size than the original. Hence we can map such functions with essentially the same precision using small ROM tables and adders. We derive an easy method to compute the worst case error for many elementary functions and an error bound for the rest. Important applications are reciprocals, logarithms, exponentials and others.
Title: Function evaluation by table look-up and addition. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
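The idea of replacing one large table by two small tables and an adder can be illustrated with a bipartite-table style sketch for the reciprocal on [1, 2). This is an assumption-laden illustration, not necessarily the decomposition derived in the paper: the 4-bit field width `K`, the midpoint offsets, and the names `build_tables` and `lookup` are all hypothetical, and the table entries are left as unrounded floats.

```python
K = 4  # bits per input field; the input has 3*K fraction bits (assumption)

def build_tables(f, df):
    """Build two 2^(2K)-entry tables whose sum approximates f on [1, 2)."""
    size = 1 << K
    d1 = (2.0 ** -K - 2.0 ** (-2 * K)) / 2        # midpoint of the x1 field
    d2 = (2.0 ** (-2 * K) - 2.0 ** (-3 * K)) / 2  # midpoint of the x2 field
    # a(x0, x1): function value with the low field fixed at its midpoint.
    a = [[f(1 + i0 / size + i1 / size ** 2 + d2) for i1 in range(size)]
         for i0 in range(size)]
    # b(x0, x2): first-order correction using a slope that ignores x1.
    b = [[df(1 + i0 / size + d1 + d2) * (i2 / size ** 3 - d2)
          for i2 in range(size)]
         for i0 in range(size)]
    return a, b

def lookup(a, b, index):
    """Evaluate the approximation for x = 1 + index / 2^(3K)."""
    mask = (1 << K) - 1
    i0 = (index >> (2 * K)) & mask
    i1 = (index >> K) & mask
    i2 = index & mask
    return a[i0][i1] + b[i0][i2]   # one addition of two table outputs

def max_error(f, a, b):
    total = 1 << (3 * K)
    return max(abs(lookup(a, b, i) - f(1 + i / total)) for i in range(total))
```

With `K = 4`, two tables of 256 entries each stand in for a single 4096-entry table, and for `f(x) = 1/x` the worst-case error over all 4096 inputs stays below `2**-12` before table rounding is taken into account.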
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465376
M. Ercegovac, T. Lang
We present an approach to reducing the average number of signal transitions ($T_{av}$) in the design of sign-detection and comparison of magnitudes. Our approach reduces $T_{av}$ from $21n/8$ ($n$ is the operand precision in bits) to 4.5 in the case of iterative implementation, and from about $n$ to roughly $k + n/2^{k-1}$ in the tree network implemented with $k$-bit modules. We also discuss comparison of small numbers. The approach is applicable to other arithmetic problems.
Title: Sign detection and comparison networks with a small number of transitions. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465355
W. Ferguson
Results are presented that identify when the computed value of a sum or difference is exact. The accuracy of an argument reduction algorithm is analyzed using these results. This analysis demonstrates that catastrophic cancellation does not occur in this algorithm's computation of the reduced argument.
Title: Exact computation of a sum or difference with applications to argument reduction. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
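A well-known result of this kind is Sterbenz's lemma: for binary floating-point numbers with y/2 ≤ x ≤ 2y, the difference x − y is computed exactly. The sketch below checks exactness against rational arithmetic. It illustrates that classical lemma only, not the paper's own conditions, and the helper names `difference_is_exact` and `sterbenz_applies` are hypothetical.

```python
from fractions import Fraction

def difference_is_exact(x: float, y: float) -> bool:
    """True if the floating-point result of x - y equals the exact difference."""
    # Fraction(float) converts without rounding, so the right side is exact.
    return Fraction(x - y) == Fraction(x) - Fraction(y)

def sterbenz_applies(x: float, y: float) -> bool:
    """Sterbenz's condition for positive operands: y/2 <= x <= 2y."""
    return y > 0 and y / 2 <= x <= 2 * y
```

For example, `difference_is_exact(1.0, 1e-17)` is false because the operands are far apart in magnitude, whereas any pair accepted by `sterbenz_applies` subtracts without rounding error. This is why a reduced argument formed from two nearby values need not suffer from the subtraction itself.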
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465373
Belle W. Y. Wei, He Du, Honglu Chen
This paper describes the design of a 16×16 complex-number multiplier developed as part of the arithmetic datapath of a complex-number digital signal processor. The complex-number multiplier internally uses binary signed digits for fast multiplication and compact layout. It employs the traditional three-multiplication scheme while minimizing the logic and delay associated with the three extra pre-multiplication binary additions which that scheme requires. The minimization comes from producing the redundant binary sum for each of the pre-multiplication binary additions with minimal hardware, and then recoding the redundant sums as radix-4 multiplier operands. The radix-4 operands halve the number of summands to be added in each of the three real multiplier units. Furthermore, an additional factor-of-two reduction in the number of summands is achieved by our coding scheme for representing binary signed digits. The result is a fast and compact complex-number multiplier.
Title: A complex-number multiplier using radix-4 digits. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
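The three-multiplication scheme mentioned in the abstract trades one real multiplication for three pre-multiplication additions. A minimal integer sketch follows; the function name `complex_multiply_3mult` and the particular grouping of terms are one common formulation chosen for illustration, and the signed-digit and radix-4 recoding described in the abstract is not modeled.

```python
def complex_multiply_3mult(a, b, c, d):
    """Compute (a + bi) * (c + di) with three real multiplications.

    Returns (real, imag). The three sums below are the
    pre-multiplication additions the scheme requires.
    """
    s1 = a + b          # pre-multiplication addition 1
    s2 = d - c          # pre-multiplication addition 2
    s3 = c + d          # pre-multiplication addition 3
    k1 = c * s1         # c*(a + b)
    k2 = a * s2         # a*(d - c)
    k3 = b * s3         # b*(c + d)
    real = k1 - k3      # ac - bd
    imag = k1 + k2      # ad + bc
    return real, imag
```

For instance, `complex_multiply_3mult(3, 4, 5, -2)` returns `(23, 14)`, matching (3 + 4i)(5 − 2i) = 23 + 14i computed with the usual four multiplications.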
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465369
H. Kwan, R.L. Nelson, E. Swartzlander
We present an unconventional method of computing the inverse of the square root. It implements the equivalent of two iterations of a well-known multiplicative method to obtain 24-bit mantissa accuracy. We implement each "iteration" as a separate logic module and exploit knowledge about the relative error during computation to reduce the size of the implementation. We use overflow lookahead logic to facilitate the exponent computations. No division is required in the entire process. Examples and error analysis are given.
Title: Cascaded implementation of an iterative inverse-square-root algorithm, with overflow lookahead. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
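The abstract does not name the multiplicative method. Assuming it is the Newton-Raphson iteration y ← y(3 − x·y²)/2, the effect of two iterations on a coarse seed can be sketched as below. The 96-entry seed table, the input range [1, 4), and the names `SEED_TABLE`, `rsqrt_step`, and `rsqrt` are illustrative assumptions; the cascaded modules and the overflow lookahead of the paper are not modeled.

```python
import math

STEP = 1.0 / 32.0
# Hypothetical seed table over [1, 4): one entry per interval of width 1/32,
# holding 1/sqrt of the interval midpoint (relative error below about 2^-7).
SEED_TABLE = [1.0 / math.sqrt(1.0 + (i + 0.5) * STEP) for i in range(96)]

def rsqrt_step(x, y):
    """One Newton-Raphson step for 1/sqrt(x); multiplications only."""
    return 0.5 * y * (3.0 - x * y * y)

def rsqrt(x, iterations=2):
    """Approximate 1/sqrt(x) for 1 <= x < 4 from a table seed."""
    assert 1.0 <= x < 4.0
    y = SEED_TABLE[int((x - 1.0) * 32.0)]
    for _ in range(iterations):
        y = rsqrt_step(x, y)
    return y
```

If the seed has relative error e, one step leaves an error of about 1.5·e², so a seed good to roughly 7 bits reaches better than 24 bits after two steps. Only multiplications, a subtraction, and a halving are used, consistent with the abstract's remark that no division is needed.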
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465378
C. Martel, V. Oklobdzija, R. Ravi, P. Stelling
We present new design and analysis techniques for the synthesis of fast parallel multiplier circuits. V.G. Oklobdzija, D. Villeger, and S.S. Lui (1995) suggested a new approach, the three dimensional method (TDM), for partial product reduction tree (PPRT) design that produces multipliers which outperform the current best designs. The goal of TDM is to produce a minimum delay PPRT using full adders. This is done by carefully modelling the relationship of the output delays to the input delays of an adder, and then interconnecting the adders in a globally optimal way. Oklobdzija et al. suggested a good heuristic for finding the optimal PPRT, but no proofs about the performance of this heuristic were given. We provide a formal characterization of optimal PPRT circuits and prove a number of properties about them. For the problem of summing a set of input bits within the minimum delay, we present an algorithm that produces a minimum delay circuit in time linear in the size of the inputs. Our techniques allow us to prove tight lower bounds on multiplier circuit delays. These results are combined to create a program which finds optimal TDM multiplier designs.
Title: Design strategies for optimal multiplier circuits. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465379
R. Owens, R. Bajwa, M. J. Irwin
In this paper we consider the problem of multiplying reasonably small integers using fewer counters than are required by straightforward partial product accumulation. Not surprisingly, the method we use is based on the observation that integer multiplication can be formulated as aperiodic convolution. However, instead of using something like the Fast Fourier Transform to compute the aperiodic convolution, we use what are known as "fast" convolution algorithms. In this way we can construct multipliers for integers as small as eighteen bits which use fewer counters than are required by straightforward partial product accumulation. Because of the perceived "overhead" involved with an aperiodic formulation of integer multiplication, the ability to do this goes somewhat against the conventional wisdom that an aperiodic formulation of integer multiplication gains an advantage over a straightforward partial product formulation only for fairly large integers.
Title: Reducing the number of counters needed for integer multiplication. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
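The observation that integer multiplication is an aperiodic convolution of digit sequences can be made concrete with a short sketch. The helper names below are hypothetical, and the two-point routine is the familiar three-multiplication (Karatsuba-style) short convolution, shown only as one example of a "fast" convolution; the specific algorithms and counter counts of the paper are not reproduced.

```python
def to_digits(n, base=2):
    """Little-endian digits of a non-negative integer."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            return digits

def aperiodic_convolution(a, b):
    """Direct (schoolbook) aperiodic convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj   # column i+j collects partial products
    return out

def fast_convolution_2x2(a, b):
    """Two-point aperiodic convolution using 3 multiplications instead of 4."""
    low = a[0] * b[0]
    high = a[1] * b[1]
    mid = (a[0] + a[1]) * (b[0] + b[1]) - low - high
    return [low, mid, high]

def from_columns(columns, base=2):
    """Resolve carries: weight column k by base^k and sum."""
    return sum(c * base ** k for k, c in enumerate(columns))

def multiply(x, y, base=2):
    """Multiply via convolution of digit sequences followed by carry release."""
    cols = aperiodic_convolution(to_digits(x, base), to_digits(y, base))
    return from_columns(cols, base)
```

The direct routine corresponds to plain partial product accumulation. The two-point routine shows how a short convolution can be computed with fewer products at the cost of extra additions, which is the kind of trade-off the abstract exploits.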
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465383
Masayuki Ito, N. Takagi, S. Yajima
Efficient initial approximations and fast converging algorithms are important to achieve the desired precision faster at lower hardware cost in multiplicative division and square root. In this paper, a new initial approximation method for division, an accelerated higher order converging division algorithm, and a new square root algorithm are proposed. They are all suitable for implementation on an arithmetic unit where one multiply-accumulate operation can be executed in one cycle. In the case of division, the combination of our initial approximation method and our converging algorithm enables a single iteration of the converging algorithm to produce double-precision quotients. Our new square root algorithm can form double-precision square roots faster using smaller look-up tables than the Newton-Raphson method.
Title: Efficient initial approximation and fast converging methods for division and square root. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.
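How a higher-order converging step reduces the number of iterations can be seen in a generic sketch. It uses the standard cubic reciprocal refinement r ← r(1 + e + e²) with e = 1 − d·r, which maps an error of e to e³. This is not the paper's accelerated algorithm or its initial approximation; the 8-bit seed and the names `reciprocal_seed`, `refine_cubic`, and `divide` are assumptions made for illustration.

```python
SEED_BITS = 8  # hypothetical seed resolution

def reciprocal_seed(d):
    """Table-style seed for 1/d with 1 <= d < 2.

    Computed on the fly here; in hardware it would be a ROM indexed by
    the top SEED_BITS fraction bits of d.
    """
    index = int((d - 1.0) * (1 << SEED_BITS))
    midpoint = 1.0 + (index + 0.5) / (1 << SEED_BITS)
    return 1.0 / midpoint

def refine_cubic(d, r):
    """One third-order step: if d*r = 1 - e, the result satisfies d*r' = 1 - e^3."""
    e = 1.0 - d * r
    return r * (1.0 + e + e * e)

def divide(n, d):
    """Approximate n / d for 1 <= d < 2 using one cubic refinement."""
    return n * refine_cubic(d, reciprocal_seed(d))
```

With a seed accurate to about 2⁻⁹, a single cubic step already gives about 2⁻²⁷. Reaching double precision in one iteration, as the abstract reports, requires a more accurate seed or a higher-order step than this sketch uses.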
Pub Date: 1995-07-19 DOI: 10.1109/ARITH.1995.465380
H. Bederr, M. Nicolaidis, A. Guyot
Several systematic design approaches are representative of the techniques well adapted to testing sequential circuits (partial and full scan, LSSD, ...). In some cases, however, such as the testing of on-line operators, ad hoc DFT (design for testability) schemes are more suitable. On-line arithmetic is used for high-precision numbers, which results in long operators. The test sequence for a scan design approach can therefore grow quite long because test values must be shifted in and test responses shifted out, and the test application time becomes prohibitive. Moreover, the arithmetic nature of these operators implies that some errors detected locally are masked before they can be observed at the primary outputs. In this paper we describe an analytic approach for testing on-line multipliers that avoids error masking without adding extra hardware for internal state observability, while maintaining 100% fault coverage. Compared to a DFT approach using parity trees, this method reduces the area overhead from 7% to 1% and the extra pin count from 6 to 3 for the on-line multipliers considered in this paper.
Title: Analytic approach for error masking elimination in on-line multipliers. Published in: Proceedings of the 12th Symposium on Computer Arithmetic.