This paper presents extensions to Lefèvre's algorithm, which computes a lower bound on the distance between a segment and the regular grid Z². This algorithm, and in particular its extensions, are useful in the search for worst cases for the exact rounding of unary elementary functions or base-conversion functions. The proof presented here is simpler and less technical than the original proof. This paper also gives benchmark results with various optimization parameters, explanations of these results, and an application to base conversion.
New Results on the Distance between a Segment and Z². Application to the Exact Rounding. V. Lefèvre. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.32
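A brute-force baseline helps make the bounded quantity concrete. The sketch below is our own illustration, not Lefèvre's algorithm: it assumes the common formulation in which the distance is measured vertically at the integer abscissae 0 ≤ x < N of the segment y = a·x + b, and it evaluates every point exactly with `fractions.Fraction`. The published algorithm exists to avoid this linear scan, so a baseline like this is mainly useful for checking a faster implementation on small N. The function names are hypothetical.

```python
import math
from fractions import Fraction


def dist_to_nearest_int(t: Fraction) -> Fraction:
    """Exact distance from the rational t to the nearest integer."""
    frac = t - math.floor(t)  # fractional part, in [0, 1)
    return min(frac, 1 - frac)


def min_vertical_distance(a: Fraction, b: Fraction, n_points: int) -> Fraction:
    """Exact minimum, over integers 0 <= x < n_points, of the vertical distance
    between the point (x, a*x + b) of the segment and the grid Z^2."""
    return min(dist_to_nearest_int(a * x + b) for x in range(n_points))


if __name__ == "__main__":
    print(min_vertical_distance(Fraction(2, 7), Fraction(1, 14), 7))  # 1/14
```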
Gal's accurate tables algorithm aims at providing efficient implementations of mathematical functions that are correctly rounded as often as possible. The method requires an expensive pre-computation of the values taken by the function (or by several related functions) at some distinguished points. Our improvements of Gal's method are twofold: on the one hand, we describe what is arguably the best set of distinguished values and how it improves the efficiency and accuracy of the function implementation; on the other hand, we give an algorithm that drastically decreases the cost of the pre-computation. These improvements are related to the worst cases for the correct rounding of mathematical functions and to the algorithms for finding them. We demonstrate how the whole method can be put into practice for 2^x and sin x, for x ∈ [1/2, 1[, in double precision.
Gal's accurate tables method revisited. D. Stehlé, P. Zimmermann. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.24
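Gal's idea can be illustrated with a naive search, which is exactly the kind of expensive pre-computation the paper sets out to speed up (the paper's faster algorithm and its choice of distinguished values are not reproduced). Near a regularly spaced point x0, the hypothetical helper `find_accurate_point` scans x0 ± k·step for an x whose f(x) lies within 2^-(q+extra) of a multiple of 2^-q, so a table entry stored with q fractional bits is accurate to about q + extra bits. The parameters are toy values of our choosing; using the double-precision `2.0 ** t` as the reference is only acceptable because q + extra = 28 bits is far below 53, and a real double-precision table would need a multiple-precision evaluation of f.

```python
def find_accurate_point(f, x0, step, q, extra, max_steps):
    """Scan x = x0 + j*step for j = 0, 1, -1, 2, -2, ... and return the first x
    such that f(x) is within 2**-(q + extra) of a multiple of 2**-q, or None."""
    scale = 2.0 ** q
    tol = 2.0 ** -extra  # tolerance expressed in units of 2**-q
    for k in range(max_steps):
        for j in ((0,) if k == 0 else (k, -k)):
            x = x0 + j * step
            y = f(x) * scale  # f(x) measured in units of 2**-q
            if abs(y - round(y)) <= tol:
                return x
    return None


# Toy run for f(x) = 2**x near x0 = 0.75: q = 20 stored bits, 8 extra bits wanted.
x = find_accurate_point(lambda t: 2.0 ** t, 0.75, 2.0 ** -30, 20, 8, 10_000)
print(x, (x - 0.75) / 2.0 ** -30)  # chosen point and its offset in steps
```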
M. Aharoni, Sigal Asaf, Ron Maharik, Ilan Nehama, Ilya Nikulshin, A. Ziv
Test generation for datapath floating-point verification involves targeting intricate corner cases, which can often be solved only through complex constraint solving. In the process of calculating the result, we use an intermediate result whose significand comprises a finite number of bits and a sticky bit that is 0 if and only if the intermediate result is exact. We refer to all the bits beyond those represented in the final result as the invisible bits. We deal with corner cases that can only be defined via constraints on the intermediate result. Our work investigates the following problem: given a floating-point operation, and constraints on the invisible bits and the sticky bit, find two inputs for the operation that yield an intermediate result compatible with the constraints. The paper supplies a deterministic solution for addition and subtraction, and probabilistic solutions for multiplication and division. It also discusses the application of these algorithms to the verification of floating-point implementations.
Solving constraints on the invisible bits of the intermediate result for floating-point verification. M. Aharoni, Sigal Asaf, Ron Maharik, Ilan Nehama, Ilya Nikulshin, A. Ziv. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.38
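The definitions of invisible bits and sticky bit can be made concrete by computing them in the forward direction, from an exact result to its bits. This only illustrates the terminology; it is not the paper's constraint solver, which works in the opposite direction (from constraints to inputs). The hypothetical helper `split_intermediate` assumes a positive exact result held as a `Fraction`, p significand bits, and an explicit count k of invisible bits that we chose; `sticky` is 0 exactly when nothing nonzero lies beyond those p + k bits.

```python
import math
from fractions import Fraction


def split_intermediate(r: Fraction, p: int, k: int):
    """Split a positive exact result r into (significand, invisible, sticky):
    the leading p bits, the next k bits, and 0 iff no nonzero bits remain."""
    if r <= 0:
        raise ValueError("r must be positive")
    # e = floor(log2(r)), computed exactly from the bit lengths
    e = r.numerator.bit_length() - r.denominator.bit_length()
    if r < Fraction(2) ** e:
        e -= 1
    scaled = r * Fraction(2) ** (p + k - 1 - e)  # now in [2**(p+k-1), 2**(p+k))
    m = math.floor(scaled)
    sticky = int(scaled != m)
    return m >> k, m & ((1 << k) - 1), sticky


# Exact sum of two doubles, 1 + 2**-54, with p = 53 and k = 2 invisible bits.
exact = Fraction(1.0) + Fraction(2.0 ** -54)
print(split_intermediate(exact, 53, 2))  # (4503599627370496, 1, 0)
```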
We propose a new number representation and arithmetic for the elements of the ring of integers modulo p. The so-called polynomial modular number system (PMNS) allows for fast polynomial arithmetic and easy parallelization. The most important contribution of this paper is the fundamental theorem of a modular number system, which provides a bound for the coefficients of the polynomials used to represent the set Z_p. In addition, we propose a complete set of algorithms to perform the arithmetic operations over a PMNS, which makes this system of practical interest to anyone concerned with efficient implementation of modular arithmetic.
Arithmetic operations in the polynomial modular number system. J. Bajard, L. Imbert, T. Plantard. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.11
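A toy PMNS makes the representation concrete. We assume a reduction polynomial E(X) = X^n − λ with γ^n ≡ λ (mod p): an integer a is represented by a polynomial A of degree below n with A(γ) ≡ a (mod p), and a product is formed by polynomial multiplication followed by the rule X^(n+j) = λ·X^j. The parameters p = 17, n = 2, γ = 4, λ = −1 and the function names are ours. The step that makes a PMNS practical, coefficient reduction back below the bound given by the paper's fundamental theorem, is deliberately omitted, so coefficients grow with each product.

```python
def pmns_value(coeffs, gamma, p):
    """Integer mod p represented by A(X) = sum(coeffs[i] * X**i), i.e. A(gamma) mod p."""
    return sum(c * pow(gamma, i, p) for i, c in enumerate(coeffs)) % p


def pmns_mul(a, b, lam):
    """Multiply two polynomials of n coefficients and reduce modulo E(X) = X**n - lam,
    using X**(n + j) = lam * X**j. Coefficient reduction is NOT performed."""
    n = len(a)
    prod = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    c = prod[:n]
    for j in range(n - 1):
        c[j] += lam * prod[n + j]  # fold the X**(n+j) term back onto X**j
    return c


# Toy system: p = 17, n = 2, gamma = 4; 4**2 = 16 = -1 (mod 17), so lam = -1.
p, gamma, lam = 17, 4, -1
a, b = [1, 2], [3, -1]  # represent 9 and 16 (i.e. -1) mod 17
c = pmns_mul(a, b, lam)
print(c, pmns_value(c, gamma, p))  # [5, 5] 8
```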
We propose a micro-architecture for high-performance IEEE floating-point (FP) addition based on a (non-redundant) high-radix representation of the floating-point operands. The main improvement of the proposed IEEE FP addition implementation comes from avoiding the computation of full alignment and normalization shifts, which impose major delays in conventional implementations of IEEE FP addition. This reduction is achieved at the cost of wider operand interfaces and increased complexity for IEEE-compliant rounding. We present a detailed discussion of an IEEE FP adder implementation using the proposed high-radix format and explain the specific benefits and challenges of the design.
High-radix implementation of IEEE floating-point addition. P. Seidel. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.26
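The idea of a high-radix exponent can be shown in a few lines. In this toy illustration (our own; Seidel's actual format, rounding, and normalization logic are not modeled, and the function names are hypothetical), a binary value m·2^e is rewritten as M·2^(k·E), so any alignment between two operands is a whole number of k-bit digits. The price is visible in the code: M may be up to k − 1 bits wider than m, which mirrors the wider operand interfaces mentioned in the abstract. Significands are exact Python integers, and the larger-exponent operand is shifted left rather than the smaller one right, so no bits are lost.

```python
def to_high_radix(m: int, e: int, k: int):
    """Rewrite the value m * 2**e as M * 2**(k*E), so the exponent is a multiple of k.
    The significand is left-shifted by 0..k-1 bits, which widens it."""
    E = e // k            # floor division, also correct for negative e
    M = m << (e - k * E)  # shift amount is in [0, k)
    return M, E


def add_high_radix(x, y, k: int):
    """Exact sum of two high-radix values; the alignment shift is a multiple of k bits."""
    (mx, ex), (my, ey) = x, y
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    return (mx << (k * (ex - ey))) + my, ey


a = to_high_radix(5, 7, 4)     # 5 * 2**7 = 640    -> (40, 1)
b = to_high_radix(3, -3, 4)    # 3 * 2**-3 = 0.375 -> (6, -1)
print(add_high_radix(a, b, 4))  # (10246, -1), i.e. 10246 / 16 = 640.375
```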
Decimal multiplication is important in many commercial applications, including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a novel design for fixed-point decimal multiplication that uses a simple recoding scheme to produce signed-magnitude representations of the operands, thereby greatly simplifying the generation of partial products for each multiplier digit. The partial products are generated by a digit-by-digit multiplier on a word-by-digit basis, first in a signed-digit form with two digits per position, and then combined via a combinational circuit. As the signed-digit partial products are developed one at a time while traversing the recoded multiplier operand from the least significant digit to the most significant digit, each partial product is added to the accumulated sum of the previous partial products via a signed-digit adder. This work differs significantly from other work employing digit-by-digit multipliers in the efficiency gained by restricting the range of digits throughout the multiplication process.
Decimal multiplication with efficient partial product generation. M. A. Erle, E. Schwarz, M. Schulte. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.15
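Recoding and digit-by-digit partial products can be sketched as follows. The digit set [−4, 5] and the carry rule are our assumptions, chosen so that every digit has magnitude at most 5 and every digit-by-digit product therefore has magnitude at most 25; the paper's exact recoding, its two-digits-per-position partial products, and its signed-digit adder are not modeled, and ordinary Python integers do the accumulation. The function names are hypothetical.

```python
def recode_signed_digits(n: int):
    """Recode a non-negative integer into decimal digits in [-4, 5], least
    significant first, so that n == sum(d * 10**i for i, d in enumerate(digits))."""
    digits, carry = [], 0
    while n > 0 or carry:
        t = n % 10 + carry  # t is in [0, 10]
        n //= 10
        if t >= 6:
            digits.append(t - 10)  # 6..10 become -4..0 with a carry into the next digit
            carry = 1
        else:
            digits.append(t)
            carry = 0
    return digits or [0]


def decimal_multiply(x: int, y: int) -> int:
    xd, yd = recode_signed_digits(x), recode_signed_digits(y)
    acc = 0
    for i, b in enumerate(yd):  # traverse multiplier digits, least significant first
        # partial product for one multiplier digit; every |a * b| is at most 25
        pp = sum(a * b * 10 ** j for j, a in enumerate(xd))
        acc += pp * 10 ** i     # accumulate with the previous partial products
    return acc


print(recode_signed_digits(1999))      # [-1, 0, 0, 2]
print(decimal_multiply(1234, 5678))    # 7006652
```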
Integer division on modern processors is expensive compared to multiplication. Previous algorithms for performing unsigned division by an invariant divisor via reciprocal approximation suffer, in the worst case, from a common requirement for (n+1)-bit multiplication, which typically must be synthesized from n-bit multiplication and extra arithmetic operations. This paper presents, and proves correct, a hybrid of previous algorithms that replaces (n+1)-bit multiplication with a single fused multiply-add operation on n-bit operands, thus reducing any n-bit unsigned division to taking the upper n bits of a multiply-add, followed by a single right shift. An additional benefit is that the prerequisite calculations are simple and fast. On the Itanium® 2 processor, the technique is advantageous for as few as two quotients that share a common run-time divisor.
N-bit unsigned division via n-bit multiply-add. A. Robison. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.31
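One way to derive such constants is sketched below. It is our reconstruction from standard error analysis, not the paper's text, so its notation and details may differ, and the function names are hypothetical. With 2^s ≤ d < 2^(s+1), t = ⌊2^(n+s)/d⌋ and e = 2^(n+s) − t·d, the quantity (t·x + t) >> (n+s) equals ⌊x/d⌋ for every n-bit x when e ≤ 2^s, while ((t+1)·x) >> (n+s) does when d − e < 2^s; since d < 2^(s+1), one of the two always applies, and both constants fit in n bits. Powers of two are handled separately with a = b = 2^n − 1.

```python
def division_constants(d: int, n: int):
    """Return (a, b, s) with a, b < 2**n such that, for every n-bit x,
    x // d == ((a * x + b) >> n) >> s."""
    if not 1 <= d < 2 ** n:
        raise ValueError("divisor must be a nonzero n-bit integer")
    s = d.bit_length() - 1          # 2**s <= d < 2**(s+1)
    if d & (d - 1) == 0:            # power of two: ((2**n - 1)*(x + 1)) >> n == x
        return 2 ** n - 1, 2 ** n - 1, s
    k = n + s
    t = 2 ** k // d                 # t < 2**n because d > 2**s
    e = 2 ** k - t * d              # 0 < e < d
    if e <= 2 ** s:
        return t, t, s              # "round down, then increment": t * (x + 1)
    return t + 1, 0, s              # "round up"; here d - e < 2**s


def divide(x: int, a: int, b: int, s: int, n: int) -> int:
    hi = (a * x + b) >> n           # upper n bits of the 2n-bit multiply-add
    return hi >> s                  # single right shift


a, b, s = division_constants(7, 32)
print(divide(1_000_000, a, b, s, 32), 1_000_000 // 7)  # 142857 142857
```

Because a, b ≤ 2^n − 1, the multiply-add a·x + b never exceeds (2^n − 1)·2^n, so it always fits in 2n bits.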
J. Muller, A. Tisserand, B. Dinechin, Christophe Monat
Algorithms for Euclidean (i.e., integer) division by a constant are presented. They allow fast computation for some values of the divisor (known at compile time), or when both the quotient and the modulus are required. These algorithms are based on the multiply-accumulate instruction and the 40-bit arithmetic available in DSPs such as the ST100 DSP from STMicroelectronics. The results are demonstrated on standard speech coding applications.
Division by constant for the ST100 DSP microprocessor. J. Muller, A. Tisserand, B. Dinechin, Christophe Monat. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.17
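For comparison, the generic multiply-and-shift technique of Granlund and Montgomery (not the paper's ST100-specific sequences) obtains a quotient with one multiplication, after which the modulus costs one more multiply-subtract. The 16-bit dividend width and the 40-bit accumulator check are illustrative assumptions: with l = ⌈log2 d⌉ and m = ⌊2^(16+l)/d⌋ + 1, the product x·m stays below 2^33, comfortably inside 40 bits. The constants and function names are hypothetical.

```python
ACC_BITS = 40  # accumulator width assumed for the DSP
N = 16         # assumed width of the unsigned dividend


def div_constant(d: int):
    """Multiplier and shift such that x // d == (x * m) >> shift for all N-bit x."""
    l = (d - 1).bit_length()       # l = ceil(log2(d))
    shift = N + l
    m = (1 << shift) // d + 1      # then 2**shift < m*d <= 2**shift + 2**l
    return m, shift


def divmod_by_constant(x: int, m: int, shift: int, d: int):
    prod = x * m
    assert prod < 1 << ACC_BITS, "product must fit in the accumulator"
    q = prod >> shift
    return q, x - q * d            # modulus from one extra multiply-subtract


m, shift = div_constant(10)
print(divmod_by_constant(12345, m, shift, 10))  # (1234, 5)
```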
We introduce an inheritance property and related table lookup structures applicable to simplified evaluation of the modular operations "multiplicative inverse", "discrete log", and "exponential residue" for the particular modulus 2^k. Regarding applications, we describe an integer representation system of Benschop that transforms integer multiplications into additions and benefits from our table lookup function evaluation procedures. We focus here on the multiplicative inverse modulo 2^k to exhibit the simplifications in hardware implementations that the inheritance property makes possible. A table lookup structure given by a bit string that can be interpreted with reference to a binary tree is described and analyzed. Using observed symmetries, the size of the lookup structure is reduced, allowing a novel direct lookup process that obtains the multiplicative inverses of all 16-bit odd integers from a table of less than 2 KB.
The 16-bit multiplicative inverse operation can also provide a seed inverse from which 32-bit or 64-bit multiplicative inverses are obtained by one or two iterations, respectively, of a known quadratic refinement algorithm.
Table lookup structures for multiplicative inverses modulo 2^k. D. Matula, A. Fit-Florea, M. Thornton. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.43
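The standard quadratic refinement for inverses modulo 2^k, which we assume is the one meant, is the Newton iteration y ← y·(2 − a·y): if a·y ≡ 1 (mod 2^b), the new y satisfies a·y ≡ 1 (mod 2^(2b)), because writing a·y = 1 + u·2^b gives a·y·(2 − a·y) = 1 − u²·2^(2b). In the sketch below, Python's built-in `pow(a, -1, m)` (Python 3.8+) stands in for the paper's 16-bit table lookup, which is not reproduced; one and two refinements then give 32-bit and 64-bit inverses. The function names are hypothetical.

```python
def refine_inverse(a: int, y: int, bits: int) -> int:
    """Given a*y = 1 (mod 2**bits), return y' with a*y' = 1 (mod 2**(2*bits))."""
    return (y * (2 - a * y)) % (1 << (2 * bits))


def inverse_mod_2_64(a: int) -> int:
    if a % 2 == 0:
        raise ValueError("only odd integers are invertible modulo 2**k")
    # Stand-in for the 16-bit table lookup: compute the seed directly.
    y16 = pow(a % (1 << 16), -1, 1 << 16)
    y32 = refine_inverse(a, y16, 16)   # one iteration: 16 -> 32 bits
    return refine_inverse(a, y32, 32)  # second iteration: 32 -> 64 bits


inv = inverse_mod_2_64(0xDEADBEEF)
print((0xDEADBEEF * inv) % 2 ** 64)  # 1
```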
In a search for an algorithm to compute atan(x) with both low latency and few floating-point instructions, an interesting variant of familiar trigonometric formulas was discovered that allows argument reduction to begin before any tables stored in memory are referenced. Low latency makes the method suitable for a closed subroutine, and the small number of floating-point operations makes it advantageous for a software-pipelined implementation.
A fast-start method for computing the inverse tangent. Peter W. Markstein. 17th IEEE Symposium on Computer Arithmetic (ARITH'05), 27 June 2005. https://doi.org/10.1109/ARITH.2005.5
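For context, the familiar table-based reduction that such variants start from is atan(x) = atan(c) + atan((x − c)/(1 + x·c)), where c is a breakpoint near x and atan(c) comes from a table. The sketch below shows only this familiar form; Markstein's variant, which lets reduction begin before table references, is not reproduced. The breakpoint spacing 2^-5, the degree-7 polynomial, the restriction to 0 ≤ x ≤ 1, and the use of `math.atan` to fill the stand-in table are our assumptions.

```python
import math

TABLE_BITS = 5  # breakpoints c = i / 2**5 (our choice)
ATAN_TABLE = [math.atan(i / 2 ** TABLE_BITS) for i in range(2 ** TABLE_BITS + 1)]


def atan_reduced(x: float) -> float:
    """atan(x) for 0 <= x <= 1 via atan(x) = atan(c) + atan((x - c) / (1 + x*c))."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("sketch handles 0 <= x <= 1 only")
    i = round(x * 2 ** TABLE_BITS)      # index of the nearest breakpoint
    c = i / 2 ** TABLE_BITS
    r = (x - c) / (1.0 + x * c)         # reduced argument, |r| <= 2**-6
    r2 = r * r
    # odd Taylor polynomial r - r**3/3 + r**5/5 - r**7/7; truncation error below |r|**9 / 9
    poly = r * (1.0 - r2 * (1.0 / 3 - r2 * (1.0 / 5 - r2 / 7)))
    return ATAN_TABLE[i] + poly


print(abs(atan_reduced(0.3) - math.atan(0.3)) < 1e-15)  # True
```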