{"title":"Pain versus Gain in the Hardware Design of FPUs and Supercomputers","authors":"Roger A. Golliver, S. M. Müller, S. Oberman, M. Schmookler, Debjit Das Sarma, A. Beaumont-Smith","doi":"10.1109/ARITH.2005.33","DOIUrl":"https://doi.org/10.1109/ARITH.2005.33","url":null,"abstract":"In 1990 there was a dramatic change in the overall design of floating-point units (FPUs) with the introduction of the fused multiply-add dataflow. This design is common today due to its performance advantage over separate units. Recently the constraining parameters have been changing for sub 10 micron technologies, and the resulting designs focus on increasing frequency at the cost of pipeline depth. Wire lengths are a crucial design parameter, and a great deal of effort is spent in floorplanning the execution elements to be very close together. It is now typical that a signal sent across an FPU takes one or more clock cycles. Thus the physical design is very important and requires global optimization of macro placement as well as complex power reduction. Additionally, technology scaling continues to decrease feature sizes, so more execution units, or even processor cores, can be placed on a chip. Execution units such as decimal FPUs are in product plans. There are single-chip designs with 8 vector processing units which are used to accelerate the video games we play. The processing power of these single-chip game processors rivals that of supercomputers. What is the next trendsetting design or key problem in computer arithmetic? We have asked a panel of expert arithmetic unit hardware designers to discuss the current pain-versus-gain tradeoffs and to speculate on the future of arithmetic design. Panelists:","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"119 1","pages":"39"},"PeriodicalIF":0.0,"publicationDate":"2005-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80393076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computer Arithmetic - A Programmer's Perspective","authors":"R. Brent","doi":"10.1109/ARITH.1999.10005","DOIUrl":"https://doi.org/10.1109/ARITH.1999.10005","url":null,"abstract":"Advances in computer hardware often have little impact until they become accessible to programmers using high-level languages. For example, the IEEE floating-point arithmetic standard provides various rounding modes and exceptions, but it is difficult or impossible to take advantage of these from most high-level languages, so the full capabilities of IEEE-compatible hardware are seldom used. When they are used by writing in machine or assembly language, there is a high cost in program development and testing time, lack of portability, and difficulty of software maintenance. In this talk we discuss several areas in which computer hardware, especially arithmetic hardware, can or should significantly influence programming language design. These include: vector units, floating-point exception handling, floating-point rounding modes, high/extended precision registers/arithmetic, and use of unusual number systems. Relevant application areas include interval arithmetic, high-precision integer arithmetic for computer algebra and cryptography, and testing of hardware by comparison with software simulations.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"23 1","pages":"2-"},"PeriodicalIF":0.0,"publicationDate":"1999-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84440949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
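Brent's point about language-level access to rounding modes can be made concrete in Python: its binary floats expose no rounding-mode control, but the standard-library decimal module does. The enclosure idiom below is an illustrative sketch of what becomes possible once directed rounding is reachable from a high-level language; it is not taken from the talk.

```python
# Illustrative sketch, not from the talk: Python's binary floats offer no
# rounding-mode control, but the stdlib decimal module does -- the kind of
# high-level access to rounding modes that Brent argues for.
from decimal import Decimal, localcontext, ROUND_CEILING, ROUND_FLOOR

def enclose_div(a, b, prec=6):
    """Bound a/b from below and above using directed rounding."""
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = ROUND_FLOOR
        lo = Decimal(a) / Decimal(b)
        ctx.rounding = ROUND_CEILING
        hi = Decimal(a) / Decimal(b)
    return lo, hi

lo, hi = enclose_div(1, 3)
# The true value 1/3 is enclosed: lo = 0.333333, hi = 0.333334
```

This two-sided enclosure is the basic building block of interval arithmetic, one of the application areas the talk lists.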
{"title":"Fast hardware units for the computation of accurate dot products","authors":"A. Knöfel","doi":"10.1109/ARITH.1991.145536","DOIUrl":"https://doi.org/10.1109/ARITH.1991.145536","url":null,"abstract":"Matrix and vector operations based on dot product expressions occur in almost all scientific and engineering applications. The failure of popular programming languages and computer architectures to provide operators for these data types, together with the lack of corresponding accurate hardware instructions, has forced users to emulate vector operations with loops of scalar floating point instructions. Cancellation and immediate rounding in these loops cause uncertain and inaccurate numerical results and complicate error analysis.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"2 1","pages":"70-74"},"PeriodicalIF":0.0,"publicationDate":"1991-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85300733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
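The loop-of-scalar-instructions problem the abstract describes is easy to reproduce. A minimal Python sketch (not Knöfel's hardware unit): the naive loop rounds after every addition and loses a term to cancellation, while math.fsum stands in for an accurate accumulator that rounds only once, at the end.

```python
# Sketch of the problem, not of Knöfel's hardware: emulating a dot
# product with a loop of scalar operations rounds after every addition.
import math

def dot_naive(xs, ys):
    s = 0.0
    for x, y in zip(xs, ys):
        s += x * y              # intermediate rounding at every step
    return s

xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]

naive = dot_naive(xs, ys)       # the 1.0 is absorbed: 1e16 + 1.0 == 1e16
exact = math.fsum(x * y for x, y in zip(xs, ys))
# naive == 0.0, exact == 1.0: the exact accumulator rounds only once
```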
{"title":"Systolic arrays for integer Chinese remaindering","authors":"Ç. Koç, P. Cappello","doi":"10.1109/ARITH.1989.72829","DOIUrl":"https://doi.org/10.1109/ARITH.1989.72829","url":null,"abstract":"The authors present several time-optimal and space-time-optimal systolic arrays for computing a process dependence graph corresponding to the mixed-radix conversion algorithm. The arrays are particularly suitable for software implementations of algorithms from the applications of residue number systems on a programmable systolic/wavefront array. Examples of such applications are the exact solution of linear systems and matrix problems over integral domains. The authors also describe a decomposition strategy for treating a mixed-radix conversion problem whose size exceeds the array size.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"1 1","pages":"216-223"},"PeriodicalIF":0.0,"publicationDate":"1989-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74979429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
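For readers unfamiliar with the underlying algorithm, a sequential Python sketch of mixed-radix conversion (the computation the systolic arrays parallelize) may help. Function names are illustrative, not from the paper.

```python
# Sequential sketch of mixed-radix conversion, the computation the
# paper's systolic arrays perform. Given residues x_i = X mod m_i for
# pairwise coprime moduli, recover digits a_i with
#   X = a_0 + a_1*m_0 + a_2*m_0*m_1 + ...
def mixed_radix(residues, moduli):
    digits = []
    for i, (x, m) in enumerate(zip(residues, moduli)):
        for j in range(i):
            # peel off digit j, then divide by m_j modulo m
            x = (x - digits[j]) * pow(moduli[j], -1, m) % m
        digits.append(x)
    return digits

def from_mixed_radix(digits, moduli):
    value, weight = 0, 1
    for d, m in zip(digits, moduli):
        value += d * weight
        weight *= m
    return value

moduli = [3, 5, 7]
residues = [23 % m for m in moduli]     # [2, 3, 2]
digits = mixed_radix(residues, moduli)  # [2, 2, 1]
# from_mixed_radix(digits, moduli) == 23
```

(The three-argument pow(b, -1, m) used for the modular inverse requires Python 3.8 or later.)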
{"title":"A family of CMOS floating point arithmetic chips","authors":"J. Eldon","doi":"10.1109/ARITH.1985.6158945","DOIUrl":"https://doi.org/10.1109/ARITH.1985.6158945","url":null,"abstract":"Although the advantages of floating point arithmetic have long been recognized, hardware complexity and expense have impeded its use in high speed digital signal processing (DSP). Now, however, the availability of a growing number of fast dedicated floating point adder and multiplier chips is spurring renewed interest in floating point for real time filtering and spectral analysis.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"19 2 1","pages":"101-107"},"PeriodicalIF":0.0,"publicationDate":"1985-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78410174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foundations of finite precision arithmetic","authors":"D. Matula","doi":"10.1109/ARITH.1972.6153887","DOIUrl":"https://doi.org/10.1109/ARITH.1972.6153887","url":null,"abstract":"Completeness and uniqueness properties for the representation of base β digital numbers by finite length radix polynomials with various digit sets are studied. The digit sets guaranteeing completeness and uniqueness are characterized. A digital conversion algorithm is introduced for determining a base β radix polynomial with digits from a specified set D having a particular value whenever such a radix polynomial exists. The notion of precision of a radix polynomial is formalized, and the determination of the precision from the given base β, digit set D, and real value a of the radix polynomial is investigated.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"7 1","pages":"1-35"},"PeriodicalIF":0.0,"publicationDate":"1972-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74561158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
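One concrete instance of this framework: base β = 3 with digit set D = {−1, 0, 1} (balanced ternary) is a complete digit set giving each integer a unique finite representation. A hedged Python sketch of the corresponding digital conversion, not Matula's own algorithm:

```python
# Hedged sketch: base beta = 3 with digit set D = {-1, 0, 1} (balanced
# ternary), one digit set for which integer representation is complete
# and unique. Not the paper's general algorithm.
def to_balanced_ternary(n):
    """Digits d_0, d_1, ... with n == sum(d_i * 3**i) and d_i in {-1,0,1}."""
    digits = []
    while n != 0:
        r = n % 3
        if r == 2:
            r = -1              # choose digit -1 and carry into the next place
        digits.append(r)
        n = (n - r) // 3
    return digits or [0]

def value(digits, beta=3):
    return sum(d * beta**i for i, d in enumerate(digits))

d = to_balanced_ternary(5)      # [-1, -1, 1], i.e. 5 = -1 - 1*3 + 1*9
```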
{"title":"Analog techniques for residue operations","authors":"T. L. Cauthen, T. Rao","doi":"10.1109/ARITH.1972.6153892","DOIUrl":"https://doi.org/10.1109/ARITH.1972.6153892","url":null,"abstract":"The precision required for residue operations is primarily associated with resistive components and operational amplifiers. However, technology has advanced to the point that analog components can be built to an accuracy of less than one tenth of one percent (at least with discrete components). Companies are attempting to build DACs with fifteen (15) bit accuracy which will settle to the least significant bit in one hundred (100) nanoseconds. None are commercially available, but seven or eight bit DACs with settling times on the order of one hundred (100) nanoseconds are not uncommon. Sixteen (16) bit ADCs have been announced recently as a result of new analog components. It is not uncommon to find operational amplifiers on the market with gain bandwidth products in excess of one hundred fifty (150) megahertz and linearities on the order of one tenth of one percent. One can also find analog voltage comparators (such as the Motorola MC1650) with a hysteresis of ten millivolts and a switching speed of less than two nanoseconds. Schottky diodes have been introduced which allow transistors to switch in less than one and one-half nanoseconds, and manufacturers have learned to build both active and passive components to a tolerance of less than one-tenth of one percent.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"1 1","pages":"1-23"},"PeriodicalIF":0.0,"publicationDate":"1972-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88001938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiresidue codes for double error correction","authors":"P. Monteiro, T. Rao","doi":"10.1109/ARITH.1972.6153889","DOIUrl":"https://doi.org/10.1109/ARITH.1972.6153889","url":null,"abstract":"A new class of (separate) multiresidue codes has been proposed, which is capable of double error correction. The codes are derived from a class of AN codes where A is of the form ∏_{i=1}^{3} (2^{k_i} − 1). Previously all discussions on separate code implementation had restricted themselves to single error correcting codes only. We have shown that these multiple-error correcting separate codes can be relatively easily implemented as the check bases are of the form 2^k − 1. A comparison with the multiresidue codes derived from Barrows-Mandelbaum codes has shown that these codes have in general a higher information rate and an easier implementation.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"162 1","pages":"1-13"},"PeriodicalIF":0.0,"publicationDate":"1972-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76758809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
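The low-cost check bases are the reason these codes are easy to implement: X mod (2^k − 1) reduces to end-around-carry addition of the k-bit chunks of X. The Python sketch below shows residue checking with such moduli for error detection only; it is not the paper's double-error-correcting decoder.

```python
# Detection-only sketch (not the paper's double-error-correcting
# decoder): residue checks with moduli of the form 2**k - 1 are cheap
# because X mod (2**k - 1) is end-around-carry addition of k-bit chunks.
def residue_mod_2k_minus_1(x, k):
    m = (1 << k) - 1
    while x > m:
        x = (x >> k) + (x & m)  # fold the next k-bit chunk back in
    return 0 if x == m else x

def passes_checks(data, checks, ks):
    return all(residue_mod_2k_minus_1(data, k) == c
               for c, k in zip(checks, ks))

ks = [3, 5]                                # check bases 7 and 31
data = 0b1011010                           # 90
checks = [residue_mod_2k_minus_1(data, k) for k in ks]
corrupted = data ^ (1 << 4)                # a single flipped bit
# passes_checks(data, ...) is True; passes_checks(corrupted, ...) is False
```

Any single bit flip changes the value by ±2^i, which is never divisible by 2^k − 1, so it is always caught by a single check; correcting double errors requires the multiresidue machinery of the paper.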
{"title":"A simulative study of correlated error propagation in various finite arithmetics","authors":"J. Marasa, D. Matula","doi":"10.1109/ARITH.1972.6153915","DOIUrl":"https://doi.org/10.1109/ARITH.1972.6153915","url":null,"abstract":"The accumulated round-off error incurred in long arithmetic computations involving a randomized mixture of addition, subtraction, multiplication and division operations applied to an initial randomly generated data base is studied via simulation. Truncated and rounded floating-point arithmetic and truncated and rounded logarithmic arithmetic are simultaneously utilized for each of the computation sequences and the resulting round-off error accumulations for these four systems are compared. Fundamental results related to the nature of the correlated errors incurred under various arithmetic operator mixes are discussed.","PeriodicalId":6526,"journal":{"name":"2015 IEEE 22nd Symposium on Computer Arithmetic","volume":"128 1","pages":"1-44"},"PeriodicalIF":0.0,"publicationDate":"1972-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76775717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
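The two floating-point quantizers being compared can be stated precisely. A small Python sketch (helper names assumed, and binary arithmetic only; the study also covers logarithmic arithmetic) of chopping versus round-to-nearest to p significant bits:

```python
# Sketch of the two binary quantizers compared in the study (helper
# names assumed): truncation versus round-to-nearest, both keeping p
# significant bits.
import math

def chop(x, p):
    """Truncate x toward zero to p significant binary digits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e with 0.5 <= |m| < 1
    return math.trunc(m * 2**p) * 2.0**(e - p)

def rnd(x, p):
    """Round x to nearest with p significant binary digits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)
    return round(m * 2**p) * 2.0**(e - p)

low, near = chop(0.1, 4), rnd(0.1, 4)
# low == 0.09375 (biased toward zero), near == 0.1015625 (closer to 0.1)
```

The one-sided bias of chopping is what makes the accumulated errors of the four systems diverge over long operation sequences.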