Pub Date: 2018-06-01, DOI: 10.1109/ARITH.2018.8464818
Clothilde Jeangoudoux, C. Lauter
The IEEE 754-2008 Standard governs floating-point arithmetic in all types of computer systems. The Standard provides for two radices, 2 and 10. It specifies conversion operations between these radices, but does not allow floating-point formats of different radices to be mixed in computational operations. In contrast, the Standard does provide for mixing formats of one radix in one operation. In order to enhance the Standard and make it closed under all basic computational operations, we propose an algorithm for a correctly rounded mixed-radix Fused-Multiply-and-Add (FMA). Our algorithm takes any combination of IEEE 754 binary64 and decimal64 numbers as arguments and provides a result in IEEE 754 binary64 and decimal64, rounded according to any of the five IEEE 754 rounding modes. Our implementation does not require any dynamic memory allocation; its runtime can be bounded statically. We compare our implementation to a basic mixed-radix FMA implementation based on the GMP Multiple Precision library.
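To make the operation concrete, the following is a minimal reference sketch of a mixed-radix FMA built on exact rational arithmetic. It is not the authors' algorithm: it is closer in spirit to the multiple-precision baseline the abstract mentions, it allocates memory dynamically, and it covers only round-to-nearest-even. The function names and the `DECIMAL64` context are illustrative assumptions, and infinities, NaNs, and the sign of a zero result are not handled.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN
from fractions import Fraction

# decimal64 parameters: 16 significant digits, exponent range [-383, 384].
# The context name is an assumption made for this sketch.
DECIMAL64 = Context(prec=16, Emin=-383, Emax=384, clamp=1,
                    rounding=ROUND_HALF_EVEN)


def mixed_fma_exact(a, b, c):
    """Return a*b + c exactly as a Fraction.

    Each argument may be a finite float (binary64), a finite Decimal,
    or an int. Fraction converts all of these without rounding.
    """
    return Fraction(a) * Fraction(b) + Fraction(c)


def mixed_fma_binary64(a, b, c):
    """Round the exact a*b + c once, to the nearest binary64 value."""
    # float(Fraction) divides two integers, which CPython rounds correctly.
    return float(mixed_fma_exact(a, b, c))


def mixed_fma_decimal64(a, b, c):
    """Round the exact a*b + c once, to the nearest decimal64 value."""
    exact = mixed_fma_exact(a, b, c)
    # Decimal(int) is exact; the context division performs the single rounding.
    return DECIMAL64.divide(Decimal(exact.numerator),
                            Decimal(exact.denominator))
```

For instance, `mixed_fma_binary64(0.5, Decimal("0.2"), 0.25)` multiplies a binary64 value by a decimal value and adds a binary64 value, rounding only once at the end. The single final rounding is what distinguishes an FMA from converting the operands to a common radix first and then computing.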
A Correctly Rounded Mixed-Radix Fused-Multiply-Add. 2018 IEEE 25th Symposium on Computer Arithmetic (ARITH), pp. 21-28. https://doi.org/10.1109/ARITH.2018.8464818
Pub Date: 2018-06-01, DOI: 10.1109/ARITH.2018.8464784
D. Defour
When dealing with floating-point numbers, there are several sources of error that can drastically reduce the numerical quality of computed results. One of these sources is the loss of significance, or cancellation, which occurs, for example, during the subtraction of two nearly equal numbers. In this article, we propose a representation format named Floating-Point Adaptive Noise Reduction (FP-ANR). This format embeds cancellation information directly into the floating-point representation by means of a dedicated pattern. With this format, insignificant trailing bits lost during cancellation are removed from every manipulated floating-point number. The immediate consequence is increased numerical confidence in the computed values. The proposed representation format corresponds to a simple and efficient implementation of significance arithmetic that is based on and compatible with the IEEE 754 standard.
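The cancellation effect described above can be illustrated with a short sketch. It estimates how many leading bits cancel in a binary64 subtraction and clears the same number of trailing significand bits. This shows only the general idea of discarding insignificant bits. The abstract does not specify the dedicated pattern that FP-ANR uses, so the encoding here (plain zeroing) and both function names are assumptions.

```python
import math
import struct


def cancelled_bits(x, y):
    """Estimate the number of leading bits cancelled when computing x - y.

    The estimate is the binary exponent of the larger operand minus the
    binary exponent of the difference.
    """
    d = x - y
    if d == 0.0:
        return 53  # every bit of the binary64 significand cancelled
    e_in = max(math.frexp(x)[1], math.frexp(y)[1])
    e_out = math.frexp(d)[1]
    return max(0, e_in - e_out)


def clear_trailing_bits(x, n):
    """Zero the n least significant bits of the binary64 significand field.

    Hypothetical stand-in for removing insignificant bits; this is not
    the FP-ANR encoding.
    """
    n = max(0, min(n, 52))  # the stored significand field has 52 bits
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    bits &= ~((1 << n) - 1)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

For example, subtracting `1.0` from `1.0 + 2**-40` cancels 40 leading bits, so at most 13 bits of the 53-bit result can carry information from the inputs. A format that records this count in the number itself lets later operations know how many trailing bits to distrust.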
FP-ANR: A representation format to handle floating-point cancellation at run-time. 2018 IEEE 25th Symposium on Computer Arithmetic (ARITH), pp. 76-83. https://doi.org/10.1109/ARITH.2018.8464784