快速可重复浮点求和

2013 IEEE 21st Symposium on Computer Arithmetic Pub Date : 2013-04-07 DOI:10.1109/ARITH.2013.9

J. Demmel, Hong Diep Nguyen

{"title":"快速可重复浮点求和","authors":"J. Demmel, Hong Diep Nguyen","doi":"10.1109/ARITH.2013.9","DOIUrl":null,"url":null,"abstract":"Reproducibility, i.e. getting the bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [1]. However, the combination of dynamic scheduling of parallel computing resources, and floating point nonassociativity, make attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [2], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm trades off efficiency and accuracy: we reproducibly attain reasonably accurate results (with an absolute error bound c · n2 · macheps · max |vi| for a small constant c) with just 2n + O(1) floating-point operations, and quite accurate results (with an absolute error bound c · n3 · macheps2 · max |vi| with 5n + O(1) floating point operations, both with just two reduction operations. Higher accuracies are also possible by increasing the number of error-free transformations. As long as the same rounding mode is used, results computed by the proposed algorithms are reproducible for any run on any platform.","PeriodicalId":211528,"journal":{"name":"2013 IEEE 21st Symposium on Computer Arithmetic","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"76","resultStr":"{\"title\":\"Fast Reproducible Floating-Point Summation\",\"authors\":\"J. Demmel, Hong Diep Nguyen\",\"doi\":\"10.1109/ARITH.2013.9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reproducibility, i.e. getting the bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [1]. However, the combination of dynamic scheduling of parallel computing resources, and floating point nonassociativity, make attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [2], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm trades off efficiency and accuracy: we reproducibly attain reasonably accurate results (with an absolute error bound c · n2 · macheps · max |vi| for a small constant c) with just 2n + O(1) floating-point operations, and quite accurate results (with an absolute error bound c · n3 · macheps2 · max |vi| with 5n + O(1) floating point operations, both with just two reduction operations. Higher accuracies are also possible by increasing the number of error-free transformations. As long as the same rounding mode is used, results computed by the proposed algorithms are reproducible for any run on any platform.\",\"PeriodicalId\":211528,\"journal\":{\"name\":\"2013 IEEE 21st Symposium on Computer Arithmetic\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-04-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"76\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE 21st Symposium on Computer Arithmetic\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ARITH.2013.9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 21st Symposium on Computer Arithmetic","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARITH.2013.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 76

摘要

可重复性，即从同一程序的多次运行中获得按位相同的浮点结果，是许多用户在许多代码中进行调试或正确性检查所依赖的属性[1]。然而，并行计算资源的动态调度和浮点非结合性使得即使是简单的约简操作(如并行计算数字向量的和)也很难获得再现性。我们提出了一种与求和顺序无关的可重复浮点求和技术。我们的技术使用Rump算法进行无误差向量变换[2]，并且比使用(可能非常)高精度的算法要高效得多。我们的算法权衡了效率和精度:只需2n + O(1)个浮点运算，我们就可以重复地获得相当准确的结果(对于小常数c，绝对误差范围为c·n2·macheps·max |vi|)，并且只需5n + O(1)个浮点运算，我们就可以获得相当准确的结果(绝对误差范围为c·n3·macheps2·max |vi|)，两者都只需要两个约简运算。通过增加无错误转换的数量也可以提高精度。只要使用相同的舍入模式，所提出的算法计算的结果在任何平台上运行都是可重复的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Fast Reproducible Floating-Point Summation

Reproducibility, i.e. getting the bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [1]. However, the combination of dynamic scheduling of parallel computing resources, and floating point nonassociativity, make attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [2], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm trades off efficiency and accuracy: we reproducibly attain reasonably accurate results (with an absolute error bound c · n2 · macheps · max |vi| for a small constant c) with just 2n + O(1) floating-point operations, and quite accurate results (with an absolute error bound c · n3 · macheps2 · max |vi| with 5n + O(1) floating point operations, both with just two reduction operations. Higher accuracies are also possible by increasing the number of error-free transformations. As long as the same rounding mode is used, results computed by the proposed algorithms are reproducible for any run on any platform.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助