Algorithms for Efficient Reproducible Floating Point Summation

Peter Ahrens, J. Demmel, Hong Diep Nguyen
ACM Transactions on Mathematical Software (TOMS), pp. 1-49. Published 2020-07-21. DOI: 10.1145/3389360

Abstract

We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation or for operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data and, in parallel, only one reduction, using a 6-word reproducible accumulator (more words can be used for higher accuracy); this enables standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2^29 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.
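To make the idea of order-independent summation concrete, here is a minimal Python sketch under stated assumptions. It is not the paper's binned-number algorithm: it makes two passes (first finding max|x|, then depositing), uses hypothetical parameters W = 40 (bin width in bits) and K = 3 (number of bins), places bin boundaries at fixed multiples of W, and omits the renormalization, carry handling, overflow/underflow, and NaN/Inf cases that a real accumulator must handle. The function name `reproducible_sum` is hypothetical. What it does illustrate is the core property: each input is split against fixed "extractor" constants, so every bin sum is computed exactly and the result cannot depend on the order of the inputs.

```python
import math

W = 40  # hypothetical bin width in bits (assumption for this sketch)
K = 3   # hypothetical number of bins (assumption for this sketch)


def reproducible_sum(xs):
    """Order-independent sum of finite doubles (simplified two-pass sketch).

    Assumes inputs lie well inside the double range: no overflow,
    underflow, NaN, or Inf handling.
    """
    xs = list(xs)
    n = len(xs)
    # Each bin adds at most n terms of magnitude <= 2**b_k on a grid of
    # 2**(b_k - W); keeping n <= 2**(53 - W) makes every partial sum exact.
    if n > 2 ** (53 - W):
        raise ValueError("too many inputs for this sketch (no renormalization)")
    amax = max((abs(x) for x in xs), default=0.0)
    if amax == 0.0:
        return 0.0
    # Pass 1: top bin boundary b_0 = smallest multiple of W with amax < 2**b_0.
    _, e = math.frexp(amax)   # amax = m * 2**e with 0.5 <= m < 1
    b0 = -(-e // W) * W       # ceil(e / W) * W
    # Extractor for bin k: M_k = 1.5 * 2**(b_k - W + 52), so ulp(M_k) = 2**(b_k - W).
    extractors = [math.ldexp(1.5, b0 - k * W - W + 52) for k in range(K)]
    bins = [0.0] * K
    # Pass 2: split each input across the bins with error-free extractions.
    for x in xs:
        r = x
        for k, m in enumerate(extractors):
            q = (m + r) - m   # r rounded to a multiple of ulp(M_k); exact
            bins[k] += q      # exact: all terms lie on bin k's grid
            r -= q            # exact remainder, passed to the next bin
        # Whatever remains below the last bin is discarded (bounded error).
    # Combine the bins in a fixed order, so this final rounding is reproducible too.
    total = 0.0
    for s in bins:
        total += s
    return total
```

For example, `reproducible_sum([1e16, 1.0, -1e16])` returns 1.0 for every ordering of the three inputs, whereas left-to-right recursive summation of that list gives 0.0. Fixing the bin boundaries at multiples of W, rather than deriving them from arbitrary data-dependent splits, is what lets each deposit depend only on the input value itself.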