Floating-point arithmetic

IF 11.3, Zone 1 (Mathematics), Q1 MATHEMATICS, Acta Numerica. Pub Date: 2023-05-01. DOI: 10.1017/S0962492922000101

S. Boldo, C. Jeannerod, G. Melquiond, Jean-Michel Muller

Acta Numerica, vol. 32, pp. 203-290.

Citations: 14

Abstract

Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with an infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scraping the surface, and floating-point arithmetic offers much more. In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall what properties it entails and what every programmer should know when designing a floating-point algorithm. We provide various basic blocks that can be implemented with floating-point arithmetic. In particular, one can actually compute the rounding error caused by some floating-point operations, which paves the way to designing more accurate algorithms. More generally, properties of floating-point arithmetic make it possible to extend the accuracy of computations beyond working precision.
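The claim that the rounding error of some floating-point operations can itself be computed can be illustrated with the classical TwoSum algorithm (attributed to Knuth and Møller). The sketch below is not taken from the survey. It assumes binary64 (Python `float`), round-to-nearest, finite inputs and no overflow. The names `two_sum`, `satisfies_standard_model`, `compensated_sum` and the constant `U` are illustrative choices. Exact rational arithmetic from the standard `fractions` module is used only to check the results.

```python
from fractions import Fraction

# Unit roundoff of binary64 under round-to-nearest (assumed format and mode).
U = Fraction(1, 2**53)


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Classical TwoSum: return (s, e) with s = fl(a + b) and s + e == a + b exactly.

    Assumes round-to-nearest and no overflow.
    """
    s = a + b
    a_virtual = s - b
    b_virtual = s - a_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def satisfies_standard_model(a: float, b: float) -> bool:
    """Check |fl(a + b) - (a + b)| <= U * |a + b| using exact rationals."""
    exact = Fraction(a) + Fraction(b)
    computed = Fraction(a + b)
    return abs(computed - exact) <= U * abs(exact)


def compensated_sum(values) -> float:
    """Sum floats while accumulating the rounding errors reported by two_sum.

    A simple compensated scheme: the error terms are themselves added in
    working precision, so the result is more accurate than a plain loop
    but is not guaranteed to be correctly rounded.
    """
    total = 0.0
    error = 0.0
    for v in values:
        total, e = two_sum(total, v)
        error += e
    return total + error
```

For example, `two_sum(0.1, 0.2)` returns the rounded sum together with an error term `e` such that `Fraction(s) + Fraction(e)` equals `Fraction(0.1) + Fraction(0.2)`. Summing `[1.0] + [1e-16] * 10` with a plain loop returns `1.0`, whereas `compensated_sum` recovers the contribution of the small terms, which is one way of extending accuracy beyond working precision.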

Source journal

Acta Numerica (Mathematics)

CiteScore

26.00

Self-citation rate

0.70%

Number of articles published

Journal description: Acta Numerica stands as the preeminent mathematics journal, ranking highest in both Impact Factor and MCQ metrics. This annual journal features a collection of review articles that showcase survey papers authored by prominent researchers in numerical analysis, scientific computing, and computational mathematics. These papers deliver comprehensive overviews of recent advances, offering state-of-the-art techniques and analyses. Encompassing the entirety of numerical analysis, the articles are crafted in an accessible style, catering to researchers at all levels and serving as valuable teaching aids for advanced instruction. The broad subject areas covered include computational methods in linear algebra, optimization, ordinary and partial differential equations, approximation theory, stochastic analysis, nonlinear dynamical systems, as well as the application of computational techniques in science and engineering. Acta Numerica also delves into the mathematical theory underpinning numerical methods, making it a versatile and authoritative resource in the field of mathematics.

Latest articles from this journal

Sparse linear least-squares problems
Cut finite element methods
The discontinuous Petrov–Galerkin method
Time parallelization for hyperbolic and parabolic problems
Optimization problems governed by systems of PDEs with uncertainties