Floating-point arithmetic

Impact factor 16.3 | CAS Tier 1 (Mathematics) | JCR Q1 (MATHEMATICS) | Acta Numerica | Pub. date: 2023-05-01 | DOI: 10.1017/S0962492922000101

S. Boldo, C. Jeannerod, G. Melquiond, Jean-Michel Muller

Acta Numerica, vol. 32, pp. 203-290 (2023). Journal article.

Citations: 14

Abstract

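Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with an infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scraping the surface, and floating-point arithmetic offers much more. In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall what properties it entails and what every programmer should know when designing a floating-point algorithm. We provide various basic blocks that can be implemented with floating-point arithmetic. In particular, one can actually compute the rounding error caused by some floating-point operations, which paves the way to designing more accurate algorithms. More generally, properties of floating-point arithmetic make it possible to extend the accuracy of computations beyond working precision.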
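As a minimal sketch of the property described above (each operation returns its exact result rounded to the target format), the Python snippet below checks the resulting standard model for addition, fl(a + b) = (a + b)(1 + δ) with |δ| ≤ u, where u = 2^-53 is the unit roundoff of binary64 under round-to-nearest. It assumes Python floats are IEEE-754 binary64 (true on mainstream platforms) and uses `fractions.Fraction`, which converts a float to a rational number exactly, to measure the error. The name `relative_error_of_add` is our own, not from the paper.

```python
from fractions import Fraction

# Unit roundoff of IEEE-754 binary64 with round-to-nearest (assumed float format).
U = Fraction(1, 2**53)

def relative_error_of_add(a: float, b: float) -> Fraction:
    """Exact relative error |fl(a + b) - (a + b)| / |a + b| of one float addition."""
    exact = Fraction(a) + Fraction(b)  # Fraction(float) is an exact conversion
    computed = Fraction(a + b)         # a + b is the rounded binary64 sum
    if exact == 0:
        return Fraction(0)             # an exact zero sum is returned exactly
    return abs(computed - exact) / abs(exact)

for a, b in [(0.1, 0.2), (1.0, 1e-16), (3.0, 7.0)]:
    delta = relative_error_of_add(a, b)
    # Standard model: the relative error never exceeds u (barring overflow).
    print(f"{a} + {b} = {a + b!r}  |delta| = {float(delta):.3e}  <= u: {delta <= U}")
```

The bound is what error analyses usually start from; the rest of the survey is about properties that go beyond it.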

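The statement that one can actually compute the rounding error of some operations can be illustrated with TwoSum, the classical error-free transformation due to Knuth and Møller. This is a minimal sketch assuming binary64 floats with round-to-nearest and no overflow; under those assumptions the returned pair (s, t) satisfies s = fl(a + b) and a + b = s + t exactly. The function name `two_sum` is our own choice, and exactness is checked with `fractions.Fraction`.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """TwoSum (Knuth/Moller): s = fl(a + b) and a + b == s + t exactly.

    Assumes binary64, round-to-nearest, and no overflow. Six operations,
    no branch and no requirement that |a| >= |b|.
    """
    s = a + b
    a_prime = s - b          # the part of s attributable to a
    b_prime = s - a_prime    # the part of s attributable to b
    delta_a = a - a_prime    # what was lost from a
    delta_b = b - b_prime    # what was lost from b
    t = delta_a + delta_b    # exact rounding error of a + b
    return s, t

s, t = two_sum(0.1, 0.2)
print(s, t)
# The pair (s, t) represents the exact sum; check with rational arithmetic.
print(Fraction(s) + Fraction(t) == Fraction(0.1) + Fraction(0.2))
```

When |a| ≥ |b| is known in advance, the shorter Fast2Sum (three operations: s = a + b, z = s - a, t = b - z) gives the same exact error.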

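Extending accuracy beyond working precision can be sketched with compensated summation: record the exact error of every addition with TwoSum and add the accumulated correction at the end. The variant below is Sum2 of Ogita, Rump and Oishi (2005), chosen here as one representative example rather than as the survey's own code; it assumes binary64 floats with round-to-nearest and no overflow, and the helper names are ours. An explicit loop stands in for naive summation because CPython 3.12 and later already compensate inside the built-in `sum()` for floats; `math.fsum`, which returns the correctly rounded sum, serves as the reference.

```python
import math

def two_sum(a: float, b: float) -> tuple[float, float]:
    """TwoSum (Knuth/Moller): s = fl(a + b) and a + b == s + t exactly (no overflow)."""
    s = a + b
    a_prime = s - b
    t = (a - a_prime) + (b - (s - a_prime))
    return s, t

def naive_sum(xs: list[float]) -> float:
    """Recursive summation in working precision: one rounding per addition."""
    total = 0.0
    for x in xs:
        total += x
    return total

def sum2(xs: list[float]) -> float:
    """Compensated summation (Sum2): add back the exact errors found by TwoSum."""
    s = 0.0
    sigma = 0.0                # running sum of the rounding errors
    for x in xs:
        s, t = two_sum(s, x)   # t is the exact error of this addition
        sigma += t
    return s + sigma

xs = [1e16, 1.0, -1e16]        # the exact sum is 1
print(naive_sum(xs), sum2(xs), math.fsum(xs))  # prints: 0.0 1.0 1.0
```

Sum2 is about as accurate as summing in twice the working precision and then rounding once, but unlike `math.fsum` it is not guaranteed to be correctly rounded in every case.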



Source journal: Acta Numerica (MATHEMATICS)

CiteScore: 26.00

Self-citation rate: 0.70%

Journal description: Acta Numerica describes itself as the preeminent mathematics journal, ranking highest in both Impact Factor and MCQ (Mathematical Citation Quotient). This annual journal publishes survey articles by leading researchers in numerical analysis, scientific computing, and computational mathematics, giving comprehensive overviews of recent advances and state-of-the-art techniques and analyses. The articles span all of numerical analysis and are written in an accessible style, suitable for researchers at every level and useful as teaching material for advanced courses. Subject areas include computational methods in linear algebra, optimization, ordinary and partial differential equations, approximation theory, stochastic analysis, and nonlinear dynamical systems, as well as applications of computational techniques in science and engineering. The journal also covers the mathematical theory underpinning numerical methods.

Latest articles in this journal

Splitting methods for differential equations
Adaptive finite element methods
The geometry of monotone operator splitting methods
Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning
Optimal experimental design: Formulations and computations