
2013 IEEE 21st Symposium on Computer Arithmetic: Latest Publications

Relation Collection for the Function Field Sieve
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.28
J. Detrey, P. Gaudry, M. Videau
In this paper, we focus on the relation collection step of the Function Field Sieve (FFS), which is to date the best algorithm known for computing discrete logarithms in small-characteristic finite fields of cryptographic sizes. Denoting such a finite field by F_{p^n}, where p is much smaller than n, the main idea behind this step is to find polynomials of the form a(t)-b(t)x in F_p[t][x] which, when considered as principal ideals in carefully selected function fields, can be factored into products of low-degree prime ideals. Such polynomials are called "relations", and current record-sized discrete-logarithm computations need billions of those. Collecting relations is therefore a crucial and extremely expensive step in FFS, and a practical implementation thereof requires heavy use of cache-aware sieving algorithms, along with efficient polynomial arithmetic over F_p[t]. This paper presents the algorithmic and arithmetic techniques which were put together as part of a new public implementation of FFS, aimed at medium- to record-sized computations.
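The relation search hinges on fast polynomial arithmetic over F_p[t]. As a concrete illustration for the common case p = 2, here is a minimal sketch of schoolbook multiplication in F_2[t] with coefficients packed bitwise into a machine word; the function name is ours and this is not the paper's implementation, which relies on far more optimized (cache-aware, possibly hardware-assisted carry-less) routines.

```c
#include <stdint.h>

/* Multiply two polynomials over F_2[t], each of degree < 32, packed bitwise
 * into uint32_t (bit i = coefficient of t^i).  Addition in F_2 is XOR, so the
 * schoolbook product is a "carry-less" multiplication.  Illustrative sketch
 * only, not the routine used in the paper's FFS implementation. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            acc ^= (uint64_t)a << i;   /* add a(t) * t^i */
    return acc;                        /* result has degree < 63 */
}
```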
Citations: 18
SIPE: Small Integer Plus Exponent
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.22
Vincent Lefèvre
SIPE (Small Integer Plus Exponent) is a mini-library in the form of a C header file, to perform floating-point computations in very low precisions with correct rounding to nearest in radix 2. The goal of such a tool is to do proofs of algorithms/properties or computations of tight error bounds in these precisions by exhaustive tests, in order to try to generalize them to higher precisions. The currently supported operations are addition, subtraction, multiplication (possibly with the error term), FMA, and miscellaneous comparisons and conversions. Timing comparisons have been done with hardware IEEE-754 floating point and with GNU MPFR.
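As an illustration of the kind of primitive SIPE provides (this sketch is not SIPE's actual API, and the function name is made up), a binary64 value can be rounded to a lower precision with round-to-nearest-even as follows, assuming the default rounding mode, 2 <= prec <= 53, and no overflow or underflow.

```c
#include <math.h>

/* Round x to 'prec' significand bits with round-to-nearest-even, assuming the
 * default rounding mode and no overflow/underflow.  Multiplying and dividing
 * by a power of two is exact, so the only rounding happens in nearbyint(). */
static double round_to_precision(double x, int prec)
{
    if (x == 0.0 || !isfinite(x))
        return x;
    int e;
    frexp(x, &e);                        /* |x| = m * 2^e with 0.5 <= m < 1  */
    double scale = ldexp(1.0, prec - e); /* exact scaling by a power of two  */
    return nearbyint(x * scale) / scale; /* integer rounding, ties to even   */
}
```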
Citations: 4
Numerical Reproducibility and Accuracy at ExaScale
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.43
J. Demmel, Hong Diep Nguyen
Given current hardware trends, ExaScale computing (10^18 floating point operations per second) is projected to be available in less than a decade, achieved by using a huge number of processors, of order 10^9. Given the likely hardware heterogeneity in both platform and network, and the possibility of intermittent failures, dynamic scheduling will be needed to adapt to changing resources and loads. This will make it likely that repeated runs of a program will not execute operations like reductions in exactly the same order. This in turn will make reproducibility, i.e. getting bitwise identical results from run to run, difficult to achieve, because floating point operations like addition are not associative, so computing sums in different orders often leads to different results. Indeed, this is already a challenge on today's platforms.
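A minimal example of the non-associativity at the root of the problem (illustrative only, not from the paper): summing the same three binary64 values in two different orders, as a dynamically scheduled reduction might, gives two different results.

```c
#include <stdio.h>

/* Floating-point addition is not associative: in the first order each 1.0 is
 * absorbed by the large term, while in the second order the two increments
 * are added exactly before meeting it, so the results differ. */
int main(void)
{
    double big = 1e16, small = 1.0;
    double left  = (big + small) + small;   /* each 'small' is rounded away */
    double right = big + (small + small);   /* 2.0 survives the rounding    */
    printf("left  = %.17g\nright = %.17g\n", left, right);
    return 0;
}
```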
Citations: 18
Improved Architectures for a Floating-Point Fused Dot Product Unit
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.26
Jongwook Sohn, E. Swartzlander
This paper presents improved architectures for a floating-point fused two-term dot product unit. The floating-point fused dot product unit is useful for a wide variety of digital signal processing (DSP) applications including complex multiplication and fast Fourier transform (FFT) and discrete cosine transform (DCT) butterfly operations. In order to improve the performance, a new alignment scheme, early normalization, a four-input leading zero anticipation (LZA), a dual-path algorithm, and pipelining are applied. The proposed designs are implemented for single precision and synthesized with a 45nm standard cell library. The proposed dual-path design reduces the latency by 25% compared to the traditional floating-point fused dot product unit. Based on a data flow analysis, the proposed design can be split into three pipeline stages. Since the latencies of the three stages are fairly well balanced, the throughput is increased by a factor of 2.8 compared to the non-pipelined dual-path design.
Citations: 42
Comparison between Binary64 and Decimal64 Floating-Point Numbers
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.23
N. Brisebarre, M. Mezzarobba, J. Muller, C. Lauter
We introduce a software-oriented algorithm that allows one to quickly compare a binary64 floating-point (FP) number and a decimal64 FP number, assuming the "binary encoding" of the decimal formats specified by the IEEE 754-2008 standard for FP arithmetic is used. It is a two-step algorithm: a first pass, based on the exponents only, makes it possible to quickly eliminate most cases, then when the first pass does not suffice, a more accurate second pass is required. We provide an implementation of several variants of our algorithm, and compare them.
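The sketch below illustrates the spirit of the exponent-only first pass; it is not the authors' algorithm, and the names first_pass and cmp_result are made up. It assumes the decimal64 operand has already been unpacked into an integer coefficient c (1 <= c <= 10^16 - 1) and an exponent q, so its value is c * 10^q, and that both operands are positive and finite; overlapping exponent ranges are deferred to the exact second pass, which is not shown.

```c
#include <math.h>

enum cmp_result { CMP_LESS, CMP_GREATER, CMP_UNDECIDED };

/* Exponent-only first pass: bracket each operand between consecutive powers
 * of its radix and compare the brackets on a log2 scale with a safety margin
 * of one, which absorbs the rounding of LOG2_10.  Decisive answers are safe;
 * everything else is deferred to an exact (slower) second pass. */
static enum cmp_result first_pass(double x, long long c, int q)
{
    const double LOG2_10 = 3.321928094887362;   /* log2(10), slightly rounded */
    int eb = ilogb(x);                          /* 2^eb <= x < 2^(eb+1)       */
    int digits = 0;
    for (long long t = c; t > 0; t /= 10)       /* decimal magnitude of c     */
        digits++;
    int g = digits - 1 + q;                     /* 10^g <= c*10^q < 10^(g+1)  */
    if (eb + 1 < g * LOG2_10 - 1.0)       return CMP_LESS;
    if (eb     > (g + 1) * LOG2_10 + 1.0) return CMP_GREATER;
    return CMP_UNDECIDED;                       /* needs the exact second pass */
}
```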
Citations: 5
Fast Reproducible Floating-Point Summation
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.9
J. Demmel, Hong Diep Nguyen
Reproducibility, i.e. getting the bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [1]. However, the combination of dynamic scheduling of parallel computing resources, and floating point nonassociativity, make attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [2], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm trades off efficiency and accuracy: we reproducibly attain reasonably accurate results (with an absolute error bound c · n^2 · macheps · max |v_i| for a small constant c) with just 2n + O(1) floating-point operations, and quite accurate results (with an absolute error bound c · n^3 · macheps^2 · max |v_i|) with 5n + O(1) floating-point operations, both with just two reduction operations. Higher accuracies are also possible by increasing the number of error-free transformations. As long as the same rounding mode is used, results computed by the proposed algorithms are reproducible for any run on any platform.
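The basic error-free transformation behind such techniques is Knuth's TwoSum, sketched below: it returns both the rounded sum and its exact rounding error. This building block is shown for illustration only; it is not the paper's reproducible-summation algorithm itself, and the function name is ours.

```c
/* Knuth's TwoSum: an error-free transformation of an addition.  It computes
 * s = fl(a + b) and the exact error e, so that a + b = s + e holds exactly
 * in the absence of overflow. */
static void two_sum(double a, double b, double *s, double *e)
{
    *s = a + b;
    double bp = *s - a;           /* the part of b actually added     */
    double ap = *s - bp;          /* the part of a actually kept      */
    *e = (a - ap) + (b - bp);     /* what was lost, recovered exactly */
}
```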
Citations: 76
On the Componentwise Accuracy of Complex Floating-Point Division with an FMA
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.8
C. Jeannerod, N. Louvet, J. Muller
This paper deals with the accuracy of complex division in radix-two floating-point arithmetic. Assuming that a fused multiply-add (FMA) instruction is available and that no underflow/overflow occurs, we study how to ensure high relative accuracy in the componentwise sense. Since this essentially reduces to evaluating accurately three expressions of the form ac+bd, an obvious approach would be to perform three calls to Kahan's compensated algorithm for 2 by 2 determinants. However, in the context of complex division, two of those expressions are such that ac and bd have the same sign, suggesting that cheaper schemes should be used here (since cancellation cannot occur). We first give a detailed accuracy analysis of such schemes for the sum of two nonnegative products, providing not only sharp bounds on both their absolute and relative errors, but also sufficient conditions for the output of one of them to coincide with the output of Kahan's algorithm. By combining Kahan's algorithm with this particular scheme, we then deduce two new division algorithms. Our first algorithm is a straight-line program whose componentwise relative error is always at most 5u + 13u^2, with u the unit roundoff; we also provide examples of inputs for which the error of this algorithm approaches 5u, thus showing that our upper bound is essentially the best possible. When tests are allowed we show with a second algorithm that the bound above can be further reduced to 4.5u + 9u^2, and that this improved bound is reasonably sharp.
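For reference, a standard formulation of Kahan's FMA-based compensated scheme for ac + bd, the building block the abstract starts from, looks as follows (the cheaper schemes analyzed in the paper apply when ac and bd have the same sign); the function name is ours.

```c
#include <math.h>

/* Kahan's compensated evaluation of a*c + b*d using an FMA: recover the exact
 * error of one product and reinject it after the fused accumulation. */
static double kahan_sum_of_products(double a, double c, double b, double d)
{
    double w = b * d;
    double e = fma(b, d, -w);     /* exact rounding error of b*d       */
    double f = fma(a, c, w);      /* a*c + w with a single rounding    */
    return f + e;                 /* reinject the lost low-order part  */
}
```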
Citations: 9
How to Compute the Area of a Triangle: A Formal Revisit
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.29
S. Boldo
Mathematical values are usually computed using well-known mathematical formulas without thinking about their accuracy, which may turn out to be awful in particular instances. This is the case for the computation of the area of a triangle. When the triangle is needle-like, the common formula has a very poor accuracy. Kahan proposed in 1986 an algorithm he claimed correct to within a few ulps. Goldberg took over this algorithm in 1991 and gave a precise error bound. This article presents a formal proof of this algorithm, an improvement of its error bound, and new investigations in the case of underflow.
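Kahan's formula, the algorithm being verified, assumes the side lengths are sorted as a >= b >= c and keeps the parentheses exactly as written. A minimal sketch, without the special handling that the paper's underflow analysis addresses:

```c
#include <math.h>

/* Kahan's formula for the area of a triangle with side lengths a >= b >= c,
 * accurate even for needle-like triangles.  The parenthesization must not be
 * rearranged; that is precisely what makes it accurate. */
static double triangle_area(double a, double b, double c)
{
    /* requires a >= b >= c >= 0 and that a, b, c form a valid triangle */
    return sqrt((a + (b + c)) * (c - (a - b))
              * (c + (a - b)) * (a + (b - c))) / 4.0;
}
```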
Citations: 4
Truncated Logarithmic Approximation
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.34
Michael B. Sullivan, E. Swartzlander
The speed and levels of integration of modern devices have risen to the point that arithmetic can be performed very fast and with high precision. Precise arithmetic comes at a hidden cost: by computing results past the precision they require, systems inefficiently utilize their resources. Numerous designs over the past fifty years have demonstrated scalable efficiency by utilizing approximate logarithms. Many such designs are based on a linear approximation algorithm developed by Mitchell. This paper evaluates a truncated form of binary logarithm as a replacement for Mitchell's algorithm. The truncated approximate logarithm simultaneously improves the efficiency and precision of Mitchell's approximation while remaining simple to implement.
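Mitchell's approximation writes x = 2^k (1 + m) with 0 <= m < 1 and takes log2(x) to be simply k + m. A minimal fixed-point sketch for positive integers follows; it is illustrative only (the function name is ours), and the paper studies a truncated variant of this approximation rather than this exact form.

```c
#include <stdint.h>

/* Mitchell's logarithm approximation: the characteristic k is the position of
 * the leading one bit, and the bits below it are read off linearly as the
 * fraction m.  Result returned in 16.16 fixed point. */
static uint32_t mitchell_log2_fix16(uint32_t x)   /* requires x > 0 */
{
    int k = 31;
    while (!(x & (1u << k)))          /* find the leading one bit            */
        k--;
    uint32_t m = x - (1u << k);       /* mantissa bits below the leading one */
    uint32_t frac = (k >= 16) ? (m >> (k - 16)) : (m << (16 - k));
    return ((uint32_t)k << 16) | frac;   /* k + m in 16.16 fixed point       */
}
```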
Citations: 18
Precision, Accuracy, and Rounding Error Propagation in Exascale Computing
Pub Date : 2013-04-07 DOI: 10.1109/ARITH.2013.42
Marius Cornea
Exascale level computers might be available in less than a decade. Computer architects are already thinking of, and planning to achieve such levels of performance. It is reasonable to expect that researchers and engineers will carry out scientific and engineering computations more complex than ever before, and will attempt breakthroughs not possible today. If the size of the problems solved on such machines scales accordingly, we may face new issues related to precision, accuracy, performance, and programmability. The paper examines some relevant aspects of this problem.
Citations: 3