2022 IEEE 29th Symposium on Computer Arithmetic (ARITH): Latest Articles

Generating Very Large RNS Bases

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/arith54963.2022.00027

J. Bajard, Kazuhide Fukushima, T. Plantard, Arnaud Sipasseuth

Citations: 0

Accelerating Variants of the Conjugate Gradient with the Variable Precision Processor

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00017

Y. Durand, E. Guthmuller, C. F. Tortolero, Jérôme Fereyre, Andrea Bocco, Riccardo Alidori

Linear algebra kernels such as linear solvers and eigensolvers are the working engine underneath many scientific applications. The growing scale of these applications has led researchers to rely on high-precision computing to improve their efficiency and stability. In this work, we investigate the impact of arbitrary extended precision on multiple variants of the Conjugate Gradient method (CG). We show how our VRP processor improves the convergence and the efficiency of these kernels. We also illustrate how our set of tools (library, software environment) enables legacy applications to be migrated quickly and intuitively while preserving high performance. We observe up to an 8x improvement in kernel iteration count and up to a 40% improvement in latency. Nevertheless, the main benefit is the stability gained with the precision: it makes it possible to solve larger and ill-conditioned systems without costly compensating techniques.
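
The effect of working precision on CG can be explored in software. The following is a minimal sketch, not the authors' VRP toolchain: it assumes Python's standard `decimal` module as a stand-in for variable precision, a Hilbert matrix as a hypothetical ill-conditioned test system, and a function name (`cg_hilbert`) chosen only for illustration.

```python
from decimal import Decimal, localcontext


def hilbert(n):
    """n x n Hilbert matrix, a classic ill-conditioned SPD test matrix."""
    return [[Decimal(1) / Decimal(i + j + 1) for j in range(n)] for i in range(n)]


def cg_hilbert(n, digits, tol=Decimal("1e-12"), max_iter=1000):
    """Run plain CG on H x = b with `digits` decimal digits of precision.

    b is the vector of row sums, so the exact solution is all ones.
    Returns (iterations used, approximate solution).
    """
    with localcontext() as ctx:
        ctx.prec = digits  # working precision for every operation below
        A = hilbert(n)
        b = [sum(row) for row in A]
        x = [Decimal(0)] * n
        r = list(b)          # residual b - A x, with x = 0
        p = list(r)          # first search direction
        rs = sum(ri * ri for ri in r)
        for k in range(1, max_iter + 1):
            Ap = [sum(a * pj for a, pj in zip(row, p)) for row in A]
            alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * api for ri, api in zip(r, Ap)]
            rs_new = sum(ri * ri for ri in r)
            if rs_new < tol * tol:   # squared residual norm below tol^2
                return k, x
            beta = rs_new / rs
            p = [ri + beta * pi for ri, pi in zip(r, p)]
            rs = rs_new
        return max_iter, x
```

Calling `cg_hilbert(8, 16)` and `cg_hilbert(8, 50)` and comparing the returned iteration counts gives a rough feel for how extra precision affects convergence on an ill-conditioned system; the figures quoted in the abstract come from the authors' hardware and are not reproduced by this sketch.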

Citations: 0

Formal Verification of a Chained Multiply-Add Design: Combining Theorem Proving and Equivalence Checking

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00030

David M. Russinoff, J. Bruguera, C. Chau, M. Manjrekar, Nicholas Pfister, Harsha Valsaraju

We present a hybrid methodology for the formal verification of arithmetic RTL designs that combines sequential logic equivalence checking with interactive theorem proving in a two-step process. First, an intermediate model of the design is extracted by hand and coded in Restricted Algorithmic C, a simple C subset augmented by the C++ register class templates of Algorithmic C, which provide the bit manipulation features of Verilog. The model is designed to mirror the RTL microarchitecture closely enough to allow efficient equivalence checking, but sufficiently abstract to be amenable to formal analysis. The model is then automatically translated to the logic of the ACL2 theorem prover, which is used to establish correctness with respect to an architectural specification. As an illustration, we describe the modeling and proof of correctness of a chained multiply-add module, designed to test techniques for area and power reduction and intended for implementation in future Arm graphics processors.

Citations: 0

Low-Latency and High-Bandwidth Pipelined Radix-64 Division and Square Root Unit

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00012

J. Bruguera

Digit-recurrence algorithms are widely used in current microprocessors to compute floating-point division and square root. These iterative algorithms offer a good trade-off between performance, area and power. Commercial processors have non-pipelined division and square root units in which part of the logic is reused over several cycles. The main drawbacks of these non-pipelined units are the long latency of traditional division and square root implementations, the low bandwidth (or throughput) caused by reusing part of the logic over several cycles, and the hardware complexity of separate logic for division and square root. We present a radix-64 floating-point division and square root algorithm with a common iteration for division and square root, in which each radix-64 iteration is made of two simpler radix-8 iterations. The radix-64 algorithm enables low-latency operation, and the common division and square root radix-64 iteration yields some area reduction. The algorithm is mapped onto a low-latency, high-bandwidth pipelined unit.
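
The idea that one radix-64 iteration can be built from two radix-8 iterations can be illustrated on integers. This is a simplified sketch under stated assumptions: it uses exact, non-redundant digit selection (`q = floor(8w/d)`), whereas hardware units typically select digits from a redundant set using truncated estimates, and the function names are hypothetical rather than taken from the paper.

```python
def radix8_step(w, d):
    """One radix-8 iteration of the recurrence w_next = 8*w - q*d.

    With 0 <= w < d the digit q lies in 0..7 and 0 <= w_next < d.
    """
    q = (8 * w) // d
    return q, 8 * w - q * d


def divide_radix64(x, d, iterations):
    """Return `iterations` radix-64 quotient digits of x/d and the final remainder.

    Each radix-64 digit is assembled from two consecutive radix-8 digits.
    Requires 0 <= x < d so that the quotient is a pure fraction.
    """
    if not 0 <= x < d:
        raise ValueError("expected 0 <= x < d")
    w = x
    digits = []
    for _ in range(iterations):
        q_hi, w = radix8_step(w, d)   # upper 3 bits of the radix-64 digit
        q_lo, w = radix8_step(w, d)   # lower 3 bits of the radix-64 digit
        digits.append(8 * q_hi + q_lo)
    return digits, w
```

After `n` iterations the digits `q_1..q_n` and remainder `w` satisfy `x * 64**n == Q * d + w`, where `Q` is the integer whose radix-64 digits are `q_1..q_n`; this invariant is what makes the digit-by-digit construction a valid division.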

Citations: 1

A BF16 FMA is All You Need for DNN Training

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/arith54963.2022.00011

John Osorio Ríos, Adrià Armejach, E. Petit, G. Henry, Marc Casas

Citations: 0

Enhanced Floating-Point Adder with Full Denormal Support

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00015

Jongwook Sohn, David K. Dean, Eric E. Quintana, Wing Shek Wong

This paper presents an enhanced floating-point adder (FADD) design for the Intel E-Core processor. Floating-point addition and subtraction are two of the most widely used operations in many applications. The proposed FADD is executed in 2 cycles, fully pipelined, handles SSE/AVX operations for scalar/packed IEEE single and double precision, and supports all four rounding modes. Also, the proposed FADD fully supports both denormal inputs and underflow outputs without microcode assistance. To achieve the 2-cycle FADD with full denormal support, several optimization techniques are applied: split path algorithm, early alignment and sticky logic, parallel addition, rounding and all-ones detection, and modified leading zero anticipation (LZA) for masking the underflow. As a result, the proposed FADD achieved not only full denormal support but also about 12.5% reduced latency compared to the traditional FADD designs.
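
What denormal (subnormal) support means for an adder's results can be observed from software. The sketch below only shows IEEE 754 double-precision behaviour as exposed by Python on a typical platform that does not flush subnormals to zero; it says nothing about the internals of the proposed FADD, and the helper name `exponent_field` is an illustrative assumption.

```python
import struct
import sys


def exponent_field(x):
    """Return the 11-bit biased exponent field of a Python float (IEEE 754 double)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF


smallest_normal = sys.float_info.min   # 2**-1022
a = 1.5 * smallest_normal              # normal, exactly representable
b = smallest_normal                    # normal
diff = a - b                           # 2**-1023: below the normal range

# With gradual underflow the difference of two distinct numbers is never zero;
# here it is a subnormal, identified by an all-zero exponent field.
is_subnormal = diff != 0.0 and exponent_field(diff) == 0
```

An adder without denormal support would either flush `diff` to zero or hand the operation to a slower assist path, which is the case the abstract says the proposed design avoids.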

Citations: 0

PMNS for efficient arithmetic and small memory cost

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00023

Fangan-Yssouf Dosso, J. Robert, P. Véron

Citations: 0

Point-Targeted Sparseness and Ling Transforms on Parallel Prefix Adder Trees

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00021

Teodor-Dumitru Ene, J. Stine

Rephrasing binary addition as a parallel prefix tree problem allows for the generation of high-performance architectures with logarithmic delay. Modern literature and implementations seek to explore this prefix tree design space in order to identify optimal circuits for each target application. This paper broadens the scope of the design space by treating both preprocessing and post-processing nodes as malleable parts of the tree structure. Structures obtained through this novel approach are shown to have superior performance. Implementation results are presented using the SkyWater Open Source 130 nm PDK, and the open-source tools developed for this paper are made available.
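
For readers unfamiliar with the prefix formulation, the sketch below models a Kogge-Stone adder, one well-known point in the prefix tree design space, at the bit level. It is an illustrative assumption rather than one of the structures proposed in the paper: preprocessing produces generate/propagate bits, the tree combines them in a logarithmic number of stages, and post-processing forms the sum.

```python
def kogge_stone_add(a, b, width=8):
    """Add two `width`-bit unsigned integers with a Kogge-Stone prefix tree.

    Returns (sum modulo 2**width, carry-out bit).
    """
    # Preprocessing: per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Prefix tree: after the stage with distance d, (G[i], P[i]) covers
    # bits max(0, i - 2*d + 1) .. i.
    G, P = g[:], p[:]
    d = 1
    while d < width:
        G_next = [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(width)]
        P_next = [P[i] & P[i - d] if i >= d else P[i] for i in range(width)]
        G, P = G_next, P_next
        d *= 2

    # Post-processing: carry into bit i is the group generate of bits 0..i-1.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]
```

The `while` loop runs about `log2(width)` times, which is the logarithmic delay mentioned in the abstract; other prefix topologies trade this depth against wiring and node count.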

Citations: 2

Bounding the Round-Off Error of the Upwind Scheme for Advection

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/arith54963.2022.00031

Louise Ben Salem-Knapp, S. Boldo, William Weens

Citations: 0

Efficient Reduction Algorithms for Special Gaussian Integer Moduli

2022 IEEE 29th Symposium on Computer Arithmetic (ARITH)

Pub Date : 2022-09-01 DOI: 10.1109/ARITH54963.2022.00029

Malek Safieh, F. D. Santis

Gaussian integers are a subset of the complex numbers with integers as real and imaginary parts. When Gaussian integers are equipped with modulo operations, they form Gaussian integer rings or fields, depending on the specific choice of the modulus. Arithmetic on Gaussian integers can offer advantages in terms of operand size and improved parallelism, due to independent calculation of the real and imaginary parts. However, although Gaussian integer modulo reduction is the fundamental operation to enable computations in finite Gaussian integer rings and fields, efficient algorithms for Gaussian integer modulo reduction have not been widely investigated so far. In this work, we fill this gap and present efficient reduction algorithms for Gaussian integer moduli of special forms. Indeed, we demonstrate that there exist different classes of Gaussian integer moduli allowing for very fast reductions. Finally, we show that the computational complexity of the proposed algorithm is significantly reduced compared with generic Gaussian integer reduction methods known to date, e.g., Montgomery-based reduction for Gaussian integers.
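
As a point of reference, the generic reduction that such special-form algorithms aim to beat can be written as a rounded division. The sketch below is that textbook baseline, not the reduction proposed in the paper; Gaussian integers are represented as `(real, imag)` tuples and the function names are illustrative assumptions.

```python
def norm(z):
    """Norm of a Gaussian integer z = (re, im): re^2 + im^2."""
    return z[0] * z[0] + z[1] * z[1]


def round_div(a, n):
    """Nearest integer to a / n for n > 0, using integer arithmetic only."""
    return (2 * a + n) // (2 * n)


def gauss_mod(z, m):
    """Reduce Gaussian integer z modulo a nonzero Gaussian integer m.

    Computes q = round(z * conj(m) / N(m)) componentwise and returns z - q*m.
    The result r satisfies N(r) <= N(m) / 2.
    """
    (a, b), (c, d) = z, m
    n = norm(m)
    # z * conj(m) = (a*c + b*d) + (b*c - a*d) i
    q_re = round_div(a * c + b * d, n)
    q_im = round_div(b * c - a * d, n)
    # q * m = (q_re*c - q_im*d) + (q_re*d + q_im*c) i
    return (a - (q_re * c - q_im * d), b - (q_re * d + q_im * c))
```

Note that the real and imaginary parts of the quotient are computed independently, which is the source of the parallelism mentioned in the abstract; the two integer divisions by `N(m)` are the cost that reductions for special moduli try to avoid.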

Citations: 0
