ACM Transactions on Mathematical Software: Latest Articles

Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-03-21, DOI: https://dl.acm.org/doi/10.1145/3577197
Xiaoye S. Li, Paul Lin, Yang Liu, Piyush Sao

We present the new features available in the recent release of SuperLU_DIST, Version 8.1.1. SuperLU_DIST is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance of how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.
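The mixed-precision idea in item (3) can be illustrated with a small, self-contained sketch. This is not SuperLU_DIST code and uses none of its API: the function names (`f32`, `lu_factor_f32`, `lu_solve_f32`, `refine`) are hypothetical, the matrix is dense, and single precision is only simulated by rounding every intermediate result to binary32. The factorization is done once in (simulated) single precision; residuals and solution updates are kept in double precision.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def lu_factor_f32(A):
    """Dense LU with partial pivoting; every stored result is rounded to binary32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))                      # perm[k] = original index of row k
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve_f32(LU, perm, b):
    """Solve with the low-precision factors, again rounding to binary32."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                         # forward substitution (unit lower L)
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):               # back substitution with U
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def refine(A, b, iters=5):
    """Iterative refinement: low-precision solves, double-precision residuals."""
    LU, perm = lu_factor_f32(A)
    x = lu_solve_f32(LU, perm, b)
    for _ in range(iters):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve_f32(LU, perm, r)          # correction from the cheap factors
        x = [xi + di for xi, di in zip(x, d)]  # update accumulated in double
    return x
```

For a well-conditioned matrix, a few refinement steps typically recover close to double-precision accuracy while the expensive factorization stays in the cheaper format; the fixed iteration count here is a simplification, and a practical solver would stop on a residual tolerance.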

Citations: 0
A Geometric Multigrid Method for Space-Time Finite Element Discretizations of the Navier–Stokes Equations and its Application to 3D Flow Simulation
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-03-21, DOI: https://dl.acm.org/doi/10.1145/3582492
Mathias Anselmann, Markus Bause

We present a parallelized geometric multigrid (GMG) method, based on the cell-based Vanka smoother, for higher order space-time finite element methods (STFEM) to the incompressible Navier–Stokes equations. The STFEM is implemented as a time marching scheme. The GMG solver is applied as a preconditioner for generalized minimal residual iterations. Its performance properties are demonstrated for 2D and 3D benchmarks of flow around a cylinder. The key ingredients of the GMG approach are the construction of the local Vanka smoother over all degrees of freedom in time of the respective subinterval and its efficient application. For this, data structures that store pre-computed cell inverses of the Jacobian for all hierarchical levels and require only a reasonable amount of memory overhead are generated. The GMG method is built for the deal.II finite element library. The concepts are flexible and can be transferred to similar software platforms.

Citations: 0
Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-03-21, DOI: https://dl.acm.org/doi/10.1145/3573383
Gregorio Quintana-Ortí, Fernando Hernando, Francisco D. Igual

The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this article, we introduce a family of implementations of the Brouwer–Zimmermann algorithm for distributed-memory architectures for computing the minimum distance of a random linear code over 𝔽₂. Current commercial and public-domain software works only on unicore or shared-memory architectures, which limits the number of cores/processors employed in the computation. Our implementations focus on distributed-memory architectures and can therefore employ hundreds or even thousands of cores in the computation of the minimum distance. Our experimental results show that our implementations are much faster, by up to several orders of magnitude, than the implementations in wide use today.
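To make the quantity concrete, here is a minimal exhaustive baseline for the minimum distance of a binary linear code given by a generator matrix. It is deliberately not the Brouwer–Zimmermann algorithm (which avoids enumerating all codewords) and it is not parallel; the function name `minimum_distance` is hypothetical, and the rows of the generator matrix are assumed to be linearly independent.

```python
from itertools import product

def minimum_distance(G):
    """Smallest Hamming weight over all nonzero codewords spanned by the rows of G.

    G is a list of k rows of 0/1 entries, assumed linearly independent over GF(2).
    The cost grows as 2**k, so this is only usable for very small codes.
    """
    k = len(G)
    rows = [int("".join(map(str, row)), 2) for row in G]  # pack each row into an int
    best = len(G[0])
    for coeffs in product((0, 1), repeat=k):
        if not any(coeffs):
            continue                                      # skip the zero codeword
        word = 0
        for c, r in zip(coeffs, rows):
            if c:
                word ^= r                                 # addition over GF(2) is XOR
        best = min(best, bin(word).count("1"))
    return best
```

For a linear code the minimum distance equals the minimum weight of a nonzero codeword, which is why only weights need to be inspected rather than all pairs of codewords.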

Citations: 0
Algorithm 1032: Bi-cubic Splines for Polyhedral Control Nets
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-03-21, DOI: https://dl.acm.org/doi/10.1145/3570158
Jörg Peters, Kyle Lo, Kȩstutis Karčiauskas

For control nets outlining a large class of topological polyhedra, not just tensor-product grids, bi-cubic polyhedral splines form a piecewise polynomial, first-order differentiable space that associates one function with each vertex. Akin to tensor-product splines, the resulting smooth surface approximates the polyhedron. Admissible polyhedral control nets consist of quadrilateral faces in a grid-like layout, star-configuration where n ≠ 4 quadrilateral faces join around an interior vertex, n-gon configurations, where 2n quadrilaterals surround an n-gon, polar configurations where a cone of n triangles meeting at a vertex is surrounded by a ribbon of n quadrilaterals, and three types of T-junctions where two quad-strips merge into one.

The bi-cubic pieces of a polyhedral spline have matching derivatives along their break lines, possibly after a known change of variables. The pieces are represented in Bernstein-Bézier form with coefficients depending linearly on the polyhedral control net, so that evaluation, differentiation, integration, moments, and so on, are no more costly than for standard tensor-product splines. Bi-cubic polyhedral splines can be used both to model geometry and for computing functions on the geometry. Although polyhedral splines do not offer nested refinement by refinement of the control net, polyhedral splines support engineering analysis of curved smooth objects. Coarse nets typically suffice since the splines efficiently model curved features. Algorithm 1032 is a C++ library with input-output example pairs and an IGES output choice.
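As a small illustration of why evaluation is cheap once a piece is in Bernstein-Bézier form, the sketch below evaluates a single bi-cubic patch from its 4x4 coefficient grid. It does not use the Algorithm 1032 library and does not construct the coefficients from a polyhedral control net; the function names are hypothetical and the coefficients are scalars (for geometry, one would apply the same evaluation per coordinate).

```python
from math import comb

def bernstein3(t):
    """Values of the four cubic Bernstein polynomials at parameter t."""
    return [comb(3, i) * t**i * (1 - t) ** (3 - i) for i in range(4)]

def eval_bicubic(coeffs, u, v):
    """Evaluate a bi-cubic Bernstein-Bezier patch at (u, v) in [0, 1] x [0, 1].

    coeffs is a 4x4 grid; coeffs[i][j] multiplies B_i(u) * B_j(v).
    """
    bu, bv = bernstein3(u), bernstein3(v)
    return sum(bu[i] * bv[j] * coeffs[i][j] for i in range(4) for j in range(4))
```

Because the patch is linear in its coefficients, and the abstract states that the coefficients depend linearly on the control net, derivatives and integrals reduce to similar fixed weighted sums.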

Citations: 0
Enabling Research through the SCIP Optimization Suite 8.0
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-03-10, DOI: 10.1145/3585516
Ksenia Bestuzheva, Mathieu Besançon, Weikun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, L. Eifler, Oliver Gaul, Gerald Gamrath, A. Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, A. Hoen, Christopher Hojny, R. V. D. Hulst, T. Koch, M. Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, M. Pfetsch, D. Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Y. Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philip A. Wellner, Dieter Weninger, Jakob Witzig
The SCIP Optimization Suite provides a collection of software packages for mathematical optimization centered around the constraint integer programming framework SCIP. The focus of this article is on the role of the SCIP Optimization Suite in supporting research. SCIP’s main design principles are discussed, followed by a presentation of the latest performance improvements and developments in version 8.0, which serve both as examples of SCIP’s application as a research tool and as a platform for further developments. Furthermore, this article gives an overview of interfaces to other programming and modeling languages, new features that expand the possibilities for user interaction with the framework, and the latest developments in several extensions built upon SCIP.
Citations: 15
Algorithm 1036: ATC, An Advanced Tucker Compression Library for Multidimensional Data
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-03-01, DOI: 10.1145/3585514
Wouter Baert, N. Vannieuwenhoven
We present ATC, a C++ library for advanced Tucker-based lossy compression of dense multidimensional numerical data in a shared-memory parallel setting, based on the sequentially truncated higher-order singular value decomposition (ST-HOSVD) and bit plane truncation. Several techniques are proposed to improve speed, memory usage, error control and compression rate. First, a hybrid truncation scheme is described which combines Tucker rank truncation and TTHRESH quantization. We derive a novel expression to approximate the error of truncated Tucker decompositions in the case of core and factor perturbations. We parallelize the quantization and encoding scheme and adjust this phase to improve error control. Implementation aspects are described, such as an ST-HOSVD procedure using only a single transposition. We also discuss several usability features of ATC, including the presence of multiple interfaces, extensive data type support, and integrated downsampling of the decompressed data. Numerical results show that ATC maintains state-of-the-art Tucker compression rates while providing average speed-up factors of 2.2 to 3.5 and halving memory usage. Our compressor provides precise error control, deviating only 1.4% from the requested error on average. Finally, ATC often achieves higher compression than non-Tucker-based compressors in the high-error domain.
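For orientation, the Tucker format and the sequential truncation that the abstract builds on can be written as follows. The symbols ($\mathcal{X}$, $\mathcal{G}$, $U_k$, $r_k$, $\tilde{\sigma}_i^{(k)}$) are notation assumed here rather than taken from the ATC paper, and the error identity is the standard result for ST-HOSVD with exact orthogonal projections; it does not cover the additional error from ATC's quantization stage.

```latex
% Tucker approximation of an order-d tensor with multilinear rank (r_1, ..., r_d)
\mathcal{X} \approx \hat{\mathcal{X}}
  = \mathcal{G} \times_1 U_1 \times_2 U_2 \cdots \times_d U_d,
\qquad U_k \in \mathbb{R}^{n_k \times r_k}, \quad U_k^{T} U_k = I .

% ST-HOSVD processes the modes one after another: U_k holds the r_k leading left
% singular vectors of the mode-k unfolding of the already truncated core, whose
% singular values are \tilde{\sigma}_1^{(k)} \ge \tilde{\sigma}_2^{(k)} \ge \dots
\| \mathcal{X} - \hat{\mathcal{X}} \|_F^2
  = \sum_{k=1}^{d} \sum_{i > r_k} \bigl( \tilde{\sigma}_i^{(k)} \bigr)^2 .
```

Read this way, choosing the ranks $r_k$ amounts to distributing an error budget over the modes, which is the kind of error control the abstract describes for the rank-truncation part of the scheme.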
Citations: 2
CPFloat: A C Library for Simulating Low-precision Arithmetic
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-02-25, DOI: 10.1145/3585515
M. Fasi, M. Mikaitis
One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic.
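The bit-level approach described above can be sketched in a few lines. This is not CPFloat's API or code; `round_to_precision` is a hypothetical name, only round-to-nearest with ties to even is shown, and the exponent range of the target format is ignored (so the subnormal, underflow, and overflow handling that the abstract singles out is exactly what this sketch leaves out).

```python
import math
import struct

def round_to_precision(x, t):
    """Round a binary64 value to t significant bits (2 <= t <= 53), ties to even.

    Works directly on the integer bit pattern instead of calling math functions.
    The target format's exponent range is NOT enforced.
    """
    if not 2 <= t <= 53:
        raise ValueError("t must be between 2 and 53")
    if t == 53 or x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    drop = 53 - t                    # trailing significand bits to discard
    rem = bits & ((1 << drop) - 1)   # value of the discarded bits
    half = 1 << (drop - 1)           # exactly half a unit in the last kept place
    bits -= rem                      # truncate toward zero
    if rem > half or (rem == half and (bits >> drop) & 1):
        bits += 1 << drop            # round up in magnitude; a carry bumps the exponent
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

With `t = 11` the result matches binary16 rounding for inputs inside the normal binary16 range, and `t = 8` corresponds to the significand width of bfloat16.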
Citations: 7
Task-based Parallel Programming for Scalable Matrix Product Algorithms
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-02-24, DOI: 10.1145/3583560
E. Agullo, A. Buttari, A. Guermouche, J. Herrmann, Antoine Jego
Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly larger, more heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features that are necessary to express in an elegant and compact way scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense General Matrix Multiplication, the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive up to 32,768 CPU cores with state-of-the-art libraries and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it implies other issues which are out of the scope of this work.
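The Sequential Task Flow idea can be shown with a toy runtime: tasks are submitted in ordinary program order together with the data they read and write, and the runtime infers the dependencies. Everything below is a hypothetical sketch (class and function names included); it is not the authors' runtime, it executes tasks one after another instead of in parallel, and it contains none of the communication features the article proposes.

```python
class SequentialTaskFlow:
    """Toy STF runtime: dependencies are inferred from declared data accesses."""

    def __init__(self):
        self.tasks = []        # (function, set of task ids it depends on)
        self.last_writer = {}  # data handle -> id of the last task that wrote it
        self.readers = {}      # data handle -> tasks reading it since that write

    def submit(self, func, reads=(), writes=()):
        tid = len(self.tasks)
        deps = set()
        for h in reads:                        # read-after-write
            if h in self.last_writer:
                deps.add(self.last_writer[h])
        for h in writes:                       # write-after-write, write-after-read
            if h in self.last_writer:
                deps.add(self.last_writer[h])
            deps.update(self.readers.get(h, ()))
        for h in reads:
            self.readers.setdefault(h, []).append(tid)
        for h in writes:
            self.last_writer[h] = tid
            self.readers[h] = []
        deps.discard(tid)
        self.tasks.append((func, deps))
        return tid

    def run(self):
        """Run in submission order, which always respects the inferred dependencies."""
        for func, _deps in self.tasks:
            func()

def tile_gemm(a, b, c):
    """c += a @ b for small dense tiles stored as lists of lists."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(len(b)))

def stf_gemm(A, B, C, nt):
    """Submit C[i,j] += A[i,k] @ B[k,j] over an nt x nt grid of tiles (dicts keyed by (i, j))."""
    stf = SequentialTaskFlow()
    for i in range(nt):
        for j in range(nt):
            for k in range(nt):
                stf.submit(lambda i=i, j=j, k=k: tile_gemm(A[i, k], B[k, j], C[i, j]),
                           reads=[("A", i, k), ("B", k, j)],
                           writes=[("C", i, j)])
    return stf
```

The loop nest reads like sequential code, yet the inferred dependency sets show that updates to different tiles of C are independent, so a real runtime could schedule them concurrently; only the updates to the same tile form a chain.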
Citations: 0
Algorithm xxx: Encapsulated error, a direct approach to evaluate floating-point accuracy
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-02-17, DOI: 10.1145/3549205
Nestor Demeure, Cédric Chevalier, Christophe Denis, Pierre Dossantos-Uzarralde
Floating-point numbers represent only a subset of real numbers. As such, floating-point arithmetic introduces approximations that can compound and have a significant impact on numerical simulations. We introduce Encapsulated error, a new way to estimate the numerical error of an application, and provide a reference implementation, the Shaman library. Our method uses dedicated arithmetic over a type that encapsulates both the result the user would have had with the original computation and an approximation of its numerical error. We can thus measure the number of significant digits of any result or intermediate result in a simulation. We show that this approach, while simple, gives results competitive with state-of-the-art methods. It has a smaller overhead, and it is compatible with parallelism, making it suitable for the study of large-scale applications.
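The idea of a type that carries both the computed value and an estimate of its numerical error can be illustrated with a small sketch. This is not the Shaman library's API: the class name `Tracked`, its fields, and the `significant_digits` helper are hypothetical names chosen for illustration, and the error terms are obtained with Knuth's TwoSum for additions and exact rational arithmetic for products, which is an assumption about one reasonable way to do it rather than a description of the authors' implementation. Inputs are treated as exact, so only rounding errors introduced by the operations themselves are tracked.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple
import math


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


@dataclass(frozen=True)
class Tracked:
    """Hypothetical illustration type: a float plus an estimate of its error."""
    value: float        # what the plain float computation would have produced
    error: float = 0.0  # estimate of (exact result - value)

    def __add__(self, other: "Tracked") -> "Tracked":
        s, e = two_sum(self.value, other.value)
        # New rounding error plus the errors inherited from both operands.
        return Tracked(s, e + self.error + other.error)

    def __mul__(self, other: "Tracked") -> "Tracked":
        p = self.value * other.value
        # Rounding error of the product, computed with exact rationals.
        e = float(Fraction(self.value) * Fraction(other.value) - Fraction(p))
        # First-order propagation of the operands' inherited errors.
        inherited = self.value * other.error + other.value * self.error
        return Tracked(p, e + inherited)

    def significant_digits(self) -> float:
        """Estimated number of correct decimal digits in `value`."""
        if self.error == 0.0:
            return math.inf
        if self.value == 0.0:
            return 0.0
        return max(0.0, -math.log10(abs(self.error / self.value)))


# Usage: accumulate ten copies of the double nearest to 0.1.
total = Tracked(0.0)
for _ in range(10):
    total = total + Tracked(0.1)
print(total.value, total.error, total.significant_digits())
```

Because `value` is computed with exactly the same operations as the untracked program, the sketch leaves the original result unchanged and only adds bookkeeping alongside it, which is the property the abstract highlights.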
Citations: 0