ACM Transactions on Mathematical Software最新文献

英文中文

Improvements to SLEPc in Releases 3.14–3.18 版本3.14-3.18对SLEPc的改进

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-06-07 DOI: 10.1145/3603373

J. Román, F. Alvarruiz, C. Campos, Lisandro Dalcin, P. Jolivet, A. L. Daviña

This short article describes the main new features added to SLEPc, the Scalable Library for Eigenvalue Problem Computations, in the past two and a half years, corresponding to five release versions. The main novelty is the extension of the SVD module with new problem types, such as the generalized SVD or the hyperbolic SVD. Additionally, many improvements have been incorporated in different parts of the library, including contour integral eigensolvers, preconditioning, and GPU support.

这篇短文描述了在过去两年半的时间里，SLEPc(特征值问题计算的可扩展库)增加的主要新特性，对应于五个发布版本。主要的新颖之处在于SVD模块扩展了新的问题类型，如广义SVD或双曲SVD。此外，许多改进已纳入库的不同部分，包括轮廓积分特征求解器，预处理和GPU支持。

引用次数: 0

Algorithms for Parallel Generic hp-adaptive Finite Element Software 并行通用hp自适应有限元软件算法

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-06-05 DOI: https://dl.acm.org/doi/10.1145/3603372

Marc Fehling, Wolfgang Bangerth

The hp-adaptive finite element method (FEM) – where one independently chooses the mesh size (h) and polynomial degree (p) to be used on each cell – has long been known to have better theoretical convergence properties than either h- or p-adaptive methods alone. However, it is not widely used, owing at least in parts to the difficulty of the underlying algorithms and the lack of widely usable implementations. This is particularly true when used with continuous finite elements.

Herein, we discuss algorithms that are necessary for a comprehensive and generic implementation of hp-adaptive finite element methods on distributed-memory, parallel machines. In particular, we will present a multi-stage algorithm for the unique enumeration of degrees of freedom (DoFs) suitable for continuous finite element spaces, describe considerations for weighted load balancing, and discuss the transfer of variable size data between processes. We illustrate the performance of our algorithms with numerical examples, and demonstrate that they scale reasonably up to at least 16 384 Message Passing Interface (MPI) processes.

We provide a reference implementation of our algorithms as part of the open-source library deal.II.

hp-adaptive finite element method (FEM) -其中一个独立选择网格尺寸(h)和多项式度(p)用于每个单元-早已被认为具有更好的理论收敛性比h-或p-adaptive方法单独。然而，它并没有被广泛使用，至少部分原因是底层算法的困难和缺乏广泛可用的实现。当使用连续有限单元时尤其如此。在此，我们讨论了在分布式内存并行机器上全面和通用地实现hp自适应有限元方法所必需的算法。特别是，我们将提出一种适用于连续有限元空间的唯一自由度枚举(dof)的多阶段算法，描述加权负载平衡的考虑因素，并讨论进程之间可变大小数据的传输。我们用数值示例说明了我们的算法的性能，并证明它们可以合理地扩展到至少16 384个消息传递接口(MPI)进程。作为开源库协议的一部分，我们提供了我们算法的参考实现。

{"title":"Algorithms for Parallel Generic hp-adaptive Finite Element Software","authors":"Marc Fehling, Wolfgang Bangerth","doi":"https://dl.acm.org/doi/10.1145/3603372","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3603372","url":null,"abstract":"The hp-adaptive finite element method (FEM) – where one independently chooses the mesh size (h) and polynomial degree (p) to be used on each cell – has long been known to have better theoretical convergence properties than either h- or p-adaptive methods alone. However, it is not widely used, owing at least in parts to the difficulty of the underlying algorithms and the lack of widely usable implementations. This is particularly true when used with continuous finite elements. Herein, we discuss algorithms that are necessary for a comprehensive and generic implementation of hp-adaptive finite element methods on distributed-memory, parallel machines. In particular, we will present a multi-stage algorithm for the unique enumeration of degrees of freedom (DoFs) suitable for continuous finite element spaces, describe considerations for weighted load balancing, and discuss the transfer of variable size data between processes. We illustrate the performance of our algorithms with numerical examples, and demonstrate that they scale reasonably up to at least 16 384 Message Passing Interface (MPI) processes. We provide a reference implementation of our algorithms as part of the open-source library <monospace>deal.II</monospace>.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"37 3","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Accurate Calculation of Euclidean Norms Using Double-word Arithmetic 用双字算法精确计算欧几里得范数

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3568672

Vincent Lefèvre, Nicolas Louvet, Jean-Michel Muller, Joris Picot, Laurence Rideau

We consider the computation of the Euclidean (or L2) norm of an n-dimensional vector in floating-point arithmetic. We review the classical solutions used to avoid spurious overflow or underflow and/or to obtain very accurate results. We modify a recently published algorithm (that uses double-word arithmetic) to allow for a very accurate solution, free of spurious overflows and underflows. To that purpose, we use a double-word square-root algorithm of which we provide a tight error analysis. The returned L2 norm will be within very slightly more than 0.5 ulp from the exact result, which means that we will almost always provide correct rounding.

我们考虑浮点运算中n维向量欧几里得(或L2)范数的计算。我们回顾了用于避免虚假溢出或下溢和/或获得非常准确结果的经典解决方案。我们修改了一个最近发布的算法(使用双字算法)，以允许一个非常精确的解决方案，没有虚假的溢出和下溢。为此，我们使用双词平方根算法，并对其进行严格的误差分析。返回的L2范数将与确切结果相差0.5 ulp，这意味着我们几乎总是提供正确的舍入。

引用次数: 0

Robust Topological Construction of All-hexahedral Boundary Layer Meshes 全六面体边界层网格的鲁棒拓扑构造

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3577196

Maxence Reberol, Kilian Verhetsel, François Henrotte, David Bommes, Jean-François Remacle

We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.

我们提出了一种鲁棒技术，在具有任意复杂脊和角的模型边界上构建拓扑最优的全六面体层。生成的边界层网格严格尊重输入曲面网格的几何形状，边界边缘的六面体价尽可能接近其理想值(局部二面角除以90°)是最优的。从一个有效的水密曲面网格(实际为全四边形)出发，构建了一个全局优化整数规划问题，以最小化边界边的六面体价与其理想值之间的不匹配。整数规划问题的公式依赖于磁盘的边界六面体构型和三角形之间的对偶性，我们根据整数约束对其进行重构。由于六面体网格的固有约束，局部网格拓扑结构必然是不规则的，通过在复杂脊/角附近定义的一系列子问题上进行组合分支定界搜索，有效地求解了全局问题。从整数解出发，构建了全六面体层的拓扑结构，并通过解缠/平滑计算网格几何形状。我们的方法是完全自动化的、拓扑健壮的、快速的。

{"title":"Robust Topological Construction of All-hexahedral Boundary Layer Meshes","authors":"Maxence Reberol, Kilian Verhetsel, François Henrotte, David Bommes, Jean-François Remacle","doi":"https://dl.acm.org/doi/10.1145/3577196","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3577196","url":null,"abstract":"We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"20 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Algorithm 1031: MQSI—Monotone Quintic Spline Interpolation 算法1031:mqsi -单调五次样条插值

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3570157

Thomas Lux, Layne T. Watson, Tyler Chang, William Thacker

MQSI is a Fortran 2003 subroutine for constructing monotone quintic spline interpolants to univariate monotone data. Using sharp theoretical monotonicity constraints, first and second derivative estimates at data provided by a quadratic facet model are refined to produce a univariate C² monotone interpolant. Algorithm and implementation details, complexity and sensitivity analyses, usage information, a brief performance study, and comparisons with other spline approaches are included.

MQSI是Fortran 2003的一个子程序，用于构造单变量单调数据的单调五次样条插值。使用尖锐的理论单调性约束，在二次面模型提供的数据上的一阶和二阶导数估计被精炼以产生单变量C2单调插值。包括算法和实现细节、复杂性和灵敏度分析、使用信息、简要的性能研究以及与其他样条方法的比较。

引用次数: 0

Certifying Zeros of Polynomial Systems Using Interval Arithmetic 用区间算法证明多项式系统的零点

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3580277

Paul Breiding, Kemal Rose, Sascha Timme

We establish interval arithmetic as a practical tool for certification in numerical algebraic geometry. Our software HomotopyContinuation.jl now has a built-in function certify, which proves the correctness of an isolated nonsingular solution to a square system of polynomial equations. The implementation rests on Krawczyk’s method. We demonstrate that it dramatically outperforms earlier approaches to certification. We see this contribution as a powerful new tool in numerical algebraic geometry, which can make certification the default and not just an option.

在数值代数几何中，我们建立了区间算法作为一种实用的证明工具。我们的软件同伦延拓。Jl现在有一个内置的函数证明，它证明了多项式方程的平方系统的孤立非奇异解的正确性。实现依赖于Krawczyk的方法。我们证明，它大大优于早期的认证方法。我们将此贡献视为数值代数几何中的一个强大的新工具，它可以使认证成为默认值，而不仅仅是一个选项。

引用次数: 0

Algorithm 1034: An Accelerated Algorithm to Compute the Qn Robust Statistic, with Corrections to Constants 算法1034:一种计算Qn鲁棒统计量的加速算法，并对常数进行校正

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3576920

Thierry Fahmy

The robust scale estimator Q_n developed by Croux and Rousseeuw [3], for the computation of which they provided a deterministic algorithm, has proven to be very useful in several domains including in quality management and time series analysis. It has interesting mathematical (50% breakdown, 82% Asymptotic Relative Efficiency) and computing (O(nlogn) time, O(n) space) properties. While working on a faster algorithm to compute Q_n, we have discovered an error in the computation of the d constant, and as a consequence in the d_n constants that are used to scale the statistic for consistency with the variance of a normal sample. These errors have been reproduced in several articles including in the International Standard Organisation 13,528 [12] document. In this article, we fix the errors and present a new approach, which includes a new algorithm, allowing computations to run 1.3 to 4.5 times faster when n grows from 10 to 100,000.

Croux和Rousseeuw[3]开发的鲁棒规模估计器Qn，为其计算提供了一种确定性算法，已被证明在质量管理和时间序列分析等多个领域非常有用。它具有有趣的数学(50%击穿，82%渐近相对效率)和计算(O(nlogn)时间，O(n)空间)特性。在研究更快的算法来计算Qn时，我们发现了d常数计算中的一个错误，结果是用于缩放统计量以与正常样本的方差一致的dn常数。这些错误已经在包括国际标准组织13528[12]文件在内的几篇文章中重复出现。在本文中，我们修复了这些错误，并提出了一种新的方法，其中包括一个新的算法，当n从10增加到100,000时，计算速度可以提高1.3到4.5倍。

引用次数: 0

Event-Based Automatic Differentiation of OpenMP with OpDiLib 基于OpDiLib的OpenMP事件自动鉴别

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3570159

Johannes Blühdorn, Max Sagebaum, Nicolas Gauger

We present the new software OpDiLib, a universal add-on for classical operator overloading AD tools that enables the automatic differentiation (AD) of OpenMP parallelized code. With it, we establish support for OpenMP features in a reverse mode operator overloading AD tool to an extent that was previously only reported on in source transformation tools. We achieve this with an event-based implementation ansatz that is unprecedented in AD. Combined with modern OpenMP features around OMPT, we demonstrate how it can be used to achieve differentiation without any additional modifications of the source code; neither do we impose a priori restrictions on the data access patterns, which makes OpDiLib highly applicable. For further performance optimizations, restrictions like atomic updates on adjoint variables can be lifted in a fine-grained manner. OpDiLib can also be applied in a semi-automatic fashion via a macro interface, which supports compilers that do not implement OMPT. We demonstrate the applicability of OpDiLib for a pure operator overloading approach in a hybrid parallel environment. We quantify the cost of atomic updates on adjoint variables and showcase the speedup and scaling that can be achieved with the different configurations of OpDiLib in both the forward and the reverse pass.

我们提出了一种新的软件OpDiLib，它是经典运算符重载AD工具的通用附加组件，可以实现OpenMP并行代码的自动区分(AD)。有了它，我们在反向模式操作符重载AD工具中建立了对OpenMP特性的支持，其程度以前只在源代码转换工具中报道过。我们通过基于事件的实现分析来实现这一点，这在AD中是前所未有的。结合围绕OMPT的现代OpenMP特性，我们演示了如何使用它来实现差异化，而无需对源代码进行任何额外的修改;我们也没有对数据访问模式施加先验限制，这使得OpDiLib非常适用。对于进一步的性能优化，可以以细粒度的方式解除伴随变量的原子更新等限制。OpDiLib还可以通过宏接口以半自动的方式应用，宏接口支持不实现OMPT的编译器。我们演示了OpDiLib在混合并行环境中用于纯运算符重载方法的适用性。我们量化了伴随变量的原子更新的成本，并展示了在正向和反向传递中使用OpDiLib的不同配置可以实现的加速和扩展。

{"title":"Event-Based Automatic Differentiation of OpenMP with OpDiLib","authors":"Johannes Blühdorn, Max Sagebaum, Nicolas Gauger","doi":"https://dl.acm.org/doi/10.1145/3570159","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3570159","url":null,"abstract":"We present the new software OpDiLib, a universal add-on for classical operator overloading AD tools that enables the automatic differentiation (AD) of OpenMP parallelized code. With it, we establish support for OpenMP features in a reverse mode operator overloading AD tool to an extent that was previously only reported on in source transformation tools. We achieve this with an event-based implementation ansatz that is unprecedented in AD. Combined with modern OpenMP features around OMPT, we demonstrate how it can be used to achieve differentiation without any additional modifications of the source code; neither do we impose a priori restrictions on the data access patterns, which makes OpDiLib highly applicable. For further performance optimizations, restrictions like atomic updates on adjoint variables can be lifted in a fine-grained manner. OpDiLib can also be applied in a semi-automatic fashion via a macro interface, which supports compilers that do not implement OMPT. We demonstrate the applicability of OpDiLib for a pure operator overloading approach in a hybrid parallel environment. We quantify the cost of atomic updates on adjoint variables and showcase the speedup and scaling that can be achieved with the different configurations of OpDiLib in both the forward and the reverse pass.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement 稀疏近似分解与混合精度迭代细化的结合

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3582493

Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L’Excellent, Theo Mary, Bastien Vieublé

The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions of both the time and memory consumption.

采用混合精度迭代细化方法，可以提高线性系统基于单元分解的标准求解过程的速度和精度。最近的研究主要集中在密集系统上。我们研究了混合精度迭代细化的潜力，以增强基于近似稀疏分解的稀疏系统方法。在此过程中，我们首先开发了一种新的基于LU和gmres的迭代精化的误差分析，该分析基于LU分解的一般模型，该模型解释了现代稀疏解算器通常使用的近似方法，如低秩近似或放松pivot策略。然后，我们基于一组选定的迭代细化变量和近似稀疏分解，对不同算法的执行时间和内存消耗进行了详细的性能分析。我们的性能研究使用多正面求解器MUMPS，它可以利用块低秩分解和静态旋转。我们评估了算法在来自各种现实生活和工业应用的大型稀疏问题上的性能，表明混合精度迭代细化与近似稀疏分解相结合可以大大减少时间和内存消耗。

{"title":"Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement","authors":"Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L’Excellent, Theo Mary, Bastien Vieublé","doi":"https://dl.acm.org/doi/10.1145/3582493","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3582493","url":null,"abstract":"The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions of both the time and memory consumption.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"74 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver 分布式内存SuperLU稀疏直接求解器新发布的功能

IF 2.7 1区数学 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2023-03-21 DOI: https://dl.acm.org/doi/10.1145/3577197

Xiaoye S. Li, Paul Lin, Yang Liu, Piyush Sao

We present the new features available in the recent release of SuperLU_DIST, Version 8.1.1. SuperLU_DIST is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance of how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.

我们将介绍SuperLU_DIST 8.1.1版本中提供的新特性。SuperLU_DIST是一个分布式内存并行稀疏直接求解器。新功能包括(1)一个避免3D通信的算法框架，该框架可以在进程间通信中进行选择性内存复制，(2)支持NVIDIA gpu和AMD gpu的多gpu，以及(3)执行单精度LU分解和双精度迭代细化的混合精度例程。除了算法改进之外，我们还对软件构建系统进行了现代化改造，使用CMake和Spack包安装工具来简化安装过程。在本文中，我们详细描述了与每个新算法特性相关联的相关性能敏感参数，展示了如何向用户展示这些参数，并提供了如何设置这些参数的一般指导。我们证明，根据输入稀疏矩阵和底层硬件，系统调整参数后，求解器在时间和内存方面的性能都可以大大提高。

{"title":"Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver","authors":"Xiaoye S. Li, Paul Lin, Yang Liu, Piyush Sao","doi":"https://dl.acm.org/doi/10.1145/3577197","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3577197","url":null,"abstract":"We present the new features available in the recent release of <monospace>SuperLU_DIST</monospace>, Version 8.1.1. <monospace>SuperLU_DIST</monospace> is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance of how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"25 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

ACM Transactions on Mathematical Software

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀