
ACM Transactions on Mathematical Software: Latest Articles

Improvements to SLEPc in Releases 3.14–3.18
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-06-07 | DOI: https://dl.acm.org/doi/10.1145/3603373
Jose E. Roman, Fernando Alvarruiz, Carmen Campos, Lisandro Dalcin, Pierre Jolivet, Alejandro Lamas Daviña

This short paper describes the main new features added to SLEPc, the Scalable Library for Eigenvalue Problem Computations, in the last two and a half years, corresponding to five release versions. The main novelty is the extension of the SVD module with new problem types such as the generalized SVD or the hyperbolic SVD. Additionally, many improvements have been incorporated in different parts of the library, including contour integral eigensolvers, preconditioning and GPU support.

Citations: 0
Algorithms for Parallel Generic hp-adaptive Finite Element Software
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-06-05 | DOI: https://dl.acm.org/doi/10.1145/3603372
Marc Fehling, Wolfgang Bangerth

The hp-adaptive finite element method (FEM) – where one independently chooses the mesh size (h) and polynomial degree (p) to be used on each cell – has long been known to have better theoretical convergence properties than either h- or p-adaptive methods alone. However, it is not widely used, owing at least in part to the difficulty of the underlying algorithms and the lack of widely usable implementations. This is particularly true when used with continuous finite elements.

Herein, we discuss algorithms that are necessary for a comprehensive and generic implementation of hp-adaptive finite element methods on distributed-memory, parallel machines. In particular, we will present a multi-stage algorithm for the unique enumeration of degrees of freedom (DoFs) suitable for continuous finite element spaces, describe considerations for weighted load balancing, and discuss the transfer of variable size data between processes. We illustrate the performance of our algorithms with numerical examples, and demonstrate that they scale reasonably up to at least 16 384 Message Passing Interface (MPI) processes.

We provide a reference implementation of our algorithms as part of the open-source library deal.II.

Citations: 0
Accurate Calculation of Euclidean Norms Using Double-word Arithmetic
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3568672
Vincent Lefèvre, Nicolas Louvet, Jean-Michel Muller, Joris Picot, Laurence Rideau

We consider the computation of the Euclidean (or L2) norm of an n-dimensional vector in floating-point arithmetic. We review the classical solutions used to avoid spurious overflow or underflow and/or to obtain very accurate results. We modify a recently published algorithm (that uses double-word arithmetic) to allow for a very accurate solution, free of spurious overflows and underflows. To that purpose, we use a double-word square-root algorithm of which we provide a tight error analysis. The returned L2 norm will be within very slightly more than 0.5 ulp from the exact result, which means that we will almost always provide correct rounding.
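The ideas in this abstract (exact power-of-two scaling to avoid spurious overflow, and a double-word accumulator for the sum of squares) can be illustrated with a small Python sketch. This is not the authors' algorithm: the function names are hypothetical, the error-free transformations are the textbook TwoSum and Dekker/Veltkamp product, and the final square-root correction is a single Newton step. Finite inputs are assumed, and no error bound is claimed.

```python
import math

def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def split(a):
    """Veltkamp split of a binary64 value into a high and a low part."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_square(a):
    """Dekker-style product: a * a == p + e exactly, barring overflow/underflow."""
    p = a * a
    hi, lo = split(a)
    e = ((hi * hi - p) + 2.0 * hi * lo) + lo * lo
    return p, e

def norm2_double_word(xs):
    """Euclidean norm of finite floats using a (hi, lo) double-word accumulator."""
    biggest = max((abs(x) for x in xs), default=0.0)
    if biggest == 0.0:
        return 0.0
    # Scale by a power of two so every scaled entry has magnitude below 1.
    # Power-of-two scaling is exact except for entries that underflow,
    # and those are negligible relative to the largest entry.
    _, exp = math.frexp(biggest)
    hi = lo = 0.0
    for x in xs:
        p, e = two_square(math.ldexp(x, -exp))
        hi, t = two_sum(hi, p)
        lo += t + e  # collect rounding errors in the low word
    s, t = two_sum(hi, lo)
    r = math.sqrt(s)
    # One Newton correction using the residual (s + t) - r*r.
    p, e = two_square(r)
    r += (((s - p) - e) + t) / (2.0 * r)
    return math.ldexp(r, exp)
```

For comparison, the naive `math.sqrt(sum(x * x for x in xs))` overflows to infinity on an input such as `[1e200, 1e200]`, while the scaled version returns a finite value.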

Citations: 0
Robust Topological Construction of All-hexahedral Boundary Layer Meshes
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3577196
Maxence Reberol, Kilian Verhetsel, François Henrotte, David Bommes, Jean-François Remacle

We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.

Citations: 0
Algorithm 1031: MQSI—Monotone Quintic Spline Interpolation
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3570157
Thomas Lux, Layne T. Watson, Tyler Chang, William Thacker

MQSI is a Fortran 2003 subroutine for constructing monotone quintic spline interpolants to univariate monotone data. Using sharp theoretical monotonicity constraints, first and second derivative estimates at data provided by a quadratic facet model are refined to produce a univariate C² monotone interpolant. Algorithm and implementation details, complexity and sensitivity analyses, usage information, a brief performance study, and comparisons with other spline approaches are included.

Citations: 0
Certifying Zeros of Polynomial Systems Using Interval Arithmetic
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3580277
Paul Breiding, Kemal Rose, Sascha Timme

We establish interval arithmetic as a practical tool for certification in numerical algebraic geometry. Our software HomotopyContinuation.jl now has a built-in function certify, which proves the correctness of an isolated nonsingular solution to a square system of polynomial equations. The implementation rests on Krawczyk’s method. We demonstrate that it dramatically outperforms earlier approaches to certification. We see this contribution as a powerful new tool in numerical algebraic geometry, which can make certification the default and not just an option.
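To make the Krawczyk test mentioned in this abstract concrete, here is a one-dimensional Python sketch. It is not the HomotopyContinuation.jl implementation, which is written in Julia and handles square polynomial systems. All names below are hypothetical, intervals are plain `(lo, hi)` tuples, and outward rounding is approximated by widening each result by one floating-point step with `math.nextafter` (Python 3.9 or later). The Krawczyk operator is K(X) = m - Y f(m) + (1 - Y f'(X)) (X - m), where m is the midpoint of X and Y approximates 1/f'(m). If K(X) lies strictly inside X, then f has exactly one zero in X.

```python
import math

def _outward(lo, hi):
    """Widen an interval by one float step on each side (conservative rounding)."""
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

def iadd(a, b):
    return _outward(a[0] + b[0], a[1] + b[1])

def isub(a, b):
    return _outward(a[0] - b[1], a[1] - b[0])

def imul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return _outward(min(products), max(products))

def krawczyk(f, df, X):
    """Return (certified, K) for interval functions f and df on the interval X."""
    m = 0.5 * (X[0] + X[1])
    M = (m, m)                          # degenerate interval at the midpoint
    dfm = df(M)
    y = 1.0 / (0.5 * (dfm[0] + dfm[1])) # approximate inverse; need not be exact
    Y = (y, y)
    one = (1.0, 1.0)
    K = iadd(isub(M, imul(Y, f(M))),
             imul(isub(one, imul(Y, df(X))), isub(X, M)))
    certified = X[0] < K[0] and K[1] < X[1]
    return certified, K

# Example: f(x) = x^2 - 2, whose positive zero is sqrt(2).
def f_example(x):
    return isub(imul(x, x), (2.0, 2.0))

def df_example(x):
    return imul((2.0, 2.0), x)

certified, K = krawczyk(f_example, df_example, (1.4, 1.5))
```

On the interval (1.4, 1.5) the test succeeds and K is a tighter enclosure of the zero. On an interval that contains no zero, such as (2.0, 3.0), the containment check fails, so nothing is certified.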

Citations: 0
Algorithm 1034: An Accelerated Algorithm to Compute the Qn Robust Statistic, with Corrections to Constants
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3576920
Thierry Fahmy

The robust scale estimator Q_n developed by Croux and Rousseeuw [3], for the computation of which they provided a deterministic algorithm, has proven to be very useful in several domains, including quality management and time series analysis. It has interesting mathematical (50% breakdown, 82% Asymptotic Relative Efficiency) and computing (O(n log n) time, O(n) space) properties. While working on a faster algorithm to compute Q_n, we have discovered an error in the computation of the d constant, and as a consequence in the d_n constants that are used to scale the statistic for consistency with the variance of a normal sample. These errors have been reproduced in several articles, including the International Standard Organisation 13528 [12] document. In this article, we fix the errors and present a new approach, which includes a new algorithm, allowing computations to run 1.3 to 4.5 times faster when n grows from 10 to 100,000.
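For readers unfamiliar with the statistic, the following brute-force Python sketch follows the standard definition: the unscaled Qn is the k-th smallest of the pairwise absolute differences |x_i - x_j| with i < j, where h = floor(n/2) + 1 and k = h(h - 1)/2. This is neither the deterministic O(n log n) algorithm nor the accelerated one described in the abstract; it stores and sorts all O(n^2) differences. The function name is hypothetical, and the consistency constants d and d_n are deliberately left out because the abstract reports corrections to them without listing values.

```python
from itertools import combinations

def qn_unscaled(xs):
    """k-th order statistic of pairwise absolute differences (no scaling constant)."""
    n = len(xs)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2  # number of pairs among h observations
    diffs = sorted(abs(a - b) for a, b in combinations(xs, 2))
    return diffs[k - 1]   # k is 1-based in the definition
```

A caller would multiply the result by the appropriate consistency factor for the sample size. Replacing one observation with a gross outlier, as in `[1.0, 2.0, 3.0, 4.0, 1000.0]`, leaves the value unchanged, which illustrates the robustness the abstract refers to.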

Citations: 0
Event-Based Automatic Differentiation of OpenMP with OpDiLib
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3570159
Johannes Blühdorn, Max Sagebaum, Nicolas Gauger

We present the new software OpDiLib, a universal add-on for classical operator overloading AD tools that enables the automatic differentiation (AD) of OpenMP parallelized code. With it, we establish support for OpenMP features in a reverse mode operator overloading AD tool to an extent that was previously only reported on in source transformation tools. We achieve this with an event-based implementation ansatz that is unprecedented in AD. Combined with modern OpenMP features around OMPT, we demonstrate how it can be used to achieve differentiation without any additional modifications of the source code; neither do we impose a priori restrictions on the data access patterns, which makes OpDiLib highly applicable. For further performance optimizations, restrictions like atomic updates on adjoint variables can be lifted in a fine-grained manner. OpDiLib can also be applied in a semi-automatic fashion via a macro interface, which supports compilers that do not implement OMPT. We demonstrate the applicability of OpDiLib for a pure operator overloading approach in a hybrid parallel environment. We quantify the cost of atomic updates on adjoint variables and showcase the speedup and scaling that can be achieved with the different configurations of OpDiLib in both the forward and the reverse pass.

Citations: 0
Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement
IF 2.7 | Zone 1, Mathematics | Q1 Mathematics | Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3582493
Patrick Amestoy, Alfredo Buttari, Nicholas J. Higham, Jean-Yves L’Excellent, Theo Mary, Bastien Vieublé

The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions of both the time and memory consumption.
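The basic loop behind LU-based mixed-precision iterative refinement can be sketched in plain Python. This is a dense toy under loud assumptions, not the sparse, MUMPS-based setup studied in the article: low precision is emulated by rounding every intermediate result to binary32 through `struct`, the residual and the update are computed in Python's native binary64, and all function names are hypothetical. The factorization is done once in low precision and reused for every correction solve.

```python
import struct

def f32(x):
    """Round a binary64 float to the nearest binary32 value (emulated low precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(A):
    """LU with partial pivoting; every computed entry is rounded to binary32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve_f32(LU, perm, b):
    """Forward and backward substitution in emulated binary32."""
    n = len(b)
    y = [f32(b[p]) for p in perm]
    for i in range(n):                      # L y = P b (unit lower triangular)
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):            # U x = y
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def refine(A, b, steps=5):
    """Low-precision factorization plus binary64 residual corrections."""
    LU, perm = lu_factor_f32(A)
    x = lu_solve_f32(LU, perm, b)
    for _ in range(steps):
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve_f32(LU, perm, r)       # reuse the cheap factors
        x = [xi + di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
b = [8.0, -6.0, 18.0]                       # chosen so that x = [1, -2, 3]
x = refine(A, b)
```

For a well-conditioned matrix like this one, a few refinement steps bring the solution from roughly single-precision accuracy to roughly double-precision accuracy, even though the factors are never recomputed in high precision.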

Citations: 0