ACM Transactions on Mathematical Software: Latest Articles

Distributed ℋ2-Matrices for Boundary Element Methods
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-02-01 | DOI: 10.1145/3582494
S. Börm
Standard discretization techniques for boundary integral equations, e.g., the Galerkin boundary element method, lead to large densely populated matrices that require fast and efficient compression techniques like the fast multipole method or hierarchical matrices. If the underlying mesh is very large, running the corresponding algorithms on a distributed computer is attractive, e.g., since distributed computers frequently are cost-effective and offer a high accumulated memory bandwidth. Compared to the closely related particle methods, for which distributed algorithms are well-established, the Galerkin discretization poses a challenge, since the supports of the basis functions influence the block structure of the matrix and therefore the flow of data in the corresponding algorithms. This article introduces distributed ℋ2-matrices, a class of hierarchical matrices that is closely related to fast multipole methods and particularly well-suited for distributed computing. While earlier efforts required the global tree structure of the ℋ2-matrix to be stored in every node of the distributed system, the new approach needs only local multilevel information that can be obtained via a simple distributed algorithm, allowing us to scale to significantly larger systems. Experiments show that this approach can handle very large meshes with more than 130 million triangles efficiently.
Pages: 1–21
Citations: 0
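Hierarchical formats such as ℋ2-matrices compress the blocks of a boundary element matrix that couple well-separated clusters, because such blocks are numerically low-rank. The Python sketch below illustrates only that property; it is not Börm's distributed algorithm. It assumes a Laplace single-layer kernel sampled at random points (a stand-in for Galerkin matrix entries), the common admissibility test max(diam) ≤ η·dist with η = 1, and a truncated SVD in place of the nested cluster bases an ℋ2-matrix actually uses. All function names are hypothetical.

```python
import numpy as np

def pairwise_dist(px, py):
    # Matrix of Euclidean distances |x_i - y_j| between two point sets.
    return np.linalg.norm(px[:, None, :] - py[None, :, :], axis=-1)

def is_admissible(px, py, eta=1.0):
    # Admissibility: max(diam(X), diam(Y)) <= eta * dist(X, Y).
    diam = max(pairwise_dist(px, px).max(), pairwise_dist(py, py).max())
    return diam <= eta * pairwise_dist(px, py).min()

def kernel_block(px, py):
    # Laplace single-layer kernel 1 / (4 pi |x - y|) between two clusters.
    return 1.0 / (4.0 * np.pi * pairwise_dist(px, py))

def low_rank(block, tol=1e-6):
    # Truncated SVD: keep singular values above tol * (largest singular value).
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return u[:, :k] * s[:k], vt[:k, :]  # block is approximately U @ V with rank k

rng = np.random.default_rng(0)
px = rng.random((200, 3))                              # cluster in [0, 1]^3
py = rng.random((200, 3)) + np.array([6.0, 0.0, 0.0])  # cluster in [6, 7] x [0, 1]^2
block = kernel_block(px, py)
U, V = low_rank(block)
```

Storing `U` and `V` needs 2·200·k numbers instead of 200·200; ℋ2-matrices reduce storage further by nesting the cluster bases across levels of the tree.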
Algorithm 1037: SuiteSparse:GraphBLAS: Parallel Graph Algorithms in the Language of Sparse Linear Algebra
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-01-25 | DOI: 10.1145/3577195
Timothy A. Davis
SuiteSparse:GraphBLAS is a full parallel implementation of the GraphBLAS standard, which defines a set of sparse matrix operations on an extended algebra of semirings using an almost unlimited variety of operators and types. When applied to sparse adjacency matrices, these algebraic operations are equivalent to computations on graphs. A description of the parallel implementation of SuiteSparse:GraphBLAS is given, including its novel parallel algorithms for sparse matrix multiply, addition, element-wise multiply, submatrix extraction and assignment, and the GraphBLAS mask/accumulator operation. Its performance is illustrated by solving the graph problems in the GAP Benchmark and by comparing it with other sparse matrix libraries.
Volume: 49, Pages: 1–30
Citations: 2
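The abstract's claim that semiring operations on an adjacency matrix are graph computations can be made concrete with breadth-first search. Each BFS step is a vector-matrix product over the Boolean (OR, AND) semiring, masked by the complement of the set of already-visited vertices. The plain-Python sketch below stores the sparse matrix as a dict of sets. It is not the SuiteSparse:GraphBLAS C API, and `vxm_or_and` and `bfs_levels` are hypothetical names.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # sparse Boolean adjacency: row i -> columns j with A[i, j] = True

def vxm_or_and(frontier: Set[int], adj: Graph) -> Set[int]:
    # y = frontier^T (OR.AND) A: y[j] is true if frontier[i] AND A[i, j] for some i (OR).
    reached: Set[int] = set()
    for i in frontier:  # only the nonzeros of the sparse vector are visited
        reached |= adj.get(i, set())
    return reached

def bfs_levels(adj: Graph, source: int) -> Dict[int, int]:
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Complemented mask: discard vertices that already have a level.
        frontier = {j for j in vxm_or_and(frontier, adj) if j not in levels}
        for j in frontier:
            levels[j] = depth
    return levels

adj = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}
levels = bfs_levels(adj, 0)
```

On this example graph the search assigns levels 0, 1, 1, 2, 3 to vertices 0 through 4. Replacing the semiring, for instance with (min, +) over edge weights, turns the same pattern into a shortest-path relaxation step.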
Robust Topological Construction of All-hexahedral Boundary Layer Meshes
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-20 | DOI: 10.1145/3577196
M. Reberol, Kilian Verhetsel, F. Henrotte, D. Bommes, J. Remacle
We present a robust technique to build a topologically optimal all-hexahedral layer on the boundary of a model with arbitrarily complex ridges and corners. The generated boundary layer mesh strictly respects the geometry of the input surface mesh, and it is optimal in the sense that the hexahedral valences of the boundary edges are as close as possible to their ideal values (local dihedral angle divided by 90°). Starting from a valid watertight surface mesh (all-quad in practice), we build a global optimization integer programming problem to minimize the mismatch between the hexahedral valences of the boundary edges and their ideal values. The formulation of the integer programming problem relies on the duality between boundary hexahedral configurations and triangulations of the disk, which we reframe in terms of integer constraints. The global problem is solved efficiently by performing combinatorial branch-and-bound searches on a series of sub-problems defined in the vicinity of complicated ridges/corners, where the local mesh topology is necessarily irregular because of the inherent constraints in hexahedral meshes. From the integer solution, we build the topology of the all-hexahedral layer, and the mesh geometry is computed by untangling/smoothing. Our approach is fully automated, topologically robust, and fast.
Volume: 49, Pages: 1–32
Citations: 3
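The optimality criterion in the abstract compares, for each boundary edge, the integer number of hexahedra meeting at that edge with its ideal value, the local dihedral angle divided by 90°. A flat region (180°) ideally gets 2, a convex 90° ridge gets 1 and a concave 270° ridge gets 3. The sketch below only evaluates the per-edge mismatch and a naive rounding baseline. It does not reproduce the paper's global integer program or its branch-and-bound search. The function names and the absolute-value mismatch measure are assumptions made for illustration.

```python
from typing import List, Sequence

def ideal_valence(dihedral_deg: float) -> float:
    # Ideal number of boundary-layer hexahedra at an edge: dihedral angle / 90 degrees.
    return dihedral_deg / 90.0

def total_mismatch(dihedrals: Sequence[float], valences: Sequence[int]) -> float:
    # Sum over boundary edges of |integer valence - ideal valence| (assumed measure).
    return sum(abs(v - ideal_valence(a)) for a, v in zip(dihedrals, valences))

def rounded_valences(dihedrals: Sequence[float]) -> List[int]:
    # Naive baseline: round every edge independently, with at least one hexahedron per edge.
    # Note that Python's round() sends exact ties such as 1.5 to the even integer.
    return [max(1, round(ideal_valence(a))) for a in dihedrals]

angles = [180.0, 90.0, 270.0, 120.0]  # flat, convex ridge, concave ridge, 120-degree ridge
vals = rounded_valences(angles)
mismatch = total_mismatch(angles, vals)
```

Independent rounding ignores how the valences of neighboring edges constrain each other near complicated ridges and corners. Handling that coupling is the job of the paper's integer programming formulation.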
FastSpline: Automatic Generation of Interpolants for Lattice Samplings
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-20 | DOI: 10.1145/3577194
J. Horacsek, U. Alim
Interpolation is a foundational concept in scientific computing and is at the heart of many scientific visualization techniques. There is usually a tradeoff between the approximation capabilities of an interpolation scheme and its evaluation efficiency. For many applications, it is important for a user to navigate their data in real time. In practice, evaluation efficiency outweighs any incremental improvements in reconstruction fidelity. We first analyze, from a general standpoint, the use of compact piece-wise polynomial basis functions to efficiently interpolate data that is sampled on a lattice. We then detail our automatic code-generation framework on both CPU and GPU architectures. Specifically, we propose a general framework that can produce a fast evaluation scheme by analyzing the algebro-geometric structure of the convolution sum for a given lattice and basis function combination. We demonstrate the utility and generality of our framework by providing fast implementations of various box splines on the Body Centered and Face Centered Cubic lattices, as well as some non-separable box splines on the Cartesian lattice. We also provide fast implementations for certain Voronoi-splines that have not yet appeared in the literature. Finally, we demonstrate that this framework may also be used for non-Cartesian lattices in 4D.
Volume: 49, Pages: 1–35
Citations: 2
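The convolution sum mentioned in the abstract, f(x) = Σ_k c_k φ(x − k), becomes cheap to evaluate when φ is a compact piecewise polynomial, because only the few lattice points whose shifted basis function covers x contribute. The sketch below shows this in the simplest setting, the 1D Cartesian lattice ℤ with the centered cubic B-spline, where each evaluation touches only four coefficients. It is not code generated by FastSpline; the BCC/FCC box splines and Voronoi splines treated in the paper have far more intricate piecewise structure.

```python
import math
from typing import Sequence

def cubic_bspline(t: float) -> float:
    # Centered cubic B-spline, supported on [-2, 2].
    t = abs(t)
    if t < 1.0:
        return 2.0 / 3.0 - t * t + 0.5 * t ** 3
    if t < 2.0:
        return (2.0 - t) ** 3 / 6.0
    return 0.0

def evaluate(coeffs: Sequence[float], x: float) -> float:
    # f(x) = sum_k c[k] * phi(x - k). phi(x - k) is nonzero only for |x - k| < 2,
    # so only k = floor(x) - 1, ..., floor(x) + 2 can contribute.
    base = math.floor(x)
    total = 0.0
    for k in range(base - 1, base + 3):
        if 0 <= k < len(coeffs):  # coefficients outside the sampled range are treated as zero
            total += coeffs[k] * cubic_bspline(x - k)
    return total
```

Because the cubic B-spline forms a partition of unity and reproduces linear functions, constant coefficients give a constant interpolant and coefficients c_k = k reproduce f(x) = x away from the boundary, which makes a convenient sanity check.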
CS-TSSOS: Correlative and Term Sparsity for Large-Scale Polynomial Optimization
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-19 | DOI: 10.1145/3569709
Jie Wang, Victor Magron, J. B. Lasserre, Ngoc Hoang Anh Mai

This work proposes a new moment-SOS hierarchy, called CS-TSSOS, for solving large-scale sparse polynomial optimization problems. Its novelty is to simultaneously exploit correlative sparsity and term sparsity by combining the advantages of two existing frameworks for sparse polynomial optimization. The former is due to Waki et al. [40], while the latter was initially proposed by Wang et al. [42] and later exploited in the TSSOS hierarchy [46, 47]. In doing so we obtain CS-TSSOS, a two-level hierarchy of semidefinite programming relaxations with (i) the crucial property of involving blocks of SDP matrices and (ii) a guarantee of convergence to the global optimum under certain conditions. We demonstrate its efficiency and scalability on several large-scale instances of the celebrated Max-Cut problem and the important industrial optimal power flow problem, involving up to six thousand variables and tens of thousands of constraints.

Citations: 0
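Correlative sparsity, one of the two ingredients CS-TSSOS combines, is commonly exploited as follows. Build a graph whose nodes are the variables and join two variables whenever they appear together in some term. Then take a chordal extension of that graph and use its maximal cliques as the variable blocks of smaller moment matrices. The sketch below shows only that preprocessing step, using greedy minimum-degree elimination to produce the chordal extension. The data layout (monomials as tuples of variable indices) and the function names are assumptions, and this is not the authors' implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

def csp_graph(n: int, monomials: Iterable[Tuple[int, ...]]) -> Dict[int, Set[int]]:
    # Variables i and j are adjacent if some monomial involves both of them.
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for mono in monomials:
        for i, j in combinations(sorted(set(mono)), 2):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def chordal_cliques(adj: Dict[int, Set[int]]) -> List[FrozenSet[int]]:
    # Greedy minimum-degree elimination; the fill-in edges yield a chordal extension.
    g = {v: set(nb) for v, nb in adj.items()}
    remaining = set(g)
    candidates: List[FrozenSet[int]] = []
    while remaining:
        v = min(remaining, key=lambda u: (len(g[u] & remaining), u))
        nbrs = g[v] & remaining
        candidates.append(frozenset(nbrs | {v}))
        for a, b in combinations(nbrs, 2):  # fill-in: make v's remaining neighbors a clique
            g[a].add(b)
            g[b].add(a)
        remaining.remove(v)
    # The maximal cliques of the chordal extension are the maximal candidate sets.
    return [c for c in candidates if not any(c < d for d in candidates)]

# f = x0*x1 + x1*x2 + x2*x3: a chain, so the cliques are consecutive pairs.
chain = chordal_cliques(csp_graph(4, [(0, 1), (1, 2), (2, 3)]))
```

For the chain the moment matrices then involve only pairs of variables instead of all four variables at once. This reduction in block size is what makes the hierarchy scale.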
Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-19 | DOI: 10.1145/3577197
X. Li, Paul B. S. Lin, Yang Liu, Piyush Sao
We present the new features available in the recent release of SuperLU_DIST, Version 8.1.1. SuperLU_DIST is a distributed-memory parallel sparse direct solver. The new features include (1) a 3D communication-avoiding algorithm framework that trades off inter-process communication for selective memory duplication, (2) multi-GPU support for both NVIDIA GPUs and AMD GPUs, and (3) mixed-precision routines that perform single-precision LU factorization and double-precision iterative refinement. Apart from the algorithm improvements, we also modernized the software build system to use CMake and Spack package installation tools to simplify the installation procedure. Throughout the article, we describe in detail the pertinent performance-sensitive parameters associated with each new algorithmic feature, show how they are exposed to the users, and give general guidance on how to set these parameters. We illustrate that the solver’s performance both in time and memory can be greatly improved after systematic tuning of the parameters, depending on the input sparse matrix and underlying hardware.
Volume: 49, Pages: 1–20
Citations: 2
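Feature (3) in the abstract combines a single-precision LU factorization with double-precision iterative refinement. The idea can be illustrated on a small dense matrix. The sketch below uses SciPy's dense `lu_factor`/`lu_solve` rather than SuperLU_DIST, which is a distributed sparse C library. The tolerance, the iteration limit and the test matrix are arbitrary choices, and the sketch assumes the matrix is well enough conditioned for refinement to converge.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-14, max_iter=20):
    # Factor once in single precision: the O(n^3) step runs on float32 data.
    lu, piv = lu_factor(A.astype(np.float32))
    # Initial solve with the single-precision factors, promoted to double.
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    norm_A = np.linalg.norm(A, np.inf)
    for _ in range(max_iter):
        r = b - A @ x  # residual computed in double precision
        if np.linalg.norm(r, np.inf) <= tol * norm_A * np.linalg.norm(x, np.inf):
            break
        # Correction from the cheap single-precision factors, accumulated in double.
        d = lu_solve((lu, piv), r.astype(np.float32))
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
```

Each refinement step costs only a matrix-vector product and two triangular solves. For well-conditioned systems a few steps recover double-precision accuracy, while the factorization itself runs in the faster, smaller single-precision format.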
pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-19 | DOI: 10.1145/3555370
Aleksandros Sobczyk, Efstratios Gallopoulos

We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on “tall-and-skinny” matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package whose core is written in C++ and parallelized with OpenMP and that is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.

Citations: 0
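The CountSketch transform analysed in the abstract maps a tall n×d matrix A to a short m×d matrix SA. Here S has a single random ±1 entry in each column, so applying it takes one pass over the rows of A. The NumPy sketch below implements that definition and can be checked against the explicit matrix S. It is not pylspack's C++/OpenMP implementation; the function name and seeding are assumptions.

```python
import numpy as np

def countsketch(A, m, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)        # h(i): output row that input row i is hashed to
    signs = rng.choice([-1.0, 1.0], size=n)  # s(i): random sign for input row i
    SA = np.zeros((m, A.shape[1]))
    # Accumulate s(i) * A[i, :] into row h(i); np.add.at handles repeated targets correctly.
    np.add.at(SA, rows, signs[:, None] * A)
    return SA, rows, signs  # rows/signs returned so the implicit S can be rebuilt if needed

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))  # tall-and-skinny input
SA, rows, signs = countsketch(A, 50)
```

Because the expectation of SᵀS is the identity, the sketched Gram matrix (SA)ᵀ(SA) is an unbiased estimate of AᵀA at a much smaller size.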
Exploiting Constant Trace Property in Large-scale Polynomial Optimization
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-19 | DOI: 10.1145/3555309
Ngoc Hoang Anh Mai, J. B. Lasserre, Victor Magron, Jie Wang

We prove that every semidefinite moment relaxation of a polynomial optimization problem (POP) with a ball constraint can be reformulated as a semidefinite program involving a matrix with the constant trace property (CTP). As a result, such moment relaxations can be solved efficiently by first-order methods that exploit CTP, e.g., the conditional gradient-based augmented Lagrangian method. We also extend this CTP-exploiting framework to large-scale POPs with different sparsity structures. The efficiency and scalability of our framework are illustrated on some moment relaxations for various randomly generated POPs, especially second-order moment relaxations for quadratically constrained quadratic programs.

Citations: 0
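One way to see where a constant trace can come from starts by turning the ball constraint ‖x‖² ≤ R into a sphere constraint ‖x‖² = R, for example by appending a slack variable. Consider the order-d moment matrix of a probability measure on that sphere, rescaled by P = diag(√w_α) with multinomial weights w_α = |α|!/α!. By the multinomial theorem its trace equals Σ_{k=0}^{d} R^k, regardless of the measure. The Python sketch below checks this identity numerically for Dirac measures. The slack-variable step and the diagonal weighting are assumptions chosen for illustration and need not match the construction used in the paper.

```python
import itertools
import math
import numpy as np

def multi_indices(n, d):
    # All exponent vectors alpha in N^n with |alpha| <= d.
    return [a for a in itertools.product(range(d + 1), repeat=n) if sum(a) <= d]

def weighted_moment_trace(x, d):
    # Moment matrix of the Dirac measure at x: M = v v^T with v_alpha = x^alpha.
    alphas = multi_indices(len(x), d)
    v = np.array([np.prod(x ** np.array(a)) for a in alphas])
    # Diagonal scaling P = diag(sqrt(w_alpha)), w_alpha = |alpha|! / alpha! (multinomial weight).
    w = np.array([math.factorial(sum(a)) / np.prod([math.factorial(k) for k in a])
                  for a in alphas])
    P = np.diag(np.sqrt(w))
    return np.trace(P @ np.outer(v, v) @ P)

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
x *= np.sqrt(2.0) / np.linalg.norm(x)  # a point on the sphere |x|^2 = R = 2
trace_value = weighted_moment_trace(x, 3)  # should equal 1 + 2 + 4 + 8 = 15
```

Because the trace is linear in the moments, the identity carries over from Dirac measures to their convex combinations. This fixed trace is the kind of structure that first-order methods exploiting CTP rely on.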
A Normal Form Algorithm for Tensor Rank Decomposition
IF 2.7 | Zone 1 (Mathematics) | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-19 | DOI: 10.1145/3555369
Simon Telen, Nick Vannieuwenhoven

We propose a new numerical algorithm for computing the tensor rank decomposition or canonical polyadic decomposition of higher-order tensors subject to a rank and genericity constraint. Reformulating this computational problem as a system of polynomial equations allows us to leverage recent numerical linear algebra tools from computational algebraic geometry. We characterize the complexity of our algorithm in terms of an algebraic property of this polynomial system—the multigraded regularity. We prove effective bounds for many tensor formats and ranks, which are of independent interest for overconstrained polynomial system solving. Moreover, we conjecture a general formula for the multigraded regularity, yielding a (parameterized) polynomial time complexity for the tensor rank decomposition problem in the considered setting. Our numerical experiments show that our algorithm can outperform state-of-the-art numerical algorithms by an order of magnitude in terms of accuracy, computation time, and memory consumption.
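To make the tensor rank (canonical polyadic) decomposition concrete, the sketch below builds a rank-r third-order tensor from factor matrices. It then recovers the factors with Jennrich's simultaneous-diagonalization algorithm. This classical method is much simpler than the paper's normal form algorithm and is swapped in only for illustration. It assumes exact data and that the first two factor matrices are square and invertible (r = I = J), which is far more restrictive than the setting the paper handles.

```python
import numpy as np

def cpd_tensor(A, B, C):
    # T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def jennrich(T, seed=0):
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    x, y = rng.standard_normal(K), rng.standard_normal(K)
    Tx = T @ x  # contracts mode 3: equals A diag(C^T x) B^T
    Ty = T @ y  # equals A diag(C^T y) B^T
    # Tx Ty^{-1} = A diag((C^T x) / (C^T y)) A^{-1}, so its eigenvectors are the columns of A
    # (up to scaling and permutation).
    _, vecs = np.linalg.eig(Tx @ np.linalg.inv(Ty))
    A_hat = vecs.real
    # Mode-1 unfolding T1 = A (Khatri-Rao of B and C)^T, so each row of A_hat^{-1} T1
    # reshapes to a scaled rank-1 matrix b_r c_r^T.
    M = np.linalg.solve(A_hat, T.reshape(I, J * K))
    B_hat, C_hat = np.empty((J, I)), np.empty((K, I))
    for r in range(I):
        u, s, vt = np.linalg.svd(M[r].reshape(J, K))
        B_hat[:, r] = u[:, 0] * s[0]
        C_hat[:, r] = vt[0]
    return A_hat, B_hat, C_hat

rng = np.random.default_rng(42)
R, K = 4, 5
A, B, C = rng.standard_normal((R, R)), rng.standard_normal((R, R)), rng.standard_normal((K, R))
T = cpd_tensor(A, B, C)
A_hat, B_hat, C_hat = jennrich(T)
```

The recovered factors differ from the originals by a permutation and by scalings that cancel within each rank-1 term. Comparing the reconstructed tensor, rather than the factors, is therefore the meaningful check.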

Citations: 0
Journal: ACM Transactions on Mathematical Software