
ACM Transactions on Mathematical Software (TOMS): Latest Articles

Algorithm 1017
Pub Date : 2021-06-25 DOI: 10.1145/3451389
Pavel Škrabánek, N. Martínková
Fuzzy regression provides an alternative to statistical regression when the model is indefinite, the relationships between model parameters are vague, the sample size is low, or the data are hierarchically structured. In such cases, a regression model based on fuzzy set theory can be considered. In fuzzyreg, we implement fuzzy linear regression methods that differ in the expectations of observational data types, outlier handling, and parameter estimation method. We provide a wrapper function that prepares data for fitting fuzzy linear models with the respective methods from a syntax established in R for fitting regression models. The function fuzzylm thus provides a novel functionality for R through standardized operations with fuzzy numbers. Additional functions allow for conversion of real-valued variables to fuzzy numbers, printing, summarizing, model plotting, and calculation of model predictions from new data using supporting functions that perform arithmetic operations with triangular fuzzy numbers. Goodness-of-fit and total-error-of-fit measures allow model comparisons. The package contains a dataset named bats with measurements of temperatures of hibernating bats and the mean annual surface temperature reflecting the climate at the sampling sites. The predictions from fuzzy linear models fitted to this dataset correspond well to the observed biological phenomenon. Fuzzy linear regression has great potential in predictive modeling where the data structure prevents statistical analysis and the modeled process exhibits inherent fuzziness.
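The arithmetic on triangular fuzzy numbers mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the fuzzyreg API (the package is written in R); the class TriangularFuzzyNumber, its fields, and the predict helper are hypothetical names chosen for illustration, and the representation by a central value plus left and right spreads is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """Triangular fuzzy number: a central value with left and right spreads."""
    center: float
    left: float   # spread below the center (non-negative)
    right: float  # spread above the center (non-negative)

    def __add__(self, other):
        # Adding two triangular fuzzy numbers adds centers and spreads.
        return TriangularFuzzyNumber(
            self.center + other.center,
            self.left + other.left,
            self.right + other.right,
        )

    def scale(self, k):
        # Multiplying by a negative real number swaps the two spreads.
        if k >= 0:
            return TriangularFuzzyNumber(k * self.center, k * self.left, k * self.right)
        return TriangularFuzzyNumber(k * self.center, -k * self.right, -k * self.left)

    def support(self):
        # Interval on which the membership function is non-zero.
        return (self.center - self.left, self.center + self.right)


def predict(intercept, slope, x):
    """Fuzzy prediction intercept + slope * x for a crisp (real-valued) x."""
    return intercept + slope.scale(x)


# Hypothetical coefficients, not values fitted to the bats dataset.
y = predict(TriangularFuzzyNumber(2.0, 0.5, 0.5), TriangularFuzzyNumber(1.0, 0.1, 0.2), 3.0)
print(y.center, y.support())
```

With fuzzy coefficients and crisp inputs, the prediction is again a triangular fuzzy number, so its support can be read as a band around the central regression line.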
Citations: 4
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
Pub Date : 2021-06-25 DOI: 10.1145/3431921
A. Abdelfattah, Timothy B. Costa, J. Dongarra, M. Gates, A. Haidar, S. Hammarling, N. Higham, J. Kurzak, P. Luszczek, S. Tomov, M. Zounon
This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.
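The idea of one routine processing a uniformly sized group of small, independent matrix products can be sketched in plain Python. The function names gemm and gemm_batch and their argument order are assumptions made for illustration; they do not reproduce the interface proposed in the article.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * A @ B + beta * C for small dense matrices (lists of rows)."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_batch(alpha, a_batch, b_batch, beta, c_batch):
    """Apply the same gemm to every (A, B, C) triple of one uniformly sized group."""
    shapes = {(len(a), len(b), len(b[0])) for a, b in zip(a_batch, b_batch)}
    if len(shapes) > 1:
        raise ValueError("all matrices in a group must have the same dimensions")
    # Each product is independent, so a real implementation could run them in parallel.
    return [gemm(alpha, a, b, beta, c) for a, b, c in zip(a_batch, b_batch, c_batch)]


a_batch = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
b_batch = [[[1, 0], [0, 1]], [[5, 6], [7, 8]]]
c_batch = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]
print(gemm_batch(2.0, a_batch, b_batch, 1.0, c_batch))
```

Matrices of different sizes would go into separate groups, each handled by its own call or group descriptor.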
Citations: 18
HIFIR: Hybrid Incomplete Factorization with Iterative Refinement for Preconditioning Ill-Conditioned and Singular Systems
Pub Date : 2021-06-18 DOI: 10.1145/3536165
Qiao Chen, X. Jiao
We introduce a software package called Hybrid Incomplete Factorization with Iterative Refinement (HIFIR) for preconditioning sparse, unsymmetric, ill-conditioned, and potentially singular systems. HIFIR computes a hybrid incomplete factorization (HIF), which combines multilevel incomplete LU factorization with a truncated, rank-revealing QR (RRQR) factorization on the final Schur complement. This novel hybridization is based on the new theory of ϵ-accurate approximate generalized inverse (AGI). It enables near-optimal preconditioners for consistent systems and enables flexible GMRES to solve inconsistent systems when coupled with iterative refinement. In this article, we focus on some practical algorithmic and software issues of HIFIR. In particular, we introduce a new inverse-based rook pivoting (IBRP) into ILU, which improves the robustness and the overall efficiency for some ill-conditioned systems by significantly reducing the size of the final Schur complement for some systems. We also describe the software design of HIFIR in terms of its efficient data structures for supporting rook pivoting in a multilevel setting, its template-based generic programming interfaces for mixed-precision real and complex values in C++, and its user-friendly high-level interfaces in MATLAB and Python. We demonstrate the effectiveness of HIFIR for ill-conditioned or singular systems arising from several applications, including the Helmholtz equation, linear elasticity, stationary incompressible Navier–Stokes (INS) equations, and time-dependent advection-diffusion equation.
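Iterative refinement with an approximate inverse, one ingredient named in this abstract, can be sketched as the update x <- x + G(b - Ax). The sketch below is not HIFIR code: the function names are hypothetical, and a simple diagonal scaling stands in for the hybrid incomplete factorization, which is far more elaborate.

```python
def matvec(a, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def iterative_refinement(a, b, apply_approx_inverse, tol=1e-12, max_iter=100):
    """Refine x via x <- x + G(b - A x), where G approximates the inverse of A."""
    x = [0.0] * len(b)
    for _ in range(max_iter):
        residual = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        if max(abs(r) for r in residual) <= tol:
            break
        correction = apply_approx_inverse(residual)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x


a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def diag_inverse(r):
    # Stand-in approximate inverse (an assumption): divide by the diagonal of A.
    return [ri / a[i][i] for i, ri in enumerate(r)]


x = iterative_refinement(a, b, diag_inverse)
print(x)  # close to [1/11, 7/11]
```

The better G approximates the inverse, the fewer refinement steps are needed; in the abstract's setting the same operator also serves as a preconditioner for flexible GMRES.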
Citations: 1
Bringing Trimmed Serendipity Methods to Computational Practice in Firedrake
Pub Date : 2021-04-27 DOI: 10.1145/3490485
J. Crum, Cyrus Cheng, D. Ham, L. Mitchell, R. Kirby, J. Levine, A. Gillette
We present an implementation of the trimmed serendipity finite element family, using the open-source finite element package Firedrake. The new elements can be used seamlessly within the software suite for problems requiring H1, H(curl), or H(div)-conforming elements on meshes of squares or cubes. To test how well trimmed serendipity elements perform in comparison to traditional tensor product elements, we perform a sequence of numerical experiments including the primal Poisson, mixed Poisson, and Maxwell cavity eigenvalue problems. Overall, we find that the trimmed serendipity elements converge, as expected, at the same rate as the respective tensor product elements, while being able to offer significant savings in the time or memory required to solve certain problems.
Citations: 5
Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software
Pub Date : 2021-04-26 DOI: 10.1145/3441850
Goran Flegar, H. Anzt, T. Cojean, E. S. Quintana‐Ortí
The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without affecting the algorithm's output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
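Choosing a storage format per preconditioner block can be sketched as follows. The abstract does not state the selection rule, so the criterion used here (largest round-trip storage error relative to the largest entry) is an assumption made for illustration, and choose_storage_format is a hypothetical name rather than a Ginkgo function. Only the IEEE half, single, and double formats available through Python's struct module are considered.

```python
import struct


def round_trip(value, fmt):
    """Store a Python float in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def choose_storage_format(block, tol):
    """Pick the narrowest format whose storage error stays within tol.

    Candidates are half ('e'), single ('f') and double ('d') precision.
    """
    entries = [v for row in block for v in row]
    scale = max(abs(v) for v in entries) or 1.0
    for fmt in ("e", "f", "d"):
        try:
            error = max(abs(v - round_trip(v, fmt)) for v in entries) / scale
        except OverflowError:
            continue  # some entry does not fit into this format's range
        if error <= tol:
            return fmt
    return "d"


blocks = [
    [[1.0, 0.5], [0.25, 2.0]],   # exactly representable in half precision
    [[0.1, 0.2], [0.3, 0.4]],    # needs more significand bits
    [[1e6, 1.0], [1.0, 1e6]],    # exceeds the range of half precision
]
print([choose_storage_format(block, tol=1e-6) for block in blocks])
```

Each block gets its own format, so well-behaved blocks can be stored compactly while sensitive ones keep full precision.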
Citations: 10
Algorithm 1016
Pub Date : 2021-04-20 DOI: 10.1145/3446979
Jens Hahne, S. Friedhoff, M. Bolten
In this article, we introduce the Python framework PyMGRIT, which implements the multigrid-reduction-in-time (MGRIT) algorithm for solving (non-)linear systems arising from the discretization of time-dependent problems. The MGRIT algorithm is a reduction-based iterative method that allows parallel-in-time simulations, i.e., calculating multiple time steps simultaneously in a simulation, using a time-grid hierarchy. The PyMGRIT framework includes many variants of the MGRIT algorithm, covering different multigrid cycle types and relaxation schemes, various coarsening strategies (including time-only and space-time coarsening), and the ability to use different time integrators on different levels of the multigrid hierarchy. The comprehensive documentation with tutorials and many examples, together with the fully documented code, makes it easy to get started with the package. The functionality of the code is ensured by automated serial and parallel tests using continuous integration. PyMGRIT supports serial runs suitable for prototyping and testing of new approaches, as well as parallel runs using the Message Passing Interface (MPI). In this manuscript, we describe the implementation of the MGRIT algorithm in PyMGRIT and present the usage from both a user and a developer point of view. Three examples illustrate different aspects of the package itself, especially running tests with pure time parallelism, as well as space-time parallelism through the coupling of PyMGRIT with PETSc or Firedrake.
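The parallel-in-time idea can be illustrated with the parareal iteration, which is commonly described as equivalent to two-level MGRIT with F-relaxation. The sketch below does not use the PyMGRIT API; the function names, the scalar test equation u' = lam * u, and the choice of backward Euler on both levels are assumptions made for illustration.

```python
def backward_euler(u, lam, dt, steps=1):
    """Advance u' = lam * u by `steps` backward Euler steps of size dt."""
    for _ in range(steps):
        u = u / (1.0 - lam * dt)
    return u


def parareal(u0, lam, t_end, n_coarse, m_fine, iterations):
    """Two-level parallel-in-time iteration on a grid of n_coarse intervals."""
    dt_coarse = t_end / n_coarse
    dt_fine = dt_coarse / m_fine

    def coarse(u):
        return backward_euler(u, lam, dt_coarse)

    def fine(u):
        return backward_euler(u, lam, dt_fine, m_fine)

    # Initial guess: one sequential sweep with the cheap coarse propagator.
    u = [u0]
    for _ in range(n_coarse):
        u.append(coarse(u[-1]))

    for _ in range(iterations):
        # The fine solves on different intervals are independent of each other,
        # which is where the parallelism in time comes from.
        fine_vals = [fine(u[n]) for n in range(n_coarse)]
        coarse_old = [coarse(u[n]) for n in range(n_coarse)]
        new = [u0]
        for n in range(n_coarse):
            # Sequential coarse sweep that corrects the fine results.
            new.append(coarse(new[n]) + fine_vals[n] - coarse_old[n])
        u = new
    return u


values = parareal(u0=1.0, lam=-1.0, t_end=1.0, n_coarse=4, m_fine=10, iterations=4)
reference = backward_euler(1.0, -1.0, 1.0 / 40, 40)
print(values[-1], reference)
```

After as many iterations as there are coarse intervals, the iteration reproduces the sequential fine solution up to rounding; in practice far fewer iterations are used, and the speedup comes from running the fine solves concurrently.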
Citations: 2
Algorithm 1015
Pub Date : 2021-04-20 DOI: 10.1145/3442348
S. Guthe, D. Thuerck
We present a new algorithm for solving the dense linear (sum) assignment problem and an efficient, parallel implementation that is based on the successive shortest path algorithm. More specifically, we introduce the well-known epsilon scaling approach used in the Auction algorithm to approximate the dual variables of the successive shortest path algorithm prior to solving the assignment problem to limit the complexity of the path search. This improves the runtime by several orders of magnitude for hard-to-solve real-world problems, making the runtime virtually independent of how hard the assignment is to find. In addition, our approach allows for using accelerators and/or external compute resources to calculate individual rows of the cost matrix. This enables us to solve problems that are larger than what has been reported in the past, including the ability to efficiently solve problems whose cost matrix exceeds the available system memory. To our knowledge, this is the first implementation that is able to solve problems with more than one trillion arcs in less than 100 hours on a single machine.
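For orientation, the classical successive shortest path method with dual variables can be written compactly for a small dense cost matrix. This is the textbook O(n^3) variant, not the authors' algorithm: it omits the epsilon scaling initialization of the duals, the parallelization, and the on-demand computation of cost rows. The name solve_assignment is hypothetical.

```python
def solve_assignment(cost):
    """Solve a square dense assignment problem by successive shortest paths.

    Returns (total_cost, assignment), where assignment[i] is the column of row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)    # dual variables of the rows (1-based)
    v = [0.0] * (n + 1)    # dual variables of the columns (1-based)
    match = [0] * (n + 1)  # match[j] = row assigned to column j, 0 if free
    way = [0] * (n + 1)    # predecessor column on the current shortest path

    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Update the duals so that reduced costs stay non-negative.
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Augment along the shortest path that was found.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[match[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


print(solve_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))
```

The path search in the inner loop is the part whose cost depends on the quality of the dual variables, which is why a good initial approximation of the duals, as the abstract describes, can shorten it.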
Citations: 1
Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework
Pub Date : 2021-04-20 DOI: 10.1145/3402225
F. V. Zee, D. Parikh, R. V. D. Geijn
We approach the problem of implementing mixed-datatype support within the general matrix multiplication (gemm) operation of the BLAS-like Library Instantiation Software framework, whereby each matrix operand A, B, and C may be stored as single- or double-precision real or complex values. Another factor of complexity, whereby the matrix product and accumulation are allowed to take place in a precision different from the storage precisions of either A or B, is also discussed. We first break the problem into orthogonal dimensions, considering the mixing of domains separately from mixing precisions. Support for all combinations of matrix operands stored in either the real or complex domain is mapped out by enumerating the cases and describing an implementation approach for each. Supporting all combinations of storage and computation precisions is handled by typecasting the matrices at key stages of the computation—during packing and/or accumulation, as needed. Several optional optimizations are also documented. Performance results gathered on a 56-core Marvell ThunderX2 and a 52-core Intel Xeon Platinum demonstrate that high performance is mostly preserved, with modest slowdowns incurred from unavoidable typecast instructions. The mixed-datatype implementation confirms that combinatorial intractability is avoided, with the framework relying on only two assembly microkernels to implement 128 datatype combinations.
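One reading consistent with the figure of 128 combinations is four storage datatypes for each of A, B, and C, times two computation precisions. The sketch below enumerates these cases and shows a toy mixed-domain product in which operands are typecast while being copied ("packed"). The names round_to_single and mixed_gemm are hypothetical and this is not BLIS code; the accumulation itself still runs in Python's double precision.

```python
import struct
from itertools import product

storage_types = ["s", "d", "c", "z"]        # single/double real, single/double complex
compute_precisions = ["single", "double"]

cases = list(product(storage_types, storage_types, storage_types, compute_precisions))
print(len(cases))  # 4 * 4 * 4 * 2 combinations


def round_to_single(x):
    """Round a real or complex value to single precision."""
    def r(v):
        return struct.unpack("f", struct.pack("f", v))[0]
    if isinstance(x, complex):
        return complex(r(x.real), r(x.imag))
    return r(x)


def mixed_gemm(a, b, c, cast=lambda x: x):
    """Return C + A @ B, casting the entries of A and B while they are packed."""
    a_packed = [[cast(v) for v in row] for row in a]
    b_packed = [[cast(v) for v in row] for row in b]
    inner = len(b)
    return [
        [c[i][j] + sum(a_packed[i][p] * b_packed[p][j] for p in range(inner))
         for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Real A times complex B accumulated into complex C, computed from single-precision copies.
result = mixed_gemm([[1.0, 2.0]], [[1 + 1j], [0.5 - 1j]], [[0j]], cast=round_to_single)
print(result)
```

Handling the domain mix (real versus complex) separately from the precision mix keeps the number of distinct code paths small, which matches the abstract's point about avoiding a combinatorial explosion of kernels.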
Citations: 4
Improved Arithmetic of Complex Fans
Pub Date : 2021-04-20 DOI: 10.1145/3434400
G. Soylu
Complex fans are sets of complex numbers whose magnitudes and angles range in closed intervals. The fact that the sum of two fans is a disordered shape gives rise to the need for computational methods to find the minimal enclosing fan. Cases where the sum of two fans contains the origin of the complex plane as a boundary point are of special interest. The result of the addition is then enclosed by circles in current methods, but under certain circumstances this turns out to be an overestimate. The focus of this article is the diagnosis and treatment of such cases.
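The definition above can be made concrete with a small numerical sketch. The code below is not the method of the article: it only samples two fans, forms all pairwise sums, and reads off the magnitude and angle ranges of the sampled sums. The function names `sample_fan` and `enclosing_fan_of_sum` and the tuple layout `(r_lo, r_hi, th_lo, th_hi)` are assumptions made for illustration. Sampling gives an inner estimate rather than a guaranteed enclosure, and the angle range is only meaningful when the sum stays away from the origin and the negative real axis (the branch cut of `cmath.phase`), so the difficult cases the abstract mentions are deliberately out of scope here.

```python
import cmath
import math


def sample_fan(r_lo, r_hi, th_lo, th_hi, n=15):
    """Return n*n sample points of the fan
    {r * exp(i*theta) : r in [r_lo, r_hi], theta in [th_lo, th_hi]}.
    Requires n >= 2; both interval endpoints are always sampled."""
    points = []
    for i in range(n):
        r = r_lo + (r_hi - r_lo) * i / (n - 1)
        for j in range(n):
            theta = th_lo + (th_hi - th_lo) * j / (n - 1)
            points.append(cmath.rect(r, theta))
    return points


def enclosing_fan_of_sum(fan_a, fan_b, n=15):
    """Estimate the magnitude and angle ranges of the sum of two fans
    by brute-force sampling (illustrative only, not a rigorous bound)."""
    sums = [a + b
            for a in sample_fan(*fan_a, n=n)
            for b in sample_fan(*fan_b, n=n)]
    magnitudes = [abs(z) for z in sums]
    angles = [cmath.phase(z) for z in sums]
    return min(magnitudes), max(magnitudes), min(angles), max(angles)


# Two fans in the first quadrant, well away from the origin.
fan_a = (1.0, 2.0, 0.0, math.pi / 4)
fan_b = (1.0, 2.0, math.pi / 4, math.pi / 2)
print(enclosing_fan_of_sum(fan_a, fan_b))
```

For these two fans the smallest magnitude of the sum is sqrt(2) and the largest is 4, but the magnitude 4 is reached only at the angle pi/4. The fan spanned by the four returned bounds therefore contains points that are not in the sum, which illustrates why an enclosing fan is in general an overestimate of the true sum.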
Citations: 0
QPPAL: A Two-phase Proximal Augmented Lagrangian Method for High-dimensional Convex Quadratic Programming Problems
Pub Date : 2021-03-24 DOI: 10.1145/3476571
Ling Liang, Xudong Li, Defeng Sun, K. Toh
In this article, we aim to solve high-dimensional convex quadratic programming (QP) problems with a large number of quadratic terms, linear equality, and inequality constraints. To solve the targeted QP problem to a desired accuracy efficiently, we consider the restricted-Wolfe dual problem and develop a two-phase Proximal Augmented Lagrangian method (QPPAL), with Phase I to generate a reasonably good initial point to warm start Phase II to obtain an accurate solution efficiently. More specifically, in Phase I, based on the recently developed symmetric Gauss-Seidel (sGS) decomposition technique, we design a novel sGS-based semi-proximal augmented Lagrangian method for the purpose of finding a solution of low to medium accuracy. Then, in Phase II, a proximal augmented Lagrangian algorithm is proposed to obtain a more accurate solution efficiently. Extensive numerical results evaluating the performance of QPPAL against existing state-of-the-art solvers Gurobi, OSQP, and QPALM are presented to demonstrate the high efficiency and robustness of our proposed algorithm for solving various classes of large-scale convex QP problems. The MATLAB implementation of the software package QPPAL is available at https://blog.nus.edu.sg/mattohkc/softwares/qppal/.
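To illustrate the general idea of a proximal augmented Lagrangian iteration, here is a minimal sketch on a toy problem. It is not QPPAL: it handles only equality constraints, works on the primal problem rather than the restricted-Wolfe dual, and has no sGS-based Phase I. It is the textbook proximal method of multipliers for min 1/2 x'Qx + c'x subject to Ax = b. The names `solve` and `proximal_alm_qp` and the parameters `sigma`, `tau`, and `iters` are hypothetical choices for this example.

```python
def solve(M, rhs):
    """Solve the dense system M z = rhs by Gaussian elimination
    with partial pivoting (pure Python, small systems only)."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def proximal_alm_qp(Q, c, A, b, sigma=10.0, tau=1.0, iters=100):
    """Proximal method of multipliers for
    min 1/2 x'Qx + c'x  subject to  Ax = b  (Q positive semidefinite)."""
    n, m = len(c), len(b)
    x = [0.0] * n
    y = [0.0] * m
    # H = Q + sigma * A'A + (1/tau) * I does not change between iterations.
    H = [[Q[i][j]
          + sigma * sum(A[k][i] * A[k][j] for k in range(m))
          + (1.0 / tau if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    for _ in range(iters):
        # x-update: minimize the augmented Lagrangian plus the proximal
        # term ||x - x_k||^2 / (2 * tau); the optimality condition is
        # H x = -c - A'y + sigma * A'b + x_k / tau.
        rhs = [-c[i]
               - sum(A[k][i] * (y[k] - sigma * b[k]) for k in range(m))
               + x[i] / tau
               for i in range(n)]
        x = solve(H, rhs)
        # Multiplier update with step size sigma on the constraint residual.
        resid = [sum(A[k][i] * x[i] for i in range(n)) - b[k]
                 for k in range(m)]
        y = [y[k] + sigma * resid[k] for k in range(m)]
    return x, y


# Toy problem: min x1^2 + x2^2 - 2*x1 - 5*x2  subject to  x1 + x2 = 1.
x, y = proximal_alm_qp([[2.0, 0.0], [0.0, 2.0]], [-2.0, -5.0],
                       [[1.0, 1.0]], [1.0])
print(x, y)
```

On this toy problem the KKT conditions give x = (-0.25, 1.25) with multiplier y = 2.5, and the iterates approach those values. The proximal term keeps each x-update well posed even when Q + sigma * A'A is singular, which is one common motivation for adding it.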
Citations: 1