
ACM Transactions on Mathematical Software (TOMS): Latest Articles

Algorithm 1009
Pub Date : 2020-05-18 DOI: 10.1145/3381537
S. Hawkins
MieSolver provides an efficient solver for the problem of wave propagation through a heterogeneous configuration of nonidentical circular cylinders. MieSolver allows great flexibility in the physical properties of each cylinder, and the cylinders may have opaque or penetrable cores, as well as an arbitrary number of penetrable layers. The wave propagation is governed by the two-dimensional Helmholtz equation and models electromagnetic, acoustic, and elastic waves. The solver is based on the Mie series solution for scattering by a single circular cylinder and hence is numerically stable and highly accurate. We demonstrate the accuracy of our software with extensive numerical experiments over a wide range of frequencies (about five orders of magnitude) and up to 60 cylinders.
Citations: 3
PETSc DMNetwork
Pub Date : 2020-04-26 DOI: 10.1145/3344587
S. Abhyankar, G. Betrie, D. Maldonado, L. McInnes, Barry F. Smith, Hong Zhang
We present DMNetwork, a high-level package included in the PETSc library for the simulation of multiphysics phenomena over large-scale networked systems. The library aims at applications that have networked structures such as those in electrical, gas, and water distribution systems. DMNetwork provides data and topology management, parallelization for multiphysics systems over a network, and hierarchical and composable solvers to exploit the problem structure. DMNetwork eases the simulation development cycle by providing the necessary infrastructure through simple abstractions to define and query the network components. This article presents the design of DMNetwork, illustrates its user interface, and demonstrates its ability to solve multiphysics systems, such as an electric circuit, a network of power grid and water subnetworks, and transient hydraulic systems over large networks with more than 2 billion variables on extreme-scale computers using up to 30,000 processors.
Citations: 4
Exactly Computing the Tail of the Poisson-Binomial Distribution
Pub Date : 2020-04-16 DOI: 10.1145/3460774
N. Peres, Andrew Lee, U. Keich
We present ShiftConvolvePoibin, a fast exact method to compute the tail of a Poisson-binomial distribution (PBD). Our method employs an exponential shift to retain its accuracy when computing a tail probability, and in practice we find that it is immune to the significant relative errors that other methods, exact or approximate, can suffer from when computing very small tail probabilities of the PBD. The accompanying R package is also competitive with the fastest implementations for computing the entire PBD.
Citations: 1
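As a point of reference for the Poisson-binomial abstract above, the following is a minimal sketch of the textbook direct-convolution computation of the distribution and its upper tail. It is not ShiftConvolvePoibin: it applies no exponential shift, so it can lose relative accuracy for very small tail probabilities, which is the problem the abstract says the method addresses. The function names `poisson_binomial_pmf` and `upper_tail` are hypothetical and chosen only for this illustration.

```python
def poisson_binomial_pmf(probs):
    """Return the pmf of the number of successes among independent
    Bernoulli trials with success probabilities `probs`.

    Plain O(n^2) convolution, one trial at a time (no exponential shift).
    """
    pmf = [1.0]  # with zero trials, P(X = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this trial fails
            nxt[k + 1] += mass * p       # this trial succeeds
        pmf = nxt
    return pmf


def upper_tail(probs, k):
    """Return P(X >= k) by summing the pmf from k upward."""
    return sum(poisson_binomial_pmf(probs)[k:])


if __name__ == "__main__":
    # Four fair coins: P(X >= 3) = (4 + 1) / 16
    print(upper_tail([0.5, 0.5, 0.5, 0.5], 3))
```

When all probabilities are equal, the result reduces to the ordinary binomial tail, which makes a convenient sanity check.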
Algorithm 1003
Pub Date : 2020-03-20 DOI: 10.1145/3337792
T. Davis, W. Hager, Scott P. Kolodziej, S. Yeralan
Partitioning graphs is a common and useful operation in many areas, from parallel computing to VLSI design to sparse matrix algorithms. In this article, we introduce Mongoose, a multilevel hybrid graph partitioning algorithm and library. Building on previous work in multilevel partitioning frameworks and combinatoric approaches, we introduce novel stall-reducing and stall-free coarsening strategies, as well as an efficient hybrid algorithm leveraging (1) traditional combinatoric methods and (2) continuous quadratic programming formulations. We demonstrate how this new hybrid algorithm outperforms either strategy in isolation, and we also compare Mongoose to METIS and demonstrate its effectiveness on large and social networking (power law) graphs.
Citations: 1
High-order Numerical Quadratures in a Tetrahedron with an Implicitly Defined Curved Interface
Pub Date : 2020-03-20 DOI: 10.1145/3372144
Tao Cui, W. Leng, Huaqing Liu, Linbo Zhang, Weiying Zheng
Given a shape regular tetrahedron and a curved surface that is defined implicitly by a nonlinear level set function and divides the tetrahedron into two sub-domains, a general-purpose, robust, and high-order numerical algorithm is proposed in this article for computing both volume integrals in the sub-domains and surface integrals on their common boundary. The algorithm uses a direct approach that decomposes 3D volume integrals or 2D surface integrals into multiple 1D integrals and computes the 1D integrals with Gaussian quadratures. It only requires finding roots of univariate nonlinear functions in given intervals and evaluating the integrand, the level set function, and the gradient of the level set function at given points. It can achieve arbitrarily high accuracy by increasing the orders of Gaussian quadratures, and it does not need extra a priori knowledge about the integrand and the level set function. The code for the algorithm is freely available in the open-source finite element toolbox Parallel Hierarchical Grid (PHG) and can serve as a basic building block for implementing 3D high-order numerical algorithms involving implicit interfaces or boundaries.
Citations: 18
Algorithm 1005
Pub Date : 2020-03-20 DOI: 10.1145/3382191
K. Jónasson, S. Sigurdsson, H. F. Yngvason, Pétur Orri Ragnarsson, Páll Melsted
A set of Fortran subroutines for reverse mode algorithmic (or automatic) differentiation of the basic linear algebra subprograms (BLAS) is presented. This is preceded by a description of the mathematical tools used to obtain the formulae of these derivatives, with emphasis on special matrices supported by the BLAS: triangular, symmetric, and band. All single and double precision BLAS derivatives have been implemented, together with the Cholesky factorization from Linear Algebra Package (LAPACK). The subroutines are written in Fortran 2003 with a Fortran 77 interface to allow use from C and C++, as well as dynamic languages such as R, Python, Matlab, and Octave. The subroutines are all implemented by calling BLAS, thereby attaining fast runtime. Timing results show derivative runtimes that are about twice those of the corresponding BLAS, in line with theory. The emphasis is on reverse mode because it is more important for the main application that we have in mind, numerical optimization. Two examples are presented, one dealing with the least squares modeling of groundwater, and the other dealing with the maximum likelihood estimation of the parameters of a vector autoregressive time series. The article contains comprehensive tables of formulae for the BLAS derivatives as well as for several non-BLAS matrix operations commonly used in optimization.
Citations: 0
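To illustrate what a reverse-mode BLAS derivative looks like, here is a small sketch for the matrix product C = A B, using the standard adjoint identities Ā = C̄ Bᵀ and B̄ = Aᵀ C̄. It is written in pure Python on nested lists, not in Fortran, and it does not reproduce the interface of the subroutines described in the abstract above. The names `matmul`, `transpose`, `gemm_reverse`, and `loss` are hypothetical helpers for this illustration only.

```python
def matmul(A, B):
    """Dense matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*M)]


def gemm_reverse(A, B, C_bar):
    """Reverse-mode derivative of C = A B.

    C_bar holds dL/dC for some scalar L; returns (dL/dA, dL/dB).
    Both results are themselves matrix products, which is why such
    derivatives can be implemented by calling a matrix-multiply routine.
    """
    A_bar = matmul(C_bar, transpose(B))
    B_bar = matmul(transpose(A), C_bar)
    return A_bar, B_bar


def loss(A, B, W):
    """Scalar test function L = sum_ij W_ij * (A B)_ij, so dL/dC = W."""
    C = matmul(A, B)
    return sum(w * c for rw, rc in zip(W, C) for w, c in zip(rw, rc))


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[1, 0], [2, 1], [0, 3]]
    W = [[1, 2], [3, 4]]
    A_bar, B_bar = gemm_reverse(A, B, W)
    print(A_bar)
    print(B_bar)
```

Because the adjoint of one matrix product costs two matrix products, a derivative runtime of roughly twice the original operation is what one would expect, consistent with the timing statement in the abstract.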
A Software Platform for Adaptive High Order Multistep Methods
Pub Date : 2020-03-20 DOI: 10.1145/3372159
Carmen Arévalo, Erik Jonsson-Glans, Josefine Olander, M. S. Soto, Gustaf Söderlind
We present a software package, Modes, offering h-adaptive and p-adaptive linear multistep methods for first order initial value problems in ordinary differential equations. The implementation is based on a new parametric, grid-independent representation of multistep methods [Arévalo and Söderlind 2017]. Parameters are supplied for over 60 methods. For nonstiff problems, all maximal order methods (p=k for explicit and p=k+1 for implicit methods) are supported. For stiff computation, implicit methods of order p=k are included. A collection of step-size controllers based on digital filters is provided, generating smooth step-size sequences offering improved computational stability. Controllers may be selected to match method and problem classes. A new system for automatic order control is also provided for designated families of multistep methods, offering simultaneous h- and p-adaptivity. Implemented as a Matlab toolbox, the software covers high order computations with linear multistep methods within a unified, generic framework. Computational experiments show that the new software is competitive and offers qualitative improvements. Modes is available for downloading and is primarily intended as a platform for developing a new generation of state-of-the-art multistep solvers, as well as for true ceteris paribus evaluation of algorithmic components. This also enables method comparisons within a single implementation environment.
Citations: 1
Algorithm 1004
Pub Date : 2020-03-20 DOI: 10.1145/3371237
Jeremy Reizenstein, Benjamin Graham
Iterated-integral signatures and log signatures are sequences calculated from a path that characterizes its shape. They originate from the work of K. T. Chen and have become important through Terry Lyons’s theory of differential equations driven by rough paths, which is an important developing area of stochastic analysis. They have applications in statistics and machine learning, where there can be a need to calculate finite parts of them quickly for many paths. We introduce the signature and the most basic information (displacement and signed areas) that it contains. We present algorithms for efficiently calculating these signatures. For log signatures this requires consideration of the structure of free Lie algebras. We benchmark the performance of the algorithms. The methods are implemented in C++ and released as a Python extension package, which also supports differentiation. In combination with a machine learning library (Tensorflow, PyTorch, or Theano), this allows end-to-end learning of neural networks involving signatures.
Citations: 2
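The "most basic information" mentioned in the signature abstract above (displacement and signed areas) can be made concrete with a short sketch that computes the first two signature levels of a piecewise-linear path. It uses Chen's identity segment by segment and is a pure-Python illustration only, not the C++ implementation or the Python package the abstract describes. The function names `signature_depth2` and `signed_area` are hypothetical.

```python
def signature_depth2(path):
    """Return levels 1 and 2 of the signature of a piecewise-linear path.

    `path` is a list of points (each a sequence of floats).
    Level 1 (s1) is the total displacement.
    Level 2 (s2) is the matrix of twice-iterated integrals.
    """
    dim = len(path[0])
    s1 = [0.0] * dim
    s2 = [[0.0] * dim for _ in range(dim)]
    for prev, cur in zip(path, path[1:]):
        d = [c - p for p, c in zip(prev, cur)]
        # Chen's identity: appending a straight segment with increment d
        # adds s1 (x) d plus the segment's own level-2 term d (x) d / 2.
        for i in range(dim):
            for j in range(dim):
                s2[i][j] += s1[i] * d[j] + 0.5 * d[i] * d[j]
        for i in range(dim):
            s1[i] += d[i]
    return s1, s2


def signed_area(s2, i, j):
    """Signed (Levy) area in the (i, j) plane from the level-2 terms."""
    return 0.5 * (s2[i][j] - s2[j][i])


if __name__ == "__main__":
    # Right then up: encloses a triangle of area 1/2 with its chord.
    s1, s2 = signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
    print(s1, s2, signed_area(s2, 0, 1))
```

The symmetric part of level 2 carries no new information (it equals half the outer product of the displacement with itself); the antisymmetric part is the signed area.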
Strassen’s Algorithm Reloaded on GPUs
Pub Date : 2020-03-20 DOI: 10.1145/3372419
Jianyu Huang, Chenhan D. Yu, R. Geijn
Conventional Graphics Processing Unit (GPU) implementations of Strassen’s algorithm (Strassen) rely on the existing high-performance matrix multiplication (gemm), trading space for time. As a result, such approaches can only achieve practical speedup for relatively large, “squarish” matrices due to the extra memory overhead, and their usages are limited due to the considerable workspace. We present novel Strassen primitives for GPUs that can be composed to generate a family of Strassen algorithms. Our algorithms utilize both the memory and thread hierarchies on GPUs, reusing shared memory and register files inherited from gemm, fusing additional operations, and avoiding extra workspace. We further exploit intra- and inter-kernel parallelism by batching, streaming, and employing atomic operations. We develop a performance model for NVIDIA Volta GPUs to select the appropriate blocking parameters and predict the performance for gemm and Strassen. Overall, our 1-level Strassen can achieve up to 1.11× speedup with a crossover point as small as 1,536 compared to cublasSgemm on an NVIDIA Tesla V100 GPU. With additional workspace, our 2-level Strassen can achieve 1.19× speedup with a crossover point at 7,680.
Citations: 12
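For readers unfamiliar with the algorithm the Strassen abstract above builds on, the following sketch shows the classical recursive Strassen scheme: seven block products in place of eight. It is a plain CPU illustration on nested lists and makes no attempt to model the GPU primitives, operation fusion, or workspace avoidance the abstract describes; in particular it allocates temporaries freely, which is the overhead the paper aims to avoid. The helper names and the `cutoff` parameter are assumptions for this example.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mul(A, B):
    """Conventional matrix product, used below the recursion cutoff."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def split(M):
    """Split an even-sized square matrix into four equal quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Multiply square matrices A and B with Strassen's recursion.

    Falls back to naive_mul when the size is odd or at most `cutoff`.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


if __name__ == "__main__":
    A = [[i + 4 * j for i in range(4)] for j in range(4)]
    B = [[(i * j) % 5 - 2 for i in range(4)] for j in range(4)]
    print(strassen(A, B) == naive_mul(A, B))
```

With `cutoff=2` a 4×4 input performs one level of recursion and an 8×8 input performs two, loosely mirroring the 1-level and 2-level variants named in the abstract.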
Algorithm 1006
Pub Date : 2020-03-11 DOI: 10.1145/3365983
Rémy Abergel, L. Moisan
We present a computational procedure to evaluate the integral $\int_x^y s^{p-1} e^{-\mu s}\,ds$ for $0 \le x < y \le +\infty$, $\mu = \pm 1$, $p > 0$, which generalizes the lower ($x = 0$) and upper ($y = +\infty$) incomplete gamma functions. To allow for large values of x, y, and p while avoiding under/overflow issues in the standard double precision floating point arithmetic, we use an explicit normalization that is much more efficient than the classical ratio with the complete gamma function. The generalized incomplete gamma function is estimated with continued fractions, with integrations by parts, or, when x ≈ y, with the Romberg numerical integration algorithm. We show that the accuracy reached by our algorithm improves a recent state-of-the-art method by two orders of magnitude, and it is essentially optimal considering the limitations imposed by floating point arithmetic. Moreover, the admissible parameter range of our algorithm ($0 \le p, x, y \le 10^{15}$) is much larger than competing algorithms, and its robustness is assessed through massive usage in an image processing application.
Citations: 3
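The incomplete gamma abstract above mentions Romberg integration for the case x ≈ y. The sketch below applies a generic textbook Romberg scheme to the integrand s^(p-1) e^(-μs) on a short finite interval. It is only an illustration of that one ingredient: it omits the paper's normalization, continued fractions, and integration by parts, and it assumes the integrand is finite on [x, y] with moderate parameter values. The function names and the `levels` and `tol` parameters are hypothetical.

```python
import math


def romberg(f, a, b, levels=12, tol=1e-13):
    """Romberg integration of f over [a, b].

    Builds trapezoid sums with successively halved step sizes and
    applies Richardson extrapolation row by row.
    """
    h = b - a
    prev = [0.5 * h * (f(a) + f(b))]
    for k in range(1, levels + 1):
        h *= 0.5
        # Only the new midpoints need to be evaluated at each level.
        mid_sum = sum(f(a + (2 * i + 1) * h) for i in range(2 ** (k - 1)))
        row = [0.5 * prev[0] + h * mid_sum]
        for j in range(1, k + 1):
            factor = 4 ** j
            row.append((factor * row[j - 1] - prev[j - 1]) / (factor - 1))
        if abs(row[-1] - prev[-1]) <= tol * abs(row[-1]):
            return row[-1]
        prev = row
    return prev[-1]


def generalized_incomplete_gamma(x, y, p, mu=1):
    """Approximate the integral of s**(p-1) * exp(-mu*s) from x to y.

    Assumes 0 <= x < y < infinity and an integrand that is finite on
    [x, y]; no protection against overflow or underflow is attempted.
    """
    return romberg(lambda s: s ** (p - 1) * math.exp(-mu * s), x, y)


if __name__ == "__main__":
    # For p = 2, mu = 1 the antiderivative is -(1 + s) * exp(-s).
    approx = generalized_incomplete_gamma(1.0, 1.5, 2)
    exact = 2.0 * math.exp(-1.0) - 2.5 * math.exp(-1.5)
    print(approx, exact)
```

Closed forms such as the p = 2 case in the demo are useful for checking a quadrature routine before applying it to parameters where no elementary antiderivative exists.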