
ACM Transactions on Mathematical Software: Latest Articles

Task-based Parallel Programming for Scalable Matrix Product Algorithms
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3583560
Emmanuel Agullo, Alfredo Buttari, Abdou Guermouche, Julien Herrmann, Antoine Jego

Task-based programming models have gained the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly large and heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features needed to express, in an elegant and compact way, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense General Matrix Multiplication (GEMM), the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive with state-of-the-art libraries up to 32,768 CPU cores and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it raises other issues that are outside the scope of this work.
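As a rough illustration of the sequential-task-flow idea mentioned in this abstract, the sketch below submits one task per tile update C[i][j] += A[i][k] * B[k][j] in ordinary program order. It is a toy under stated assumptions, not the authors' code: the "runtime" is just a Python list executed in submission order, and the names `matmul_tiled_stf`, `gemm_tile`, and `tile` are hypothetical. A real task-based runtime would infer dependencies from the tiles each task reads and writes and run independent tasks concurrently.

```python
def matmul_tiled_stf(A, B, tile=2):
    """Compute C = A * B by submitting one task per tile update (toy sketch)."""
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    tasks = []  # hypothetical stand-in for a runtime's task queue

    def gemm_tile(i0, j0, k0):
        # Tile update: C[i0:i0+tile, j0:j0+tile] += A[i0:.., k0:..] * B[k0:.., j0:..]
        for i in range(i0, min(i0 + tile, n)):
            for j in range(j0, min(j0 + tile, m)):
                acc = 0.0
                for k in range(k0, min(k0 + tile, k_dim)):
                    acc += A[i][k] * B[k][j]
                C[i][j] += acc

    # Sequential task flow: tasks are submitted in plain program order.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k_dim, tile):
                tasks.append((gemm_tile, (i0, j0, k0)))

    # Here the tasks simply run in submission order.
    for fn, args in tasks:
        fn(*args)
    return C
```

Tasks that update different tiles of C touch disjoint output data, which is the independence a runtime system could exploit; tasks sharing the same (i0, j0) must be serialized or reduced.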

Citations: 0
Algorithm 1035: A Gradient-based Implementation of the Polyhedral Active Set Algorithm
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3583559
William W. Hager, Hongchao Zhang

The Polyhedral Active Set Algorithm (PASA) is designed to optimize a general nonlinear function over a polyhedron. Phase one of the algorithm is a nonmonotone gradient projection algorithm, while phase two is an active set algorithm that explores faces of the constraint polyhedron. A gradient-based implementation is presented, where a projected version of the conjugate gradient algorithm is employed in phase two. Asymptotically, only phase two is performed. Comparisons are given with IPOPT using polyhedral-constrained problems from CUTEst and the Maros/Meszaros quadratic programming test set.
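To make the gradient projection phase concrete, here is a minimal sketch on the simplest polyhedron, a box. It uses a fixed step size and is therefore monotone on this example; it is not PASA's nonmonotone scheme and does not include the active set phase. The function names and the quadratic test objective are assumptions chosen for illustration.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi} (a simple polyhedron)."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]


def gradient_projection(grad, x0, lo, hi, step=0.5, iters=100):
    """Iterate x <- P(x - step * grad(x)), where P projects onto the box."""
    x = project_box(x0, lo, hi)
    for _ in range(iters):
        g = grad(x)
        x = project_box([xi - step * gi for xi, gi in zip(x, g)], lo, hi)
    return x


# Minimize 0.5 * ||x - target||^2 over the unit box [0, 1]^2.
target = [2.0, -0.5]
grad = lambda x: [xi - ti for xi, ti in zip(x, target)]
x_star = gradient_projection(grad, [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

For this objective the minimizer is the projection of `target` onto the box, so both bound constraints end up active; identifying such active faces is what an active set phase would then exploit.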

Citations: 0
hIPPYlib-MUQ: A Bayesian Inference Software Framework for Integration of Data with Complex Predictive Models under Uncertainty
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3580278
Ki-Tae Kim, Umberto Villa, Matthew Parno, Youssef Marzouk, Omar Ghattas, Noemi Petra

Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the-art algorithms aimed at overcoming the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low-rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large (∼50×) speedups over conventional black-box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework.

Citations: 0
Computing with B-series
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3573384
David I. Ketcheson, Hendrik Ranocha

We present BSeries.jl, a Julia package for the computation and manipulation of B-series, which are a versatile theoretical tool for understanding and designing discretizations of differential equations. We give a short introduction to the theory of B-series and associated concepts and provide examples of their use, including method composition and backward error analysis. The associated software is highly performant and makes it possible to work with B-series of high order.
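For readers unfamiliar with B-series, a small worked example may help: the B-series coefficients of a Runge-Kutta method are its elementary weights, one per rooted tree, and the method has order p when each weight matches the exact-solution coefficient for every tree with at most p nodes. The sketch below checks the four trees of order up to 3 for the classical fourth-order Runge-Kutta method using exact rational arithmetic. It is written in Python for illustration only and does not use or mirror the BSeries.jl API; the tree labels are informal.

```python
from fractions import Fraction as F

# Butcher tableau of the classical RK4 method.
A = [[F(0), F(0), F(0), F(0)],
     [F(1, 2), F(0), F(0), F(0)],
     [F(0), F(1, 2), F(0), F(0)],
     [F(0), F(0), F(1), F(0)]]
b = [F(1, 6), F(1, 3), F(1, 3), F(1, 6)]
c = [sum(row) for row in A]  # abscissae c_i = sum_j a_ij
s = len(b)

# Elementary weights for the rooted trees with at most three nodes.
elementary_weights = {
    "tau": sum(b),                                        # single node
    "[tau]": sum(b[i] * c[i] for i in range(s)),          # two nodes
    "[tau,tau]": sum(b[i] * c[i] ** 2 for i in range(s)), # bushy tree, three nodes
    "[[tau]]": sum(b[i] * A[i][j] * c[j]                  # tall tree, three nodes
                   for i in range(s) for j in range(s)),
}

# Coefficients of the exact solution's B-series for the same trees.
exact = {"tau": F(1), "[tau]": F(1, 2), "[tau,tau]": F(1, 3), "[[tau]]": F(1, 6)}

order_3_conditions_hold = elementary_weights == exact
```

Exact fractions avoid any floating-point tolerance when comparing coefficients, which matters when conditions are checked symbolically at high order.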

Citations: 0
Enabling Research through the SCIP Optimization Suite 8.0
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3585516
Ksenia Bestuzheva, Mathieu Besançon, Wei-Kun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, Leon Eifler, Oliver Gaul, Gerald Gamrath, Ambros Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, Alexander Hoen, Christopher Hojny, Rolf van der Hulst, Thorsten Koch, Marco Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, Marc E. Pfetsch, Daniel Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Yuji Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philipp Wellner, Dieter Weninger, Jakob Witzig

The SCIP Optimization Suite provides a collection of software packages for mathematical optimization centered around the constraint integer programming framework SCIP. The focus of this article is on the role of the SCIP Optimization Suite in supporting research. SCIP’s main design principles are discussed, followed by a presentation of the latest performance improvements and developments in version 8.0, which serve both as examples of SCIP’s application as a research tool and as a platform for further developments. Furthermore, this article gives an overview of interfaces to other programming and modeling languages, new features that expand the possibilities for user interaction with the framework, and the latest developments in several extensions built upon SCIP.

Citations: 0
Truncated Log-concave Sampling for Convex Bodies with Reflective Hamiltonian Monte Carlo
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3589505
Apostolos Chalkis, Vissarion Fisikopoulos, Marios Papachristou, Elias Tsigaridas

We introduce Reflective Hamiltonian Monte Carlo (ReHMC), an HMC-based algorithm to sample from a log-concave distribution restricted to a convex body. The random walk is based on incorporating reflections into the Hamiltonian dynamics such that the support of the target density is the convex body. We develop an efficient open source implementation of ReHMC and perform an experimental study on various high-dimensional datasets. The experiments suggest that ReHMC outperforms Hit-and-Run and Coordinate-Hit-and-Run regarding the time it needs to produce an independent sample, introducing practical truncated sampling in thousands of dimensions.
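The reflection idea can be sketched on a deliberately simple case: a standard Gaussian truncated to the box [-1, 1]^d, where hitting a wall mirrors the position and flips the corresponding momentum component. This is a toy under stated assumptions, not the authors' implementation: a general convex body needs a boundary oracle and reflection about the facet normal, and the step size, trajectory length, and function names here are arbitrary illustrative choices.

```python
import math
import random


def reflect(x, p, lo, hi):
    """Mirror a 1D position back into [lo, hi], flipping momentum at each wall hit."""
    while x < lo or x > hi:
        if x > hi:
            x, p = 2.0 * hi - x, -p
        else:
            x, p = 2.0 * lo - x, -p
    return x, p


def rehmc_step(x, rng, lo=-1.0, hi=1.0, eps=0.2, n_leap=10):
    """One reflective HMC transition for U(x) = 0.5 * ||x||^2 restricted to a box."""
    p = [rng.gauss(0.0, 1.0) for _ in x]
    h0 = 0.5 * sum(v * v for v in x) + 0.5 * sum(v * v for v in p)
    q = list(x)
    for _ in range(n_leap):
        p = [pi - 0.5 * eps * qi for pi, qi in zip(p, q)]  # half step, grad U(q) = q
        for i in range(len(q)):
            q[i], p[i] = reflect(q[i] + eps * p[i], p[i], lo, hi)  # drift + reflect
        p = [pi - 0.5 * eps * qi for pi, qi in zip(p, q)]  # second half step
    h1 = 0.5 * sum(v * v for v in q) + 0.5 * sum(v * v for v in p)
    # Metropolis correction for the leapfrog discretization error.
    if rng.random() < math.exp(min(0.0, h0 - h1)):
        return q
    return x


def sample_truncated_gaussian(n_samples, dim, seed=0):
    """Run the chain from the origin and return every visited state."""
    rng = random.Random(seed)
    x = [0.0] * dim
    out = []
    for _ in range(n_samples):
        x = rehmc_step(x, rng)
        out.append(list(x))
    return out
```

Because every proposed position is reflected back into the box before the accept/reject test, all states of the chain lie inside the support by construction.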

Citations: 0
Algorithm 1036: ATC, An Advanced Tucker Compression Library for Multidimensional Data
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3585514
Wouter Baert, Nick Vannieuwenhoven

We present ATC, a C++ library for advanced Tucker-based lossy compression of dense multidimensional numerical data in a shared-memory parallel setting, based on the sequentially truncated higher-order singular value decomposition (ST-HOSVD) and bit plane truncation. Several techniques are proposed to improve speed, memory usage, error control and compression rate. First, a hybrid truncation scheme is described which combines Tucker rank truncation and TTHRESH quantization. We derive a novel expression to approximate the error of truncated Tucker decompositions in the case of core and factor perturbations. We parallelize the quantization and encoding scheme and adjust this phase to improve error control. Implementation aspects are described, such as an ST-HOSVD procedure using only a single transposition. We also discuss several usability features of ATC, including the presence of multiple interfaces, extensive data type support, and integrated downsampling of the decompressed data. Numerical results show that ATC maintains state-of-the-art Tucker compression rates while providing average speed-up factors of 2.2 to 3.5 and halving memory usage. Our compressor provides precise error control, deviating only 1.4% from the requested error on average. Finally, ATC often achieves higher compression than non-Tucker-based compressors in the high-error domain.

Citations: 0
FastSpline: Automatic Generation of Interpolants for Lattice Samplings
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3577194
Joshua Horacsek, Usman Alim

Interpolation is a foundational concept in scientific computing and is at the heart of many scientific visualization techniques. There is usually a tradeoff between the approximation capabilities of an interpolation scheme and its evaluation efficiency. For many applications, it is important for a user to navigate their data in real time. In practice, evaluation efficiency outweighs any incremental improvements in reconstruction fidelity. We first analyze, from a general standpoint, the use of compact piece-wise polynomial basis functions to efficiently interpolate data that is sampled on a lattice. We then detail our automatic code-generation framework on both CPU and GPU architectures. Specifically, we propose a general framework that can produce a fast evaluation scheme by analyzing the algebro-geometric structure of the convolution sum for a given lattice and basis function combination. We demonstrate the utility and generality of our framework by providing fast implementations of various box splines on the Body Centered and Face Centered Cubic lattices, as well as some non-separable box splines on the Cartesian lattice. We also provide fast implementations for certain Voronoi-splines that have not yet appeared in the literature. Finally, we demonstrate that this framework may also be used for non-Cartesian lattices in 4D.
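The convolution sum this abstract refers to has the form f(x) = sum over lattice points k of c_k * phi(x - k), and a compact basis function phi means only a few lattice points contribute at any x. The sketch below shows the simplest possible instance, a piecewise-linear hat function on the 1D integer lattice. It is only meant to illustrate the structure of the sum; the box splines on body-centered and face-centered cubic lattices treated in the paper are far more involved, and the function names are assumptions.

```python
import math


def linear_bspline(t):
    """Compact piecewise-linear basis function (hat) with support [-1, 1]."""
    return max(0.0, 1.0 - abs(t))


def interpolate(samples, x):
    """Evaluate f(x) = sum_k samples[k] * phi(x - k) on the integer lattice.

    Only the two lattice points whose basis functions overlap x are visited,
    which is what makes a compact basis cheap to evaluate.
    """
    k0 = math.floor(x)
    total = 0.0
    for k in (k0, k0 + 1):
        if 0 <= k < len(samples):
            total += samples[k] * linear_bspline(x - k)
    return total


samples = [0.0, 2.0, 4.0, 1.0]
midpoint_value = interpolate(samples, 1.5)  # halfway between samples[1] and samples[2]
```

Restricting the loop to the lattice points inside the support is the hand-written analogue of what a code generator would derive automatically for a given lattice and basis function.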

Citations: 0
ARKODE: A Flexible IVP Solver Infrastructure for One-step Methods
IF 2.7, Zone 1 (Mathematics), Q1 Mathematics, Pub Date: 2023-06-15, DOI: https://dl.acm.org/doi/10.1145/3594632
Daniel R. Reynolds, David J. Gardner, Carol S. Woodward, Rujeko Chinomona

We describe the ARKODE library of one-step time integration methods for ordinary differential equation (ODE) initial-value problems (IVPs). In addition to providing standard explicit and diagonally implicit Runge–Kutta methods, ARKODE supports one-step methods designed to treat additive splittings of the IVP, including implicit-explicit (ImEx) additive Runge–Kutta methods and multirate infinitesimal (MRI) methods. We present the role of ARKODE within the SUNDIALS suite of time integration and nonlinear solver libraries, the core ARKODE infrastructure for utilities common to large classes of one-step methods, as well as its use of “time stepper” modules enabling easy incorporation of novel algorithms into the library. Numerical results show example problems of increasing complexity, highlighting the algorithmic flexibility afforded through this infrastructure, and include a larger multiphysics application leveraging multiple algorithmic features from ARKODE and SUNDIALS.
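To illustrate what an additive splitting buys, here is a minimal sketch of the simplest implicit-explicit scheme, first-order ImEx Euler, applied to y' = f_E(t, y) - lam * y, where the stiff linear term is treated implicitly and the remainder explicitly. This is not the ARKODE API and not one of its higher-order additive Runge-Kutta methods; the function name, the scalar problem, and the parameter values are assumptions chosen for illustration.

```python
def imex_euler(f_explicit, lam, y0, t0, t_end, n_steps):
    """First-order ImEx Euler for y' = f_explicit(t, y) - lam * y.

    The update y_new = y + h * f_explicit(t, y) - h * lam * y_new is solved
    for y_new in closed form because the implicit part is linear.
    """
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = (y + h * f_explicit(t, y)) / (1.0 + h * lam)
        t += h
    return y


# Stiff decay with constant forcing: the steady state is 1 / lam.
# With h * lam = 100, explicit Euler would be unstable; the ImEx step is not.
y_final = imex_euler(lambda t, y: 1.0, lam=1000.0, y0=1.0, t0=0.0, t_end=5.0, n_steps=50)
```

Treating only the stiff term implicitly keeps the step stable at a step size far beyond the explicit stability limit while avoiding a nonlinear solve for the explicit part.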

Citations: 0
Distributed ℋ2-Matrices for Boundary Element Methods
IF 2.7 Zone 1 (Mathematics) Q1 Mathematics Pub Date: 2023-06-15 DOI: https://dl.acm.org/doi/10.1145/3582494
Steffen Börm

Standard discretization techniques for boundary integral equations, e.g., the Galerkin boundary element method, lead to large densely populated matrices that require fast and efficient compression techniques like the fast multipole method or hierarchical matrices. If the underlying mesh is very large, running the corresponding algorithms on a distributed computer is attractive, e.g., since distributed computers frequently are cost-effective and offer a high accumulated memory bandwidth.

Compared to the closely related particle methods, for which distributed algorithms are well-established, the Galerkin discretization poses a challenge, since the supports of the basis functions influence the block structure of the matrix and therefore the flow of data in the corresponding algorithms. This article introduces distributed ℋ2-matrices, a class of hierarchical matrices that is closely related to fast multipole methods and particularly well-suited for distributed computing. While earlier efforts required the global tree structure of the ℋ2-matrix to be stored in every node of the distributed system, the new approach needs only local multilevel information that can be obtained via a simple distributed algorithm, allowing us to scale to significantly larger systems. Experiments show that this approach can handle very large meshes with more than 130 million triangles efficiently.

Citations: 0