
ACM Transactions on Mathematical Software (TOMS): Latest Articles

Faithfully Rounded Floating-point Computations
Pub Date : 2020-07-13 DOI: 10.1145/3290955
M. Lange, S. Rump
We present a pair arithmetic for the four basic operations and square root. It can be regarded as a simplified, more efficient double-double arithmetic. The central assumption on the underlying arithmetic is the first standard model for error analysis for operations on a discrete set of real numbers. We require neither a floating-point grid nor a round-to-nearest property. Based on that, we define a relative rounding error unit u and prove rigorous error bounds for the computed result of an arbitrary arithmetic expression depending on u, the size of the expression, and possibly a condition measure. In the second part of this note, we extend the error analysis by examining requirements to ensure faithfully rounded outputs and apply our results to IEEE 754-conforming floating-point systems. For a class of mathematical expressions, using an IEEE 754-conforming arithmetic with base β, the result is proved to be faithfully rounded for up to 1/√(βu) − 2 operations. Our findings cover a number of previously published algorithms to compute faithfully rounded results, among them Horner’s scheme, products, sums, dot products, and the Euclidean norm. Beyond that, several other problems can be analyzed, such as polynomial interpolation, orientation problems, Householder transformations, or the smallest singular value of Hilbert matrices of large size.
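To make the idea of a pair arithmetic concrete, here is a minimal Python sketch. It uses Knuth's well-known TwoSum error-free transformation and a pair addition that skips the final renormalization of a classical double-double sum. The function names `two_sum`, `pair_add`, and `pair_sum` are illustrative assumptions; this is not the authors' algorithm or code, and it assumes IEEE 754 binary64 with round-to-nearest, which is stronger than what the abstract requires.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and s + e == a + b exactly (no overflow)."""
    s = a + b
    b_virtual = s - a
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e


def pair_add(x, y):
    """Add two (hi, lo) pairs; the low parts are simply accumulated (assumed design)."""
    s, e = two_sum(x[0], y[0])
    return s, e + (x[1] + y[1])


def pair_sum(values):
    """Sum floats in pair arithmetic and round the pair to one float at the end."""
    acc = (0.0, 0.0)
    for v in values:
        acc = pair_add(acc, (v, 0.0))
    return acc[0] + acc[1]


data = [1e16, 1.0, -1e16]

# Plain left-to-right recursive summation loses the 1.0 entirely.
naive = 0.0
for v in data:
    naive += v

compensated = pair_sum(data)
print(naive, compensated)
```

The explicit loop is used for the naive sum because recent Python versions apply compensated summation inside the built-in `sum` for floats, which would hide the effect.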
Citations: 8
fenicsR13
Pub Date : 2020-07-12 DOI: 10.1145/3442378
Lambert Theisen, M. Torrilhon
We present a mixed finite element solver for the linearized regularized 13-moment equations of non-equilibrium gas dynamics. The Python implementation builds upon the software tools provided by the FEniCS computing platform. We describe a new tensorial approach utilizing the extension capabilities of FEniCS’ Unified Form Language to define required differential operators for tensors above second degree. The presented solver serves as an example for implementing tensorial variational formulations in FEniCS, for which the documentation and literature seem to be very sparse. Using the software abstraction levels provided by the Unified Form Language allows an almost one-to-one correspondence between the underlying mathematics and the resulting source code. Test cases support the correctness of the proposed method using validation with exact solutions. To justify the usage of extended gas flow models, we discuss typical application cases involving rarefaction effects. We provide the documented and validated solver publicly.
Citations: 5
Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation
Pub Date : 2020-07-07 DOI: 10.1145/3495005
Mirko Myllykoski
The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library.
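As background for the deflation step, the following Python sketch shows the classical small-subdiagonal test used in textbook QR iterations on an upper Hessenberg matrix: a subdiagonal entry is treated as zero when it is negligible relative to its two diagonal neighbours. This is the simple criterion, not Aggressive Early Deflation and not code from StarNEig; the function name `deflation_points` and the list-of-lists matrix layout are assumptions made for illustration.

```python
import sys


def deflation_points(h, eps=sys.float_info.epsilon):
    """Return indices k for which h[k+1][k] is negligible.

    h is an upper Hessenberg matrix stored as a list of rows. An index k in
    the result means the problem splits between rows k and k+1.
    """
    points = []
    for k in range(len(h) - 1):
        scale = abs(h[k][k]) + abs(h[k + 1][k + 1])
        if abs(h[k + 1][k]) <= eps * scale:
            points.append(k)
    return points


H = [
    [4.0, 1.0, 2.0],
    [1e-20, 3.0, 1.0],
    [0.0, 0.5, 1.0],
]

split_at = deflation_points(H)
print(split_at)
```

Here the entry `1e-20` is far below the unit roundoff times the neighbouring diagonal magnitudes, so the leading 1-by-1 block can be split off, while the entry `0.5` cannot be neglected.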
Citations: 1
Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing
Pub Date : 2020-06-30 DOI: 10.1145/3480935
H. Anzt, T. Cojean, Goran Flegar, Fritz Göbel, Thomas Grützmacher, Pratik Nayak, T. Ribizel, Yu-Hsiang Tsai, E. S. Quintana‐Ortí
In this article, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo’s design principle abstracts all functionality as “linear operators,” motivating the notation of a “linear operator algebra library.” Ginkgo’s current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific backends and provide details on extensibility and sustainability measures. We also demonstrate Ginkgo’s usability by providing examples on how to use its functionality inside the MFEM and deal.II finite element ecosystems. Finally, we offer a practical demonstration of Ginkgo’s high performance on state-of-the-art GPU architectures.
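The "everything is a linear operator" idea can be illustrated with a small Python sketch in which a matrix, a preconditioner, and an iterative solver all expose the same `apply` method and can therefore be nested freely. All class names here (`LinOp`, `Dense`, `JacobiInverse`, `Richardson`) are hypothetical and chosen for illustration; they are not Ginkgo's C++ API, and the sketch ignores backends, memory management, and sparsity.

```python
class LinOp:
    """Anything that maps a vector b to a vector x."""

    def apply(self, b):
        raise NotImplementedError


class Dense(LinOp):
    """Matrix-vector product with a dense matrix stored as a list of rows."""

    def __init__(self, rows):
        self.rows = rows

    def apply(self, b):
        return [sum(a * x for a, x in zip(row, b)) for row in self.rows]


class JacobiInverse(LinOp):
    """Preconditioner: multiply by the inverse of the diagonal of a matrix."""

    def __init__(self, rows):
        self.diag = [rows[i][i] for i in range(len(rows))]

    def apply(self, b):
        return [x / d for x, d in zip(b, self.diag)]


class Richardson(LinOp):
    """Solver as an operator: apply(b) approximates the solution of A x = b."""

    def __init__(self, system, preconditioner, iterations=50):
        self.system = system
        self.preconditioner = preconditioner
        self.iterations = iterations

    def apply(self, b):
        x = [0.0] * len(b)
        for _ in range(self.iterations):
            residual = [bi - ai for bi, ai in zip(b, self.system.apply(x))]
            correction = self.preconditioner.apply(residual)
            x = [xi + ci for xi, ci in zip(x, correction)]
        return x


rows = [[4.0, 1.0], [1.0, 3.0]]
solver = Richardson(Dense(rows), JacobiInverse(rows))
solution = solver.apply([1.0, 2.0])
print(solution)
```

Because the solver is itself a `LinOp`, it could in turn be passed as the preconditioner of another solver, which is the kind of composability the operator abstraction is meant to provide.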
Citations: 40
Irksome: Automating Runge–Kutta Time-stepping for Finite Element Methods
Pub Date : 2020-06-29 DOI: 10.1145/3466168
P. Farrell, R. Kirby, J. Marchena-Menendez
While implicit Runge–Kutta (RK) methods possess high order accuracy and important stability properties, implementation difficulties and the high expense of solving the coupled algebraic system at each time step are frequently cited as impediments. We present Irksome, a high-level library for manipulating UFL (Unified Form Language) expressions of semidiscrete variational forms to obtain UFL expressions for the coupled Runge–Kutta stage equations at each time step. Irksome works with the Firedrake package to enable the efficient solution of the resulting coupled algebraic systems. Numerical examples confirm the efficacy of the software and our solver techniques for various problems.
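To show what "coupled stage equations" means in the simplest setting, the sketch below applies the two-stage Radau IIA method (order 3) to the scalar test equation y' = λy. For this linear problem the stage equations form a 2-by-2 linear system, solved here by Cramer's rule. This is a plain-Python illustration under those assumptions, not Irksome's UFL-based machinery; the names `radau_step` and `integrate` are hypothetical.

```python
import math

# Butcher tableau of the two-stage Radau IIA method.
A = ((5.0 / 12.0, -1.0 / 12.0), (3.0 / 4.0, 1.0 / 4.0))
B = (3.0 / 4.0, 1.0 / 4.0)


def radau_step(y, h, lam):
    """One step for y' = lam * y.

    The stages satisfy k_i = lam * (y + h * sum_j A[i][j] * k_j), i.e.
    (I - h * lam * A) k = lam * y * (1, 1)^T, a coupled 2x2 linear system.
    """
    z = h * lam
    m11, m12 = 1.0 - z * A[0][0], -z * A[0][1]
    m21, m22 = -z * A[1][0], 1.0 - z * A[1][1]
    det = m11 * m22 - m12 * m21
    rhs = lam * y
    k1 = (rhs * m22 - m12 * rhs) / det
    k2 = (m11 * rhs - m21 * rhs) / det
    return y + h * (B[0] * k1 + B[1] * k2)


def integrate(y0, lam, t_end, n_steps):
    """Advance from t = 0 to t = t_end with n_steps equal steps."""
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = radau_step(y, h, lam)
    return y


approx = integrate(1.0, -1.0, 1.0, 10)
error = abs(approx - math.exp(-1.0))
print(approx, error)
```

For a PDE discretized by finite elements, each stage unknown becomes a full finite element function, so the analogous system is much larger; that coupling is the cost the abstract refers to.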
Citations: 20
H-Revolve
Pub Date : 2020-06-01 DOI: 10.1145/3378672
J. Herrmann, G. Pallez
We study the problem of checkpointing strategies for adjoint computation on synchronous hierarchical platforms, specifically computational platforms with several levels of storage with different writing and reading costs. When reversing a large adjoint chain, choosing which data to checkpoint and where is a critical decision for the overall performance of the computation. We introduce H-Revolve, an optimal algorithm for this problem. We make it available in a public Python library along with the implementation of several state-of-the-art algorithms for the variant of the problem with two levels of storage. We provide a detailed description of how one can use this library in adjoint computation software in the field of automatic differentiation or backpropagation. Finally, we evaluate the performance of H-Revolve and other checkpointing heuristics through an extensive campaign of simulation.
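For orientation, the single-level version of this problem has a classical closed-form answer due to Griewank and Walther: with c checkpoint slots and at most r repeated forward evaluations per step, a chain of up to C(c + r, c) steps can be reversed. The Python sketch below only evaluates that binomial bound. It is not H-Revolve, it ignores read and write costs and storage hierarchies, and conventions on whether the initial state occupies a slot differ between references; the function names are hypothetical.

```python
from math import comb


def max_chain_length(checkpoints, repetitions):
    """Binomial bound C(c + r, c) on the reversible chain length."""
    return comb(checkpoints + repetitions, checkpoints)


def min_repetitions(chain_length, checkpoints):
    """Smallest r such that a chain of the given length fits the bound."""
    if checkpoints < 1:
        raise ValueError("at least one checkpoint slot is required")
    r = 0
    while max_chain_length(checkpoints, r) < chain_length:
        r += 1
    return r


needed = min_repetitions(chain_length=10, checkpoints=3)
print(needed)
```

The bound grows very quickly in both arguments, which is why a handful of checkpoints suffices to reverse long chains at a modest recomputation cost.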
Citations: 3
Algorithm 1010
Pub Date : 2020-05-18 DOI: 10.1145/3386241
A. G. Orellana, C. Michele
Aiming to provide a very accurate, efficient, and robust quartic equation solver for physical applications, we have proposed an algorithm that builds on the previous works of P. Strobach and S. L. Shmakov. It is based on the decomposition of the quartic polynomial into two quadratics, whose coefficients are first accurately estimated by carefully handling numerical errors and afterward refined through the use of the Newton-Raphson method. Our algorithm is very accurate in comparison with other state-of-the-art solvers that can be found in the literature, but (most importantly) it turns out to be very efficient according to our timing tests. A crucial issue for us is the robustness of the algorithm, i.e., its ability to cope with the detrimental effect of round-off errors, no matter what set of quartic coefficients is provided in a practical application. In this respect, we extensively tested our algorithm in comparison to other quartic equation solvers both by considering specific extreme cases and by carrying out a statistical analysis over a very large set of quartics. Our algorithm has also been heavily tested in a physical application, i.e., simulations of hard cylinders, where it proved its absolute reliability as well as its efficiency.
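The following Python sketch illustrates two ingredients of a factorization-based quartic solver: how the quartic coefficients relate to the coefficients of its two quadratic factors, and how each monic quadratic can be solved without catastrophic cancellation. It does not show how the factor coefficients are estimated or refined, which is the core of the published algorithm; the function names and the test polynomial are assumptions for illustration.

```python
import cmath


def expand_quadratics(a1, b1, a2, b2):
    """Coefficients (a, b, c, d) of x^4 + a x^3 + b x^2 + c x + d
    for the product (x^2 + a1 x + b1) * (x^2 + a2 x + b2)."""
    return (a1 + a2, b1 + b2 + a1 * a2, a1 * b2 + a2 * b1, b1 * b2)


def monic_quadratic_roots(alpha, beta):
    """Roots of x^2 + alpha x + beta for real alpha, beta.

    The larger-magnitude root is formed without subtracting nearly equal
    numbers; the other root follows from the product of roots, beta.
    """
    disc = cmath.sqrt(alpha * alpha - 4.0 * beta)
    q = -(alpha + disc) / 2.0 if alpha >= 0 else -(alpha - disc) / 2.0
    if q == 0:
        return 0j, 0j
    return q, beta / q


# Example quartic built from two known factors (hypothetical test data).
quartic = expand_quadratics(-3.0, 2.0, 1.0, -6.0)
roots = monic_quadratic_roots(-3.0, 2.0) + monic_quadratic_roots(1.0, -6.0)
print(quartic, roots)
```

With `alpha = -1e8` and `beta = 1.0`, the textbook formula loses most digits of the small root, whereas dividing `beta` by the large root keeps it accurate; this is the kind of round-off effect the abstract stresses.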
Citations: 5
Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem
Pub Date : 2020-05-18 DOI: 10.1145/3361746
O. Marques, J. Demmel, P. Vasconcelos
The Singular Value Decomposition (SVD) is widely used in numerical analysis and scientific computing applications, including dimensionality reduction, data compression and clustering, and computation of pseudo-inverses. In many cases, a crucial part of the SVD of a general matrix is to find the SVD of an associated bidiagonal matrix. This article discusses an algorithm to compute the SVD of a bidiagonal matrix through the eigenpairs of an associated symmetric tridiagonal matrix. The algorithm enables the computation of only a subset of singular values and corresponding vectors, with potential performance gains. The article focuses on a sequential version of the algorithm, and discusses special cases and implementation details. The implementation, called BDSVDX, has been included in the LAPACK library. We use a large set of bidiagonal matrices to assess the accuracy of the implementation, both in single and double precision, as well as to identify potential shortcomings. The results show that BDSVDX can be up to three orders of magnitude faster than existing algorithms, which are limited to the computation of a full SVD. We also show comparisons of an implementation that uses BDSVDX as a building block for the computation of the SVD of general matrices.
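The reduction rests on a standard fact: for an upper bidiagonal matrix B with diagonal d and superdiagonal e, a symmetric permutation of [[0, B^T], [B, 0]] is tridiagonal with zero diagonal and off-diagonal entries d1, e1, d2, e2, ..., dn, and its eigenvalues are the singular values of B taken with both signs. The Python sketch below builds that off-diagonal and finds a single selected singular value by Sturm-count bisection. It is a pedagogical illustration, not the BDSVDX implementation; all function names and the `pivmin` safeguard value are assumptions.

```python
def tgk_offdiagonal(d, e):
    """Interleave d and e into the off-diagonal of the zero-diagonal tridiagonal."""
    off = []
    for i, di in enumerate(d):
        off.append(di)
        if i < len(e):
            off.append(e[i])
    return off


def count_eigs_below(off, x, pivmin=1e-300):
    """Sturm count: number of eigenvalues of the tridiagonal matrix below x."""
    q = -x
    if abs(q) < pivmin:
        q = -pivmin
    count = 1 if q < 0 else 0
    for o in off:
        q = -x - o * o / q
        if abs(q) < pivmin:
            q = -pivmin
        if q < 0:
            count += 1
    return count


def singular_value(d, e, k, iterations=200):
    """k-th largest singular value (k = 1 is the largest) by bisection."""
    off = tgk_offdiagonal(d, e)
    target = 2 * len(d) - k  # ascending 0-based index of the wanted eigenvalue
    padded = [0.0] + [abs(o) for o in off] + [0.0]
    lo = 0.0
    hi = max(padded[i] + padded[i + 1] for i in range(len(padded) - 1))  # Gershgorin
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_eigs_below(off, mid) <= target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# B = [[1, 1], [0, 1]] has singular values (sqrt(5) + 1)/2 and (sqrt(5) - 1)/2.
sigma_max = singular_value([1.0, 1.0], [1.0], k=1)
sigma_min = singular_value([1.0, 1.0], [1.0], k=2)
print(sigma_max, sigma_min)
```

Bisection touches only the requested index, which mirrors the benefit the abstract describes: computing a subset of singular values without paying for a full decomposition.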
Citations: 5
Error Analysis of Some Operations Involved in the Cooley-Tukey Fast Fourier Transform
Pub Date : 2020-05-18 DOI: 10.1145/3368619
N. Brisebarre, Mioara Joldes, J. Muller, Ana-Maria Naneş, Joris Picot
We are interested in obtaining error bounds for the classical Cooley-Tukey fast Fourier transform algorithm in floating-point arithmetic, for the 2-norm as well as for the infinity norm. For that purpose, we also give some results on the relative error of complex multiplication by a root of unity, and on the largest value that the real or imaginary part of one term of the fast Fourier transform of a vector x can take, assuming that all terms of x have real and imaginary parts less than some value b.
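For readers who want to see where the multiplications by roots of unity occur, here is a minimal recursive radix-2 Cooley-Tukey FFT in Python, together with a naive DFT used as a reference to measure the relative 2-norm difference between the two. This is an illustrative sketch only: the measured value is one observation on one input, not an error bound of the kind derived in the article, and the function names are assumptions.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Multiplication by a root of unity: the operation whose rounding
        # error drives the overall error of the transform.
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def dft(x):
    """Naive O(n^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def relative_difference_2norm(a, b):
    """||a - b||_2 / ||b||_2 for two complex vectors of equal length."""
    num = sum(abs(u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    den = sum(abs(v) ** 2 for v in b) ** 0.5
    return num / den


signal = [float(j) for j in range(8)]
difference = relative_difference_2norm(fft(signal), dft(signal))
print(difference)
```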
Citations: 14
Algorithm 1008
Pub Date : 2020-05-18 DOI: 10.1145/3378542
Jose Maria Varas Casado, R. Hewson
A Matlab class for multicomplex numbers was developed with particular attention paid to the robust and accurate handling of small imaginary components. This is primarily to allow the class to be used to obtain n-order derivative information using the multicomplex step method for, among other applications, gradient-based optimization and optimum control problems. The algebra of multicomplex numbers is described, as is its accurate computational implementation, considering small term approximations and the identification of principal values. The implementation of the method in Matlab is studied, and a class definition is constructed. This new class definition enables Matlab to handle n-order multicomplex numbers and perform arithmetic functions. It was found that with this method, the step size could be arbitrarily decreased toward machine precision. Use of the method to obtain up to the seventh derivative of functions is presented, as is timing data to demonstrate the efficiency of the class implementation.
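The first-order case of the multicomplex step method is the ordinary complex-step derivative, which can be sketched with Python's built-in complex numbers: f'(x) is approximated by Im f(x + ih) / h, and because no subtraction of nearly equal values occurs, h can be made extremely small. Higher derivatives require the nested imaginary units of multicomplex numbers, which this sketch does not implement; the function name and the choice h = 1e-30 are assumptions for illustration, and the code is unrelated to the Matlab class described above.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-30):
    """Approximate f'(x) as Im(f(x + i h)) / h for an analytic f."""
    return f(complex(x, h)).imag / h


x0 = 1.0
step = 1e-30

# Complex step: accurate even with a step far below machine epsilon.
derivative = complex_step_derivative(cmath.exp, x0, step)

# Forward difference with the same step: x0 + step rounds to x0, so the
# difference quotient collapses to zero.
forward = ((cmath.exp(x0 + step) - cmath.exp(x0)) / step).real

print(derivative, forward, math.e)
```

This contrast is the reason the abstract can state that the step size may be decreased arbitrarily toward machine precision.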
Citations: 4