ACM Transactions on Mathematical Software: Latest Articles

Accurate Calculation of Euclidean Norms Using Double-word Arithmetic

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-10-25 DOI: 10.1145/3568672

V. Lefèvre, N. Louvet, J. Muller, Joris Picot, L. Rideau

We consider the computation of the Euclidean (or L2) norm of an n-dimensional vector in floating-point arithmetic. We review the classical solutions used to avoid spurious overflow or underflow and/or to obtain very accurate results. We modify a recently published algorithm (that uses double-word arithmetic) to obtain a very accurate solution, free of spurious overflows and underflows. To that end, we use a double-word square-root algorithm for which we provide a tight error analysis. The returned L2 norm will be within very slightly more than 0.5 ulp of the exact result, which means that we will almost always provide correct rounding.
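
The two ingredients the abstract combines, scaling to avoid spurious overflow/underflow and double-word accumulation for accuracy, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes IEEE binary64 without FMA (so Dekker's TwoProd is used), scales by an exact power of two, and replaces the paper's double-word square-root algorithm with a single Newton correction of `sqrt(hi)`. The function names are hypothetical, and the paper's error bound is not claimed for this code.

```python
import math

_SPLITTER = 134217729.0  # 2**27 + 1: Veltkamp splitting constant for binary64


def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    return s, (a - a_virtual) + (b - b_virtual)


def two_prod(a, b):
    """Dekker's TwoProd without FMA: p = fl(a * b) and p + err == a * b
    exactly, provided no overflow or underflow occurs."""
    p = a * b
    c = _SPLITTER * a
    a_hi = c - (c - a)
    a_lo = a - a_hi
    c = _SPLITTER * b
    b_hi = c - (c - b)
    b_lo = b - b_hi
    err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, err


def norm2(xs):
    """Euclidean norm with exact power-of-two scaling and double-word
    accumulation of the sum of squares (illustrative sketch)."""
    xs = [float(x) for x in xs]
    if any(math.isnan(x) for x in xs):
        return math.nan
    if any(math.isinf(x) for x in xs):
        return math.inf
    amax = max((abs(x) for x in xs), default=0.0)
    if amax == 0.0:
        return 0.0
    # Scale by 2**-e (exact) so the largest |x| lies in [0.5, 1): the squares
    # can neither overflow nor all underflow.
    _, e = math.frexp(amax)
    hi = lo = 0.0
    for x in xs:
        y = math.ldexp(x, -e)
        p, p_err = two_prod(y, y)
        hi, s_err = two_sum(hi, p)
        lo += s_err + p_err
    hi, lo = two_sum(hi, lo)  # renormalize the double word (hi, lo)
    # Square root of hi + lo: one Newton correction applied to sqrt(hi).
    r = math.sqrt(hi)
    rr, rr_err = two_prod(r, r)
    r += ((hi - rr) - rr_err + lo) / (2.0 * r)
    return math.ldexp(r, e)
```

For example, `norm2([3.0, 4.0])` returns `5.0`, and `norm2([1e300, 1e300])` returns about `1.414e300`, whereas squaring the entries directly would overflow to infinity.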

Citations: 2

IFISS3D: A Computational Laboratory for Investigating Finite Element Approximation in Three Dimensions

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-09-27 DOI: 10.1145/3604934

Georgios Papanikos, C. Powell, D. Silvester

IFISS is an established MATLAB finite element software package for studying strategies for solving partial differential equations (PDEs). IFISS3D is a new add-on toolbox that extends IFISS capabilities for elliptic PDEs from two to three space dimensions. The open-source MATLAB framework provides a computational laboratory for experimentation and exploration of finite element approximation and error estimation, as well as iterative solvers. The package is designed to be useful as a teaching tool for instructors and students who want to learn about state-of-the-art finite element methodology. It will also be useful for researchers as a source of reproducible test matrices of arbitrarily large dimension.

Citations: 1

Automatic Differentiation of C++ Codes on Emerging Manycore Architectures with Sacado

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-09-27 DOI: 10.1145/3560262

E. Phipps, R. Pawlowski, C. Trott

Automatic differentiation (AD) is a well-known technique for evaluating analytic derivatives of calculations implemented on a computer, and numerous software tools are available for incorporating AD technology into complex applications. However, as emerging manycore computing architectures such as multicore CPUs, GPUs, and accelerators become more pervasive, a growing challenge for AD is the efficient differentiation of parallel computations implemented on them. In this work, we explore forward-mode, operator-overloading-based differentiation of C++ codes on these architectures using the widely available Sacado AD software package. In particular, we leverage Kokkos, a C++ tool, portable to a wide variety of emerging architectures, that provides APIs for implementing parallel computations. We describe the challenges that arise when differentiating code for these architectures using Kokkos, and two approaches for overcoming them that ensure optimal memory access patterns and expose additional dimensions of fine-grained parallelism in the derivative calculation. We describe the results of several computational experiments that demonstrate the performance of the approach on a few contemporary CPU and GPU architectures. We conclude with applications of these techniques to the simulation of discretized systems of partial differential equations.
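
Sacado implements forward-mode AD by overloading arithmetic operators on C++ scalar types. The same principle can be sketched in Python with a hypothetical `Dual` class (this is not Sacado's API and involves no Kokkos parallelism): each value carries a tuple of derivatives, one per seeded input direction, and every overloaded operator applies the corresponding differentiation rule. The derivative components are updated independently of one another, which is one natural reading of the extra fine-grained parallelism the abstract mentions.

```python
import math


class Dual:
    """Forward-mode AD value: a value plus a tuple of directional derivatives
    (one entry per seeded input direction)."""

    __slots__ = ("val", "der")

    def __init__(self, val, der=()):
        self.val = float(val)
        self.der = tuple(float(d) for d in der)

    def _lift(self, other):
        # Treat plain numbers as constants with zero derivative.
        if isinstance(other, Dual):
            return other
        return Dual(other, (0.0,) * len(self.der))

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, (a + b for a, b in zip(self.der, o.der)))

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, (a - b for a, b in zip(self.der, o.der)))

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val,
                    (a * o.val + self.val * b for a, b in zip(self.der, o.der)))

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = self._lift(other)
        # Quotient rule: (u/v)' = (u'v - uv') / v**2
        return Dual(self.val / o.val,
                    ((a * o.val - self.val * b) / (o.val * o.val)
                     for a, b in zip(self.der, o.der)))

    def __rtruediv__(self, other):
        return self._lift(other) / self


def sin(x):
    """sin that propagates derivatives through Dual values (chain rule)."""
    if isinstance(x, Dual):
        c = math.cos(x.val)
        return Dual(math.sin(x.val), (c * d for d in x.der))
    return math.sin(x)


def f(x, y):
    return x * y + sin(x) / y


# Seed two directions so one evaluation yields both partial derivatives.
x = Dual(1.0, (1.0, 0.0))
y = Dual(2.0, (0.0, 1.0))
r = f(x, y)  # r.der == (df/dx, df/dy) at (1, 2)
```

Here `r.der` holds approximately `(2 + cos(1)/2, 1 - sin(1)/4)`, the exact partial derivatives of `f` at `(1, 2)`, obtained in a single evaluation of `f`.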

Citations: 1

Remark on Algorithm 1010: Boosting Efficiency in Solving Quartic Equations with No Compromise in Accuracy

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-09-19 DOI: 10.1145/3564270

C. De Michele

We present a correction and an improvement to Algorithm 1010 [A. Orellana and C. De Michele 2020].

Citations: 1

emgr – EMpirical GRamian Framework Version 5.99

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-09-08 DOI: 10.1145/3609860

Christian Himpe

Version 5.99 of the empirical Gramian framework, emgr, completes a development cycle that focused on parametric model order reduction of gas network models while preserving compatibility with the previous development for the application of combined state and parameter reduction to neuroscience network models. In addition, new features concerning empirical Gramian types, perturbation design, and trajectory post-processing have been added, as well as a Python version alongside the default MATLAB / Octave implementation. This work summarizes these changes, particularly those since emgr version 5.4 (see Himpe, 2018 [Algorithms 11(7): 91]), and describes recent as well as future applications based on the current feature set, such as parameter identification in systems biology.
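
An empirical Gramian is assembled from simulated trajectories rather than from system matrices. A minimal Python sketch of the simplest case follows: an empirical controllability Gramian W ≈ Σ_j ∫ x_j(t) x_j(t)^T dt built from unit impulse responses. It assumes input-affine dynamics x' = f(x) + B u, so that a unit impulse on input j is equivalent to starting the unforced system at x(0) = B e_j. The function names are hypothetical and this is not emgr's interface; emgr additionally offers perturbation designs and trajectory post-processing that are not reproduced here.

```python
def rk4_step(f, x, dt):
    """One classical Runge-Kutta step for the autonomous ODE x' = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def empirical_ctrl_gramian(f, B, dt, steps):
    """Approximate W = sum_j integral_0^T x_j(t) x_j(t)^T dt, where x_j is the
    unforced trajectory starting from column j of B (an impulse on input j)."""
    n, m = len(B), len(B[0])
    W = [[0.0] * n for _ in range(n)]
    for j in range(m):
        x = [B[i][j] for i in range(n)]
        for k in range(steps + 1):
            w = 0.5 * dt if k in (0, steps) else dt  # trapezoidal quadrature weight
            for a in range(n):
                for b in range(n):
                    W[a][b] += w * x[a] * x[b]
            if k < steps:
                x = rk4_step(f, x, dt)
    return W


# Linear test system x' = diag(-1, -2) x + [1, 1]^T u.
W = empirical_ctrl_gramian(lambda x: [-x[0], -2.0 * x[1]], [[1.0], [1.0]],
                           dt=0.01, steps=2000)
```

For a stable linear system, this impulse-response construction reproduces the standard controllability Gramian. In the test system, its entries are 1/(λ_i + λ_j) with λ = (1, 2), that is [[1/2, 1/3], [1/3, 1/4]], which `W` matches to within the quadrature error.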

Citations: 19

Cache-oblivious Hilbert Curve-based Blocking Scheme for Matrix Transposition

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-08-09 DOI: 10.1145/3555353

J. N. F. Alves, L. Russo, Alexandre P. Francisco

This article presents a fast SIMD Hilbert space-filling curve generator, which supports a new cache-oblivious blocking scheme applied to the out-of-place transposition of general matrices. Matrix operations in high-performance computing libraries are usually parameterized based on host microprocessor specifications to minimize data movement within the different levels of the memory hierarchy. The performance of cache-oblivious algorithms does not rely on such parameterizations; this type of algorithm provides an elegant and portable answer to the lack of standardization in modern-day processors. Our solution consists of an iterative blocking scheme that takes advantage of the locality-preserving properties of Hilbert space-filling curves to minimize data movement in any memory hierarchy. This scheme traverses the input matrix in O(nm) time and space, improving the behavior of matrix algorithms that inherently exhibit poor memory locality. Applied to out-of-place matrix transposition, this technique achieved results competitive with state-of-the-art approaches, and after employing standard software prefetching techniques, its performance surpassed that of Intel MKL.
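
The locality argument can be illustrated by visiting the tiles of a matrix in Hilbert-curve order, so that consecutively visited tiles are always neighbours. The Python sketch below uses the classic iterative index-to-coordinate conversion of the Hilbert curve and a hypothetical fixed tile size `block`. It is not the paper's SIMD generator or its exact blocking scheme; it only shows the traversal idea applied to an out-of-place transpose.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve filling a 2**order x 2**order grid
    to cell coordinates (x, y)."""
    x = y = 0
    s = 1
    while s < (1 << order):
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:  # rotate/reflect the sub-square so the curve stays continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y


def hilbert_blocked_transpose(A, block=32):
    """Out-of-place transpose of an m x n matrix (list of rows), visiting
    block x block tiles in Hilbert-curve order."""
    m = len(A)
    n = len(A[0]) if m else 0
    B = [[0] * m for _ in range(n)]
    if m == 0 or n == 0:
        return B
    tiles_r = -(-m // block)  # ceiling division
    tiles_c = -(-n // block)
    order = (max(tiles_r, tiles_c) - 1).bit_length()  # smallest 2**order >= tile count
    for d in range(4 ** order):
        tr, tc = hilbert_d2xy(order, d)
        if tr >= tiles_r or tc >= tiles_c:
            continue  # curve cell lies outside the matrix
        for i in range(tr * block, min(m, (tr + 1) * block)):
            row = A[i]
            for j in range(tc * block, min(n, (tc + 1) * block)):
                B[j][i] = row[j]
    return B
```

Non-square and non-power-of-two shapes are handled by covering the tile grid with the smallest enclosing 2**order square and skipping curve cells that fall outside the matrix. In compiled code, the inner tile loops are where SIMD and prefetching would apply.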

Citations: 2

Algorithm xxx: SC-SR1: MATLAB Software for Limited-Memory SR1 Trust-Region Methods

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-07-22 DOI: 10.1145/3550269

J. Brust, O. Burdakov, Jennifer B. Erway, Roummel F. Marcia

We present a MATLAB implementation of the shape-changing symmetric rank-one (SC-SR1) method, which solves trust-region subproblems when a limited-memory symmetric rank-one (L-SR1) matrix is used in place of the true Hessian matrix; this makes the method suitable for large-scale optimization. The method takes advantage of two shape-changing norms [7, 9] to decompose the trust-region subproblem into two separate problems. With one of the proposed norms, both resulting subproblems have closed-form solutions. With the other, one subproblem has a closed-form solution, while the other is easily solvable using techniques that exploit the structure of L-SR1 matrices. Numerical results suggest that the SC-SR1 method is able to solve trust-region subproblems to high accuracy even in the so-called “hard case”. When the method is integrated into a trust-region algorithm, extensive numerical experiments suggest that the resulting algorithms perform well compared with widely used solvers, such as truncated CG.
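
Methods in this family rest on the SR1 secant update B+ = B + v vᵀ / (vᵀ s), where v = y − B s. The Python sketch below shows a dense version of that update with a common skipping safeguard; the threshold `r = 1e-8` is an assumed value. The limited-memory compact representation and the shape-changing-norm subproblem solver that SC-SR1 actually provides are not shown.

```python
import math


def sr1_update(B, s, y, r=1e-8):
    """Dense symmetric rank-one (SR1) update of a Hessian approximation B:
    B+ = B + v v^T / (v^T s) with v = y - B s.
    The update is skipped when |v^T s| < r * ||s|| * ||v|| (safeguard)."""
    n = len(s)
    Bs = [sum(B[i][k] * s[k] for k in range(n)) for i in range(n)]
    v = [y[i] - Bs[i] for i in range(n)]
    denom = sum(v[i] * s[i] for i in range(n))
    norm_s = math.sqrt(sum(x * x for x in s))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0 or abs(denom) < r * norm_s * norm_v:
        return [row[:] for row in B]  # skip: keep the current approximation
    return [[B[i][j] + v[i] * v[j] / denom for j in range(n)] for i in range(n)]


# Quadratic model with Hessian H: gradient differences satisfy y = H s.
H = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
for s in ([1.0, 0.0], [0.0, 1.0]):
    y = [sum(H[i][k] * s[k] for k in range(2)) for i in range(2)]
    B = sr1_update(B, s, y)
```

On a quadratic with linearly independent steps and no skipped updates, SR1 recovers the exact Hessian after n updates. In this example, `B` equals `H` up to rounding after two steps. Unlike BFGS, SR1 does not force B to stay positive definite, which is why it pairs naturally with trust-region rather than line-search globalization.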

Citations: 4

Algorithms for Parallel Generic hp-Adaptive Finite Element Software

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-06-13 DOI: 10.1145/3603372

M. Fehling, W. Bangerth

The hp-adaptive finite element method, in which one independently chooses the mesh size (h) and polynomial degree (p) to be used on each cell, has long been known to have better theoretical convergence properties than either h- or p-adaptive methods alone. However, it is not widely used, owing at least in part to the difficulty of the underlying algorithms and the lack of widely usable implementations. This is particularly true when it is used with continuous finite elements. Herein, we discuss algorithms that are necessary for a comprehensive and generic implementation of hp-adaptive finite element methods on distributed-memory, parallel machines. In particular, we present a multistage algorithm for the unique enumeration of degrees of freedom suitable for continuous finite element spaces, describe considerations for weighted load balancing, and discuss the transfer of variable-size data between processes. We illustrate the performance of our algorithms with numerical examples and demonstrate that they scale reasonably up to at least 16,384 Message Passing Interface (MPI) processes. We provide a reference implementation of our algorithms as part of the open source library deal.II.
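
The core difficulty of enumerating degrees of freedom (DoFs) uniquely on distributed memory is that DoFs on subdomain interfaces are seen by several processes but must receive one global index. The serial Python simulation below sketches one common two-stage pattern. The ownership rule (lowest rank wins) and all names are assumptions for illustration, not deal.II's algorithm. It also does not model the hp-specific step of reconciling DoFs on faces shared by cells of different polynomial degree.

```python
def enumerate_dofs(local_dofs):
    """Assign globally unique, contiguous DoF indices (serial simulation).

    local_dofs: dict rank -> set of DoF keys that the rank touches (its own DoFs
    plus those shared on subdomain interfaces). Rule assumed here: a shared
    DoF is owned by the lowest rank touching it."""
    owner = {}
    for rank in sorted(local_dofs):
        for key in local_dofs[rank]:
            owner.setdefault(key, rank)  # first (lowest) rank wins
    # Stage 1: each rank numbers its owned DoFs starting at an offset given by
    # an exclusive prefix sum of owned counts (an MPI_Exscan in a real code).
    numbering = {}
    offset = 0
    for rank in sorted(local_dofs):
        owned = sorted(k for k in local_dofs[rank] if owner[k] == rank)
        for i, key in enumerate(owned):
            numbering[key] = offset + i
        offset += len(owned)
    # Stage 2: each rank obtains the indices of shared DoFs it does not own
    # (point-to-point messages from the owners in a real code).
    return {rank: {k: numbering[k] for k in keys} for rank, keys in local_dofs.items()}


# Three ranks; DoFs 11, 12 and 13 lie on subdomain interfaces.
result = enumerate_dofs({0: {10, 11, 12}, 1: {12, 13}, 2: {11, 13, 14}})
```

Every rank ends up with the same index for each shared DoF, and the indices form the contiguous range 0..4, so each rank's owned DoFs occupy one contiguous block of the global numbering.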

Citations: 4

ARKODE: A Flexible IVP Solver Infrastructure for One-step Methods

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-05-27 DOI: 10.1145/3594632

D. Reynolds, D. J. Gardner, C. Woodward, Rujeko Chinomona

We describe the ARKODE library of one-step time integration methods for ordinary differential equation (ODE) initial-value problems (IVPs). In addition to providing standard explicit and diagonally implicit Runge–Kutta methods, ARKODE supports one-step methods designed to treat additive splittings of the IVP, including implicit-explicit (ImEx) additive Runge–Kutta methods and multirate infinitesimal (MRI) methods. We present the role of ARKODE within the SUNDIALS suite of time integration and nonlinear solver libraries, the core ARKODE infrastructure for utilities common to large classes of one-step methods, as well as its use of “time stepper” modules enabling easy incorporation of novel algorithms into the library. Numerical results show example problems of increasing complexity, highlighting the algorithmic flexibility afforded through this infrastructure, and include a larger multiphysics application leveraging multiple algorithmic features from ARKODE and SUNDIALS.
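
Implicit-explicit (ImEx) methods split the right-hand side into a non-stiff part, treated explicitly, and a stiff part, treated implicitly. The simplest member of this family, IMEX (forward-backward) Euler, is sketched below in Python. It is not one of ARKODE's additive Runge–Kutta methods and uses no SUNDIALS API. The stiff part is assumed to be linear, λy, so the implicit stage can be solved in closed form; for a general implicit part, each stage would require a nonlinear solve.

```python
import math


def imex_euler(f_explicit, lam, y0, t0, h, n_steps):
    """IMEX (forward-backward) Euler for y' = f_E(t, y) + lam * y.

    The non-stiff part f_E is treated explicitly and the stiff linear part
    lam*y implicitly:  y_{n+1} = y_n + h*f_E(t_n, y_n) + h*lam*y_{n+1},
    which is solved for y_{n+1} in closed form."""
    y = y0
    for k in range(n_steps):
        t = t0 + k * h
        y = (y + h * f_explicit(t, y)) / (1.0 - h * lam)
    return y


# Stiff test problem y' = lam*(y - cos t) - sin t, y(0) = 1, exact solution cos t.
lam = -1.0e4
y1 = imex_euler(lambda t, y: -lam * math.cos(t) - math.sin(t), lam, 1.0, 0.0, 0.01, 100)
```

With h = 0.01, applying forward Euler to the whole right-hand side would be unstable, since |1 + hλ| = 99 > 1. The ImEx step remains stable, and `y1` stays within about 0.01 of cos(1), the first-order accuracy expected at this step size.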

Citations: 4

Algorithm XXX: Concurrent Alternating Least Squares for multiple simultaneous Canonical Polyadic Decompositions

IF 2.7 · Zone 1 (Mathematics) · Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Mathematical Software

Pub Date : 2022-04-29 DOI: 10.1145/3519383

C. Psarras, L. Karlsson, R. Bro, P. Bientinesi

Tensor decompositions, such as CANDECOMP/PARAFAC (CP), are widely used in a variety of applications, such as chemometrics, signal processing, and machine learning. A broadly used method for computing such decompositions relies on the Alternating Least Squares (ALS) algorithm. When the number of components is small, regardless of its implementation, ALS exhibits low arithmetic intensity, which severely hinders its performance and makes GPU offloading ineffective. We observe that, in practice, experts often have to compute multiple decompositions of the same tensor, each with a small number of components (typically fewer than 20), to ultimately find the best ones to use for the application at hand. In this paper, we illustrate how multiple decompositions of the same tensor can be fused together at the algorithmic level to increase the arithmetic intensity. Therefore, it becomes possible to make efficient use of GPUs for further speedups; at the same time the technique is compatible with many enhancements typically used in ALS, such as line search, extrapolation, and non-negativity constraints. We introduce the Concurrent ALS algorithm and library, which offers an interface to MATLAB, and a mechanism to effectively deal with the issue that decompositions complete at different times. Experimental results on artificial and real datasets demonstrate a shorter time to completion due to increased arithmetic intensity.
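
Fusing several decompositions raises arithmetic intensity because each tensor entry, once loaded, can serve every model. The Python sketch below shows this in the simplest setting: rank-1 CP models of a 3-way tensor. Each ALS factor update makes one pass over the tensor and reuses every entry for all models. This is only an illustration of the data-reuse idea, not the paper's Concurrent ALS algorithm or library. It has no GPU offloading, MATLAB interface, or line search, and it supports rank 1 only.

```python
def fused_rank1_als_sweep(T, models):
    """One ALS sweep for several rank-1 CP models of the same 3-way tensor T
    (nested lists, shape I x J x K). Each model is a list [a, b, c] of factor
    vectors, updated in place. For every factor update the tensor is read
    once, and each entry T[i][j][k] is reused for all models."""
    shape = (len(T), len(T[0]), len(T[0][0]))
    for mode in range(3):
        o1, o2 = [o for o in range(3) if o != mode]
        acc = [[0.0] * shape[mode] for _ in models]
        for i in range(shape[0]):
            for j in range(shape[1]):
                for k in range(shape[2]):
                    t = T[i][j][k]
                    idx = (i, j, k)
                    for m, f in enumerate(models):  # fused: one read of t, many models
                        acc[m][idx[mode]] += t * f[o1][idx[o1]] * f[o2][idx[o2]]
        for m, f in enumerate(models):
            # Rank-1 least-squares solution: divide by ||f_o1||^2 * ||f_o2||^2.
            scale = sum(x * x for x in f[o1]) * sum(x * x for x in f[o2])
            f[mode] = [v / scale for v in acc[m]]


# Exactly rank-1 tensor T = u (outer) v (outer) w and two different starting guesses.
u, v, w = [1.0, 2.0], [3.0, 1.0, 2.0], [1.0, -1.0]
T = [[[u[i] * v[j] * w[k] for k in range(2)] for j in range(3)] for i in range(2)]
models = [[[1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 0.0]],
          [[0.5, -1.0], [1.0, 0.0, 2.0], [2.0, 1.0]]]
fused_rank1_als_sweep(T, models)
```

Because `T` is exactly rank 1 and both starting guesses overlap with the true factors, a single sweep reproduces `T` up to rounding for both models. With K fused models, the work per loaded tensor entry grows roughly K-fold, which is the arithmetic-intensity gain the abstract describes.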

Citations: 3
