ACM Transactions on Mathematical Software (TOMS): Latest Articles

A Computational Study of Using Black-box QR Solvers for Large-scale Sparse-dense Linear Least Squares Problems

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2022-02-16 DOI: 10.1145/3494527

J. Scott, M. Tuma

Large-scale overdetermined linear least squares problems arise in many practical applications. One popular solution method is based on the backward stable QR factorization of the system matrix A. This article focuses on sparse-dense least squares problems in which A is sparse except for a small number of rows that are considered dense. For large-scale problems, the direct application of a QR solver either fails because of insufficient memory or is unacceptably slow. We study several solution approaches based on using a sparse QR solver without modification, focussing on the case that the sparse part of A is rank deficient. We discuss partial matrix stretching and regularization and propose extending the augmented system formulation with iterative refinement for sparse problems to sparse-dense problems, optionally incorporating multi-precision arithmetic. In summary, our computational study shows that, before applying a black-box QR factorization, a check should be made for rows that are classified as dense and, if such rows are identified, then A should be split into sparse and dense blocks; a number of ways to use a black-box QR factorization to exploit this splitting are possible, with no single method found to be the best in all cases.
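
The recommended pre-check for dense rows can be sketched as follows. This is only an illustration: the abstract does not state how a row is classified as dense, so the function name `split_dense_rows`, the row-as-dictionary storage, and the `density_threshold` value are all assumptions made here.

```python
def split_dense_rows(rows, n_cols, density_threshold=0.5):
    """Return (sparse_idx, dense_idx) for a matrix stored row by row.

    rows: list of dicts mapping column index -> nonzero value.
    density_threshold: hypothetical cutoff; a row whose fraction of
    nonzeros exceeds it is classified as dense.
    """
    sparse_idx, dense_idx = [], []
    for i, row in enumerate(rows):
        density = len(row) / n_cols
        if density > density_threshold:
            dense_idx.append(i)
        else:
            sparse_idx.append(i)
    return sparse_idx, dense_idx


# A 5 x 4 matrix: four rows with one nonzero each and one full row.
rows = [
    {0: 1.0},
    {1: 2.0},
    {2: 3.0},
    {3: 4.0},
    {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0},
]
sparse_idx, dense_idx = split_dense_rows(rows, n_cols=4)
sparse_block = [rows[i] for i in sparse_idx]  # would go to the sparse QR solver
dense_block = [rows[i] for i in dense_idx]    # would be handled separately
```

How the two blocks are then combined (stretching, regularization, or an augmented system) is the subject of the article and is not reproduced here.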

Citations: 7

Source-to-Source Automatic Differentiation of OpenMP Parallel Loops

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2022-02-16 DOI: 10.1145/3472796

J. Hückelheim, L. Hascoët

This article presents our work toward correct and efficient automatic differentiation of OpenMP parallel worksharing loops in forward and reverse mode. Automatic differentiation is a method to obtain gradients of numerical programs, which are crucial in optimization, uncertainty quantification, and machine learning. The computational cost to compute gradients is a common bottleneck in practice. For applications that are parallelized for multicore CPUs or GPUs using OpenMP, one also wishes to compute the gradients in parallel. We propose a framework to reason about the correctness of the generated derivative code, from which we justify our OpenMP extension to the differentiation model. We implement this model in the automatic differentiation tool Tapenade and present test cases that are differentiated following our extended differentiation procedure. Performance of the generated derivative programs in forward and reverse mode is better than sequential, although our reverse mode often scales worse than the input programs.
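
As a reminder of what forward-mode differentiation of a loop computes, here is a minimal sketch using dual numbers. It is not how Tapenade works (Tapenade transforms source code, while this sketch uses operator overloading), it does not involve OpenMP, and the names `Dual` and `weighted_sum_of_squares` are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its derivative with respect to one input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def weighted_sum_of_squares(x, weights):
    """f(x) = sum_i w_i * x^2, written as a reduction loop."""
    total = Dual(0.0)
    for w in weights:  # iterations are independent apart from the reduction
        total = total + w * x * x
    return total


# Seed dx/dx = 1 to obtain df/dx alongside f.
result = weighted_sum_of_squares(Dual(3.0, 1.0), [1, 2, 3])
```

In forward mode the derivative loop has the same data-flow pattern as the original loop. In reverse mode, reads of shared variables turn into increments of their adjoints, which is one commonly cited reason why parallel reverse-mode code is harder to make both correct and scalable.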

Citations: 2

Kummer versus Montgomery Face-off over Prime Order Fields

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2022-02-11 DOI: 10.1145/3503536

K. Nath, P. Sarkar

This paper makes a comprehensive comparison of the efficiencies of vectorized implementations of Kummer lines and Montgomery curves at various security levels. For the comparison, nine Kummer lines are considered, out of which eight are new, and new assembly implementations of all nine Kummer lines have been made. Seven previously proposed Montgomery curves are considered and new vectorized assembly implementations have been made for three of them. Our comparisons show that for all security levels, Kummer lines are consistently faster than Montgomery curves, though the speed-up is modest.
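
For readers unfamiliar with the arithmetic being benchmarked, the following is a sketch of the standard x-only Montgomery ladder over a prime field. The abstract does not name the curves studied; the Curve25519 parameters below are chosen only as a familiar example. The code is plain Python, neither vectorized nor constant time, so it says nothing about the performance results of the paper.

```python
P = 2**255 - 19   # field prime of Curve25519 (example choice, see lead-in)
A24 = 121665      # (A - 2) / 4 for the curve coefficient A = 486662


def ladder(k, u):
    """Return the x-coordinate of [k]Q given the x-coordinate u of Q.

    Illustrative only: branches on secret bits, so it is not constant time.
    """
    x1 = u % P
    x2, z2 = 1, 0      # projective x of the point at infinity
    x3, z3 = x1, 1     # projective x of Q
    for t in reversed(range(k.bit_length())):
        bit = (k >> t) & 1
        if bit:
            x2, z2, x3, z3 = x3, z3, x2, z2
        # One combined doubling and differential addition.
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
        if bit:
            x2, z2, x3, z3 = x3, z3, x2, z2
    return x2 * pow(z2, P - 2, P) % P   # convert back to an affine x


# Scalar multiplications commute, which is what Diffie-Hellman relies on.
shared_ab = ladder(12345, ladder(67890, 9))
shared_ba = ladder(67890, ladder(12345, 9))
```

Production implementations replace the data-dependent swaps with constant-time conditional swaps and, as in the paper, use vector instructions for the field arithmetic.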

Citations: 1

Remark on Algorithm 982: Explicit Solutions of Triangular Systems of First-order Linear Initial-value Ordinary Differential Equations with Constant Coefficients

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-12-15 DOI: 10.1145/3479429

W. Van Snyder

Algorithm 982, "Explicit solutions of triangular systems of first-order linear initial-value ordinary differential equations with constant coefficients," provides an explicit solution for a homogeneous system and a brief description of how to compute a solution for the inhomogeneous case. The method described is not directly useful if the coefficient matrix is singular. This remark explains more completely how to compute the solution for the inhomogeneous case and for the singular coefficient matrix case.
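
To make "explicit solution of a triangular system" concrete, here is a hand-derived sketch for the smallest homogeneous case, a 2 x 2 lower-triangular system y1' = a*y1, y2' = c*y1 + d*y2. It is not the code of Algorithm 982; the function name is hypothetical, and the sketch assumes distinct diagonal entries a and d.

```python
import math


def solve_lower_triangular_2x2(a, c, d, y10, y20, t):
    """Closed-form solution of y1' = a*y1, y2' = c*y1 + d*y2 at time t.

    Assumes a != d (distinct diagonal entries); the repeated case needs
    a different formula and is not covered by this sketch.
    """
    if a == d:
        raise ValueError("this sketch assumes distinct diagonal entries")
    y1 = y10 * math.exp(a * t)
    # Substituting y1 into the second equation gives a forced scalar ODE
    # whose particular solution is k * exp(a*t) with k = c*y10 / (a - d).
    k = c * y10 / (a - d)
    y2 = (y20 - k) * math.exp(d * t) + k * math.exp(a * t)
    return y1, y2


y1, y2 = solve_lower_triangular_2x2(a=-1.0, c=2.0, d=-3.0,
                                    y10=1.0, y20=0.0, t=0.5)
```

The same forward-substitution idea extends to larger triangular systems: each component depends only on the components already solved.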

Citations: 0

Algorithm 1018: FaVeST—Fast Vector Spherical Harmonic Transforms

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-09-28 DOI: 10.1145/3458470

Q. L. Le Gia, Ming Li, Yu Guang Wang

Vector spherical harmonics on the unit sphere of ℝ³ have broad applications in geophysics, quantum mechanics, and astrophysics. In the representation of a tangent vector field, one needs to evaluate the expansion and the Fourier coefficients of vector spherical harmonics. In this article, we develop fast algorithms (FaVeST) for vector spherical harmonic transforms on these evaluations. The forward FaVeST evaluates the Fourier coefficients and has a computational cost proportional to N log √N for N number of evaluation points. The adjoint FaVeST, which evaluates a linear combination of vector spherical harmonics with a degree up to √M for M evaluation points, has cost proportional to M log √M. Numerical examples of simulated tangent fields illustrate the accuracy, efficiency, and stability of FaVeST.

Citations: 1

Corrigendum: Remark on Algorithm 723: Fresnel Integrals

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-09-28 DOI: 10.1145/3452336

W. Van Snyder

There are mistakes and typographical errors in Remark on Algorithm 723: Fresnel Integrals, which appeared in ACM Transactions on Mathematical Software 22, 4 (December 1996). This remark corrects those errors. The software provided to Collected Algorithms of the ACM was correct.

Citations: 0

Fast Matching Pursuit with Multi-Gabor Dictionaries

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-06-25 DOI: 10.1145/3447958

Zdeněk Průša, N. Holighaus, Péter Balázs

Finding the best K-sparse approximation of a signal in a redundant dictionary is an NP-hard problem. Suboptimal greedy matching pursuit algorithms are generally used for this task. In this work, we present an acceleration technique and an implementation of the matching pursuit algorithm acting on a multi-Gabor dictionary, i.e., a concatenation of several Gabor-type time-frequency dictionaries, each of which consists of translations and modulations of a possibly different window and time and frequency shift parameters. The technique is based on pre-computing and thresholding inner products between atoms and on updating the residual directly in the coefficient domain, i.e., without the round-trip to the signal domain. Since the proposed acceleration technique involves an approximate update step, we provide theoretical and experimental results illustrating the convergence of the resulting algorithm. The implementation is written in C (compatible with C99 and C++11), and we also provide Matlab and GNU Octave interfaces. For some settings, the implementation is up to 70 times faster than the standard Matching Pursuit Toolkit.
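
The idea of updating the residual in the coefficient domain can be sketched with a generic matching pursuit over a small dictionary. This is not the authors' implementation: it uses an exact, fully precomputed Gram matrix rather than thresholded inner products, it has no Gabor structure, and the function names are invented for the illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(atoms, signal, n_iter):
    """Greedy matching pursuit over unit-norm atoms.

    The signal-domain residual is never formed. Instead, the inner
    products between the residual and every atom are kept in `corr`
    and updated through the precomputed Gram matrix.
    """
    n = len(atoms)
    gram = [[dot(atoms[i], atoms[j]) for j in range(n)] for i in range(n)]
    corr = [dot(atom, signal) for atom in atoms]
    coeffs = [0.0] * n
    for _ in range(n_iter):
        k = max(range(n), key=lambda i: abs(corr[i]))  # best-matching atom
        c = corr[k]
        coeffs[k] += c
        # Subtracting c * atom_k from the residual changes each inner
        # product by c * <atom_i, atom_k>.
        corr = [corr[i] - c * gram[i][k] for i in range(n)]
    return coeffs, corr


s = 1.0 / math.sqrt(2.0)
atoms = [(1.0, 0.0), (0.0, 1.0), (s, s)]   # redundant dictionary in the plane
coeffs, corr = matching_pursuit(atoms, signal=(3.0, 3.0), n_iter=1)
```

In this toy case the diagonal atom explains the whole signal in one step, so all remaining correlations are numerically zero. With an exact Gram matrix the update is equivalent to the signal-domain one; the approximation discussed in the abstract comes from thresholding the precomputed inner products.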

Citations: 3

NEP

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-06-25 DOI: 10.1145/3447544

C. Campos, J. Román

SLEPc is a parallel library for the solution of various types of large-scale eigenvalue problems. Over the past few years, we have been developing a module within SLEPc, called NEP, that is intended for solving nonlinear eigenvalue problems. These problems can be defined by means of a matrix-valued function that depends nonlinearly on a single scalar parameter. We do not consider the particular case of polynomial eigenvalue problems (which are implemented in a different module in SLEPc) and focus here on rational eigenvalue problems and other general nonlinear eigenproblems involving square roots or any other nonlinear function. The article discusses how the NEP module has been designed to fit the needs of applications and provides a description of the available solvers, including some implementation details such as parallelization. Several test problems coming from real applications are used to evaluate the performance and reliability of the solvers.
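
A toy example may help fix what a nonlinear eigenvalue problem is: find λ such that T(λ) is singular, where T depends nonlinearly on λ. The sketch below does not use SLEPc or any of the NEP solvers; the 2 x 2 matrix function `T` (with a square-root term) and the bisection on its determinant are assumptions made purely for illustration and would not scale to real problems.

```python
import math


def T(lam):
    """A made-up 2 x 2 matrix-valued function with a square-root term."""
    return [[2.0 - lam, 1.0],
            [1.0, 3.0 - math.sqrt(lam)]]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def bisect_eigenvalue(lo, hi, tol=1e-12):
    """Find a real lam in [lo, hi] with det T(lam) = 0 by bisection."""
    f_lo = det2(T(lo))
    if f_lo * det2(T(hi)) > 0:
        raise ValueError("det T must change sign on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = det2(T(mid))
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# det T(1) = 1 and det T(2) = -1, so an eigenvalue lies between 1 and 2.
lam = bisect_eigenvalue(1.0, 2.0)
```

Practical solvers avoid determinants altogether and work with large sparse T(λ) in parallel, which is the setting the article addresses.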

Citations: 9

HyperNOMAD

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-06-25 DOI: 10.1145/3450975

Dounia Lakhmiri, Sébastien Le Digabel, C. Tribes

The performance of deep neural networks is highly sensitive to the choice of the hyperparameters that define the structure of the network and the learning process. When facing a new application, tuning a deep neural network is a tedious and time-consuming process that is often described as a “dark art.” This explains the necessity of automating the calibration of these hyperparameters. Derivative-free optimization is a field that develops methods designed to optimize time-consuming functions without relying on derivatives. This work introduces the HyperNOMAD package, an extension of the NOMAD software that applies the MADS algorithm [7] to simultaneously tune the hyperparameters responsible for both the architecture and the learning process of a deep neural network (DNN). This generic approach allows for an important flexibility in the exploration of the search space by taking advantage of categorical variables. HyperNOMAD is tested on the MNIST, Fashion-MNIST, and CIFAR-10 datasets and achieves results comparable to the current state of the art.

Citations: 23

PLANC

ACM Transactions on Mathematical Software (TOMS)

Pub Date : 2021-06-25 DOI: 10.1145/3432185

Srinivas Eswar, Koby Hayashi, Grey Ballard, R. Kannan, Michael A. Matheson, Haesun Park

We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
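
Of the algorithm families mentioned, multiplicative updating is the simplest to show. The sketch below applies the classical Lee-Seung multiplicative updates to a small dense matrix in a single process. It is not PLANC code: nothing here is distributed or GPU-accelerated, the helper names are invented, and the small constant `eps` is an assumption added to avoid division by zero.

```python
def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def frobenius_error(a, w, h):
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2
               for i in range(len(a)) for j in range(len(a[0]))) ** 0.5


def multiplicative_update(a, w, h, eps=1e-12):
    """One sweep of multiplicative updates for A ~ W H with W, H >= 0."""
    wt = transpose(w)
    num, den = matmul(wt, a), matmul(matmul(wt, w), h)
    h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
          for j in range(len(h[0]))] for i in range(len(h))]
    ht = transpose(h)
    num, den = matmul(a, ht), matmul(w, matmul(h, ht))
    w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
          for j in range(len(w[0]))] for i in range(len(w))]
    return w, h


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[1.0], [1.0], [1.0]]   # rank-1 factors, nonnegative starting guess
h = [[1.0, 1.0]]
error_before = frobenius_error(a, w, h)
for _ in range(20):
    w, h = multiplicative_update(a, w, h)
error_after = frobenius_error(a, w, h)
```

Because each update multiplies by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection step.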

Citations: 4
