Algorithm xxx: A Covariate-Dependent Approach to Gaussian Graphical Modeling in R
Jacob Helwig, Sutanoy Dasgupta, Peng Zhao, Bani K. Mallick, Debdeep Pati
DOI: https://doi.org/10.1145/3659206
Graphical models are used to capture complex multivariate relationships and have applications in diverse disciplines such as biology, physics, and economics. Within this field, Gaussian graphical models aim to identify the pairs of variables whose dependence is maintained even after conditioning on the remaining variables in the data, known as the conditional dependence structure of the data. Many software packages for Gaussian graphical modeling exist; however, they often make restrictive assumptions that reduce their flexibility for modeling data that are not identically distributed. In contrast, covdepGE is an R implementation of a variational weighted pseudo-likelihood algorithm for modeling the conditional dependence structure as a continuous function of an extraneous covariate. To build on the efficiency of this algorithm, covdepGE leverages parallelism and C++ integration with R. Additionally, covdepGE provides fully automated, data-driven hyperparameter specification while maintaining flexibility for the user to decide key components of the estimation procedure. Through an extensive simulation study spanning diverse settings, covdepGE is demonstrated to be top of its class in recovering the ground-truth conditional dependence structure while efficiently managing computational overhead.
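covdepGE itself is an R package, so its API is not shown here; as a language-neutral illustration of the key concept in the abstract, the following self-contained numpy sketch demonstrates that, for Gaussian data, two variables are conditionally independent given the rest exactly when the corresponding entry of the precision (inverse covariance) matrix is zero.

```python
import numpy as np

# Precision matrix encoding a single conditional dependence: pair (0, 1).
theta = np.array([[2.0, 0.8, 0.0],
                  [0.8, 2.0, 0.0],
                  [0.0, 0.0, 2.0]])

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(theta), size=50_000)

# The empirical precision matrix recovers the sparsity pattern: entries
# (0, 2) and (1, 2) are near zero, so variable 2 is conditionally
# independent of variables 0 and 1 given the remaining variable.
theta_hat = np.linalg.inv(np.cov(X, rowvar=False))
print(np.round(theta_hat, 2))
```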
Remark on Algorithm 1012: Computing projections with large data sets
Tyler H. Chang, Layne T. Watson, Sven Leyffer, Thomas C. H. Lux, Hussain M. J. Almohri
DOI: https://doi.org/10.1145/3656581
In ACM TOMS Algorithm 1012, the DELAUNAYSPARSE software is given for performing Delaunay interpolation in medium to high dimensions. When extrapolating outside the convex hull of the training set, DELAUNAYSPARSE calls the nonnegative least squares solver DWNNLS to compute projections onto the convex hull. However, DWNNLS and many other available sum of squares optimization solvers were not intended for problems with many variables, which arise from the large training sets that are typical in machine learning applications. Thus, a new PROJECT subroutine is given, based on the highly customizable quadratic program solver BQPD. This solution is shown to be as robust as DELAUNAYSPARSE for projection onto both synthetic and real-world data sets, where other available solvers frequently fail. Although it is intended as an update for DELAUNAYSPARSE, due to the difficulty and prevalence of the problem, this solution is likely to be of external interest as well.
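Neither PROJECT nor BQPD is reproduced here; the sketch below only shows the underlying quadratic program, assuming the standard barycentric formulation: projecting z onto the convex hull of points p_1..p_n means minimizing ||sum_i w_i p_i - z||^2 subject to w >= 0 and sum_i w_i = 1. The helper name is hypothetical and scipy's generic solver stands in for BQPD.

```python
import numpy as np
from scipy.optimize import minimize

def project_to_hull(points, z):
    """Project z onto the convex hull of the rows of `points` by solving
    min ||points.T @ w - z||^2  s.t.  w >= 0, sum(w) = 1."""
    n = points.shape[0]
    res = minimize(
        lambda w: np.sum((points.T @ w - z) ** 2),
        x0=np.full(n, 1.0 / n),                        # feasible start
        bounds=[(0.0, None)] * n,                      # w >= 0
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return points.T @ res.x

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(project_to_hull(pts, np.array([1.0, 1.0])))      # -> approx [0.5, 0.5]
```

The number of unknowns equals the number of training points, which is exactly why generic dense solvers degrade on large data sets and a specialized solver such as BQPD is attractive.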
{"title":"Remark on Algorithm 1012: Computing projections with large data sets","authors":"Tyler H. Chang, Layne T. Watson, Sven Leyffer, Thomas C. H. Lux, Hussain M. J. Almohri","doi":"10.1145/3656581","DOIUrl":"https://doi.org/10.1145/3656581","url":null,"abstract":"<p>In ACM TOMS Algorithm 1012, the <monospace>DELAUNAYSPARSE</monospace> software is given for performing Delaunay interpolation in medium to high dimensions. When extrapolating outside the convex hull of the training set, <monospace>DELAUNAYSPARSE</monospace> calls the nonnegative least squares solver <monospace>DWNNLS</monospace> to compute projections onto the convex hull. However, <monospace>DWNNLS</monospace> and many other available sum of squares optimization solvers were not intended for usage with many variable problems, which result from the large training sets that are typical in machine learning applications. Thus, a new <monospace>PROJECT</monospace> subroutine is given, based on the highly customizable quadratic program solver <monospace>BQPD</monospace>. This solution is shown to be as robust as <monospace>DELAUNAYSPARSE</monospace> for projection onto both synthetic and real-world data sets, where other available solvers frequently fail. Although it is intended as an update for <monospace>DELAUNAYSPARSE</monospace>, due to the difficulty and prevalence of the problem, this solution is likely to be of external interest as well.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"24 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140634395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PyOED: An Extensible Suite for Data Assimilation and Model-Constrained Optimal Design of Experiments
Abhijit Chowdhary, Shady E. Ahmed, Ahmed Attia
DOI: https://doi.org/10.1145/3653071
This paper describes PyOED, a highly extensible scientific package that enables developing and testing model-constrained optimal experimental design (OED) for inverse problems. Specifically, PyOED aims to be a comprehensive Python toolkit for model-constrained OED. The package targets scientists and researchers interested in understanding the details of OED formulations and approaches. It is also meant to enable researchers to experiment with standard and innovative OED technologies on a wide range of test problems (e.g., simulation models). OED, inverse problems (e.g., Bayesian inversion), and data assimilation (DA) are closely related research fields, and their formulations overlap significantly. Thus, PyOED is continuously being expanded with a plethora of Bayesian inversion, DA, and OED methods, as well as new scientific simulation models, observation error models, and observation operators. These pieces are added such that they can be permuted to enable testing OED methods in various settings of varying complexity. The PyOED core is written entirely in Python and utilizes the language's inherent object-oriented capabilities; however, the current version of PyOED is meant to be extensible rather than scalable. Specifically, PyOED is developed to “enable rapid development and benchmarking of OED methods with minimal coding effort and to maximize code reutilization.” This paper provides a brief description of the PyOED layout and philosophy and provides a set of exemplary test cases and tutorials to demonstrate the potential of the package.
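PyOED's own classes are not reproduced here; as a minimal sketch of the kind of formulation such packages formalize, the following numpy code evaluates the classical A-optimality criterion for a linear-Gaussian Bayesian inverse problem and picks the best two-observation design by brute force. All names and sizes are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

# Linear-Gaussian inverse problem: y = G m + noise, Gaussian prior on m.
# A-optimal design: choose k rows of G (sensors) minimizing the trace of
# the posterior covariance  C_post = (G_s^T G_s / sigma^2 + C_prior^-1)^-1.
rng = np.random.default_rng(1)
G = rng.normal(size=(6, 3))      # 6 candidate observations, 3 parameters
C_prior_inv = np.eye(3)
noise_var = 0.1

def a_criterion(rows):
    Gs = G[list(rows)]
    C_post = np.linalg.inv(Gs.T @ Gs / noise_var + C_prior_inv)
    return np.trace(C_post)

best = min(combinations(range(6), 2), key=a_criterion)
print("best 2-sensor design:", best, "trace:", round(a_criterion(best), 4))
```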
Avoiding breakdown in incomplete factorizations in low precision arithmetic
Jennifer Scott, Miroslav Tůma
DOI: https://doi.org/10.1145/3651155
The emergence of low precision floating-point arithmetic in computer hardware has led to a resurgence of interest in the use of mixed precision numerical linear algebra. For linear systems of equations, there has been renewed enthusiasm for mixed precision variants of iterative refinement. We consider the iterative solution of large sparse systems using incomplete factorization preconditioners. The focus is on the robust computation of such preconditioners in half precision arithmetic and employing them to solve symmetric positive definite systems to higher precision accuracy; however, the proposed ideas can be applied more generally. Even for well-conditioned problems, incomplete factorizations can break down when small entries occur on the diagonal during the factorization. When using half precision arithmetic, overflows are an additional possible source of breakdown. We examine how breakdowns can be avoided and we implement our strategies within new half precision Fortran sparse incomplete Cholesky factorization software. Results are reported for a range of problems from practical applications. These demonstrate that, even for highly ill-conditioned problems, half precision preconditioners can potentially replace double precision preconditioners, although unsurprisingly this may be at the cost of additional iterations of a Krylov solver.
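The authors' Fortran software is not shown here. As a rough sketch of the classical global-shift safeguard it builds on, the code below runs a dense Cholesky factorization (no dropping, so a stand-in for the incomplete case) in NumPy's float16 and, on breakdown (non-positive, NaN, or overflowed pivot), restarts on A + alpha*I with a larger shift. This is an assumed simplification: NumPy may accumulate dot products internally in higher precision, and whether the demo triggers a restart depends on the random matrix.

```python
import numpy as np

def chol_with_shift(A, dtype=np.float16, alpha0=1e-3):
    """Low-precision Cholesky with a global diagonal shift safeguard."""
    n = A.shape[0]
    alpha = 0.0
    while True:
        L = (A + alpha * np.eye(n)).astype(dtype)
        ok = True
        for k in range(n):
            piv = L[k, k] - L[k, :k] @ L[k, :k]
            if not np.isfinite(piv) or piv <= 0:   # breakdown detected
                ok = False
                break
            L[k, k] = np.sqrt(piv)
            for i in range(k + 1, n):
                L[i, k] = (L[i, k] - L[i, :k] @ L[k, :k]) / L[k, k]
        if ok:
            return np.tril(L).astype(np.float64), alpha
        alpha = max(2 * alpha, alpha0)             # grow shift and retry

rng = np.random.default_rng(0)
B = rng.normal(size=(10, 10))
A = B @ B.T + 1e-3 * np.eye(10)   # SPD but near-singular relative to fp16 eps
L, alpha = chol_with_shift(A)
print("shift used:", alpha)       # may be > 0 if fp16 rounding broke a pivot
```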
{"title":"Avoiding breakdown in incomplete factorizations in low precision arithmetic","authors":"Jennifer Scott, Miroslav Tůma","doi":"10.1145/3651155","DOIUrl":"https://doi.org/10.1145/3651155","url":null,"abstract":"<p>The emergence of low precision floating-point arithmetic in computer hardware has led to a resurgence of interest in the use of mixed precision numerical linear algebra. For linear systems of equations, there has been renewed enthusiasm for mixed precision variants of iterative refinement. We consider the iterative solution of large sparse systems using incomplete factorization preconditioners. The focus is on the robust computation of such preconditioners in half precision arithmetic and employing them to solve symmetric positive definite systems to higher precision accuracy; however, the proposed ideas can be applied more generally. Even for well-conditioned problems, incomplete factorizations can break down when small entries occur on the diagonal during the factorization. When using half precision arithmetic, overflows are an additional possible source of breakdown. We examine how breakdowns can be avoided and we implement our strategies within new half precision Fortran sparse incomplete Cholesky factorization software. Results are reported for a range of problems from practical applications. These demonstrate that, even for highly ill-conditioned problems, half precision preconditioners can potentially replace double precision preconditioners, although unsurprisingly this may be at the cost of additional iterations of a Krylov solver.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"133 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140129291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Algorithm xxx: PyGenStability, a multiscale community detection with generalized Markov Stability
Alexis Arnaudon, Dominik J. Schindler, Robert L. Peach, Adam Gosztolai, Maxwell Hodges, Michael T. Schaub, Mauricio Barahona
DOI: https://doi.org/10.1145/3651225
We present PyGenStability, a general-use Python software package that provides a suite of analysis and visualisation tools for unsupervised multiscale community detection in graphs. PyGenStability finds optimized partitions of a graph at different levels of resolution by maximizing the generalized Markov Stability quality function with the Louvain or Leiden algorithms. The package includes automatic detection of robust graph partitions and allows the flexibility to choose quality functions for weighted undirected, directed and signed graphs, and to include other user-defined quality functions.
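PyGenStability's own entry point is not reproduced here; the sketch below instead computes a Markov Stability value directly from its definition, under the assumption that the continuous-time stability of a hard partition with indicator matrix H at scale t is trace(H^T (Pi exp(t(P - I)) - pi pi^T) H), where P is the random-walk transition matrix, pi its stationary distribution, and Pi = diag(pi). The function name is hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def markov_stability(A, labels, t):
    """Continuous-time Markov Stability of a hard partition at scale t."""
    d = A.sum(axis=1)
    pi = d / d.sum()                      # stationary distribution
    P = A / d[:, None]                    # random-walk transition matrix
    R = np.diag(pi) @ expm(t * (P - np.eye(len(d)))) - np.outer(pi, pi)
    H = np.eye(labels.max() + 1)[labels]  # n x k partition indicator matrix
    return np.trace(H.T @ R @ H)

# Two triangles joined by a single edge: the two-block partition scores
# well at moderate Markov scales.
A = np.array([[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]],
             dtype=float)
print(markov_stability(A, np.array([0, 0, 0, 1, 1, 1]), t=1.0))
```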
{"title":"Algorithm xxx: PyGenStability, a multiscale community detection with generalized Markov Stability","authors":"Alexis Arnaudon, Dominik J. Schindler, Robert L. Peach, Adam Gosztolai, Maxwell Hodges, Michael T. Schaub, Mauricio Barahona","doi":"10.1145/3651225","DOIUrl":"https://doi.org/10.1145/3651225","url":null,"abstract":"<p>We present PyGenStability, a general-use Python software package that provides a suite of analysis and visualisation tools for unsupervised multiscale community detection in graphs. PyGenStability finds optimized partitions of a graph at different levels of resolution by maximizing the generalized Markov Stability quality function with the Louvain or Leiden algorithms. The package includes automatic detection of robust graph partitions and allows the flexibility to choose quality functions for weighted undirected, directed and signed graphs, and to include other user-defined quality functions.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"61 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140107792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Algorithm XXX: Sparse Precision Matrix Estimation With SQUIC
Aryan Eftekhari, Lisa Gaedke-Merzhäuser, Dimosthenis Pasadakis, Matthias Bollhöfer, Simon Scheidegger, Olaf Schenk
DOI: https://doi.org/10.1145/3650108
We present SQUIC, a fast and scalable package for sparse precision matrix estimation. The algorithm employs a second-order method to solve the ℓ1-regularized maximum likelihood problem, utilizing highly optimized linear algebra subroutines. In comparative tests using synthetic datasets, we demonstrate that SQUIC not only scales to datasets of up to a million random variables but also consistently delivers run times that are significantly faster than other well-established sparse precision matrix estimation packages. Furthermore, we showcase the application of the introduced package in classifying microarray gene expressions. We demonstrate that by utilizing a matrix form of the tuning parameter (also known as the regularization parameter), SQUIC can effectively incorporate prior information into the estimation procedure, resulting in improved application results with minimal computational overhead.
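SQUIC's bindings are not shown here; for orientation, the ℓ1-regularized maximum likelihood problem it solves, minimizing -log det(Theta) + trace(S Theta) + lambda * ||Theta||_1 over positive definite Theta, is the same estimator as scikit-learn's GraphicalLasso, used below as a small first-order reference (not SQUIC's second-order method).

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Ground truth: tridiagonal precision matrix, i.e., a chain dependence
# structure among 20 variables.
theta = (np.eye(20) + np.diag(0.4 * np.ones(19), 1)
         + np.diag(0.4 * np.ones(19), -1))
X = rng.multivariate_normal(np.zeros(20), np.linalg.inv(theta), size=2000)

est = GraphicalLasso(alpha=0.05).fit(X)
recovered = np.abs(est.precision_) > 1e-3
print("nonzeros recovered:", recovered.sum(), "of", (np.abs(theta) > 0).sum())
```

Note that GraphicalLasso takes only a scalar alpha; the matrix-valued tuning parameter the abstract describes is a SQUIC feature with no analogue in this reference implementation.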
{"title":"Algorithm XXX: Sparse Precision Matrix Estimation With SQUIC","authors":"Aryan Eftekhari, Lisa Gaedke-Merzhäuser, Dimosthenis Pasadakis, Matthias Bollhöfer, Simon Scheidegger, Olaf Schenk","doi":"10.1145/3650108","DOIUrl":"https://doi.org/10.1145/3650108","url":null,"abstract":"<p>We present <monospace>SQUIC</monospace>, a fast and scalable package for sparse precision matrix estimation. The algorithm employs a second-order method to solve the (ell_{1})-regularized maximum likelihood problem, utilizing highly optimized linear algebra subroutines. In comparative tests using synthetic datasets, we demonstrate that <monospace>SQUIC</monospace> not only scales to datasets of up to a million random variables but also consistently delivers run times that are significantly faster than other well-established sparse precision matrix estimation packages. Furthermore, we showcase the application of the introduced package in classifying microarray gene expressions. We demonstrate that by utilizing a matrix form of the tuning parameter (also known as the regularization parameter), <monospace>SQUIC</monospace> can effectively incorporate prior information into the estimation procedure, resulting in improved application results with minimal computational overhead.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"21 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140044624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal Re-Materialization Strategies for Heterogeneous Chains: How to Train Deep Neural Networks with Limited Memory
Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Alexis Joly, Alena Shilova
DOI: https://doi.org/10.1145/3648633
Training feed-forward deep neural networks is a memory-intensive operation that is usually performed on GPUs with limited memory capacity. This may force data scientists to limit the depth of their models or the resolution of the input data if the data do not fit in GPU memory. The re-materialization technique, whose idea comes from the checkpointing strategies developed in the Automatic Differentiation literature, allows data scientists to limit the memory required to store intermediate data (activations), at the cost of an increase in computational cost.
This paper introduces a new re-materialization strategy for activations that significantly reduces memory usage. It consists of selecting which activations are saved and which are deleted during the forward phase, and then recomputing the deleted activations when they are needed during the backward phase.
We propose an original computation model that combines two types of activation savings: either storing only the layer inputs, or recording the complete history of operations that produced the outputs. This paper focuses on the fully heterogeneous case, in which the computation time and memory requirement of each layer may differ. We prove that finding the optimal solution is NP-hard and that classical techniques from the Automatic Differentiation literature do not apply. Moreover, the classical assumption of memory persistence of materialized activations, used to simplify the search for optimal solutions, no longer holds. Thus, we propose a weak memory persistence property and provide a dynamic program to compute the optimal sequence of computations.
This algorithm is made available through the Rotor software, a PyTorch plug-in that handles any network consisting of a sequence of layers, each of which may have an arbitrarily complex structure. Through extensive experiments, we show that our implementation consistently outperforms existing re-materialization approaches for a large class of networks, image sizes, and batch sizes.
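Rotor's interface is not reproduced here. For a feel of the save-versus-recompute trade-off it optimizes, PyTorch ships a simpler, uniform-segment re-materialization utility, torch.utils.checkpoint.checkpoint_sequential; the model and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                        for _ in range(16)])
x = torch.randn(64, 512, requires_grad=True)

# Only the inputs of 4 evenly sized segments are kept during the forward
# pass; activations inside each segment are recomputed during backward.
# Rotor instead decides per layer what to store, optimally for
# heterogeneous per-layer compute times and memory footprints.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```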
{"title":"Optimal Re-Materialization Strategies for Heterogeneous Chains: How to Train Deep Neural Networks with Limited Memory","authors":"Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, Alexis Joly, Alena Shilova","doi":"10.1145/3648633","DOIUrl":"https://doi.org/10.1145/3648633","url":null,"abstract":"<p>Training in Feed Forward Deep Neural Networks is a memory-intensive operation which is usually performed on GPUs with limited memory capacities. This may force data scientists to limit the depth of the models or the resolution of the input data if data does not fit in the GPU memory. The re-materialization technique, whose idea comes from the checkpointing strategies developed in the Automatic Differentiation literature, allows data scientists to limit the memory requirements related to the storage of intermediate data (activations), at the cost of an increase in the computational cost.</p><p>This paper introduces a new strategy of re-materialization of activations that significantly reduces memory usage. It consists in selecting which activations are saved and which activations are deleted during the forward phase, and then recomputing the deleted activations when they are needed during the backward phase.</p><p>We propose an original computation model that combines two types of activation savings: either only storing the layer inputs, or recording the complete history of operations that produced the outputs. This paper focuses on the fully heterogeneous case, where the computation time and the memory requirement of each layer is different. We prove that finding the optimal solution is NP-hard and that classical techniques from Automatic Differentiation literature do not apply. Moreover, the classical assumption of memory persistence of materialized activations, used to simplify the search of optimal solutions, does not hold anymore. Thus, we propose a weak memory persistence property and provide a Dynamic Program to compute the optimal sequence of computations.</p><p>This algorithm is made available through the <span>Rotor</span> software, a PyTorch plug-in dealing with any network consisting of a sequence of layers, each of them having an arbitrarily complex structure. Through extensive experiments, we show that our implementation consistently outperforms existing re-materialization approaches for a large class of networks, image sizes and batch sizes.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"44 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140044861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hermitian Dynamic Mode Decomposition - numerical analysis and software solution
Zlatko Drmač
DOI: https://doi.org/10.1145/3641884
The Dynamic Mode Decomposition (DMD) is a versatile and increasingly popular method for data-driven analysis of dynamical systems that arise in a variety of applications in, e.g., computational fluid dynamics, robotics, or machine learning. In the framework of numerical linear algebra, it is a data-driven Rayleigh-Ritz procedure applied to a DMD matrix that is derived from the supplied data. In some applications, the physics of the underlying problem implies hermiticity of the DMD matrix, so the general DMD procedure is not computationally optimal. Furthermore, it does not guarantee important structural properties of the Hermitian eigenvalue problem and may return non-physical solutions. This paper proposes a software solution for Hermitian (including real symmetric) DMD matrices, accompanied by a numerical analysis that contains several fine and instructive numerical details. The eigenpairs are computed together with their residuals, and perturbation theory provides error bounds for the eigenvalues and eigenvectors. The software is developed and tested using the LAPACK package.
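The LAPACK-based software itself is not shown here; the numpy sketch below (function name hypothetical) illustrates the structure-preserving idea: project onto the left singular vectors U of the snapshot matrix, symmetrize the Rayleigh quotient, use a Hermitian eigensolver, and report computable residuals, using the identity A U = Y V S^{-1} that holds for snapshot pairs Y = A X.

```python
import numpy as np

def hermitian_dmd(X, Y, tol=1e-10):
    """Rayleigh-Ritz DMD for snapshot pairs Y = A X with A Hermitian.
    Returns Ritz values, Ritz vectors, and residuals ||A z - lam z||."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = int((s > tol * s[0]).sum())       # numerical rank truncation
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T
    B = Y @ V / s                          # equals A U, column by column
    At = U.conj().T @ B                    # Rayleigh quotient U* A U
    At = 0.5 * (At + At.conj().T)          # enforce hermiticity structurally
    lam, W = np.linalg.eigh(At)            # structure-preserving eigensolver
    Z = U @ W                              # Ritz vectors
    res = np.linalg.norm(B @ W - Z * lam, axis=0)
    return lam, Z, res

# Demo with a known symmetric operator.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 30)); A = 0.5 * (A + A.T)
X = rng.normal(size=(30, 12)); Y = A @ X
lam, Z, res = hermitian_dmd(X, Y)
print("max residual:", res.max())          # near machine precision
```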
{"title":"Hermitian Dynamic Mode Decomposition - numerical analysis and software solution","authors":"Zlatko Drmač","doi":"10.1145/3641884","DOIUrl":"https://doi.org/10.1145/3641884","url":null,"abstract":"<p>The Dynamic Mode Decomposition (DMD) is a versatile and increasingly popular method for data driven analysis of dynamical systems that arise in a variety of applications in e.g. computational fluid dynamics, robotics or machine learning. In the framework of numerical linear algebra, it is a data driven Rayleigh-Ritz procedure applied to a DMD matrix that is derived from the supplied data. In some applications, the physics of the underlying problem implies hermiticity of the DMD matrix, so the general DMD procedure is not computationally optimal. Furthermore, it does not guarantee important structural properties of the Hermitian eigenvalue problem and may return non-physical solutions. This paper proposes a software solution to the Hermitian (including the real symmetric) DMD matrices, accompanied with a numerical analysis that contains several fine and instructive numerical details. The eigenpairs are computed together with their residuals, and perturbation theory provides error bounds for the eigenvalues and eigenvectors. The software is developed and tested using the <sans-serif>LAPACK</sans-serif> package.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"274 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139581714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A LAPACK implementation of the Dynamic Mode Decomposition
Zlatko Drmač
DOI: https://doi.org/10.1145/3640012
The Dynamic Mode Decomposition (DMD) is a method for computational analysis of nonlinear dynamical systems in data-driven scenarios. Based on high-fidelity numerical simulations or experimental data, the DMD can be used to reveal the latent structures in the dynamics, or as a forecasting or model order reduction tool. The theoretical underpinning of the DMD is the Koopman operator on a Hilbert space of observables of the dynamics under study. This paper describes a numerically robust and versatile variant of the DMD and its implementation using the state-of-the-art dense numerical linear algebra software package LAPACK. The features of the proposed software solution include residual bounds for the computed eigenpairs of the DMD matrix, eigenvector refinement, computation of the eigenvectors of the Exact DMD, and compressed DMD for efficient analysis of high-dimensional problems that can easily be adapted for fast updates in a streaming DMD. Numerical analysis is the bedrock of the numerical robustness and reliability of the software, which is tested following the highest standards and practices of LAPACK. Important numerical topics are discussed in detail and illustrated using numerous numerical examples.
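The Fortran routines of the paper are not reproduced here. Complementing the Hermitian sketch above, the following numpy example illustrates the forecasting use of DMD mentioned in the abstract: fit Ritz pairs of an unknown linear system from snapshots, then predict the next snapshot. All sizes and names are illustrative assumptions.

```python
import numpy as np

# DMD as a forecasting tool: fit on snapshots x_0..x_m of an unknown
# linear system x_{k+1} = A x_k, then predict x_{m+1} from the Ritz pairs.
rng = np.random.default_rng(2)
n, m = 20, 40
A = np.linalg.qr(rng.normal(size=(n, n)))[0] * 0.99   # stable toy dynamics
traj = [rng.normal(size=n)]
for _ in range(m + 1):
    traj.append(A @ traj[-1])
X = np.column_stack(traj[:m]); Y = np.column_stack(traj[1:m + 1])

U, s, Vh = np.linalg.svd(X, full_matrices=False)
At = U.conj().T @ Y @ Vh.conj().T / s                  # projected operator
lam, W = np.linalg.eig(At)
Z = U @ W                                              # Ritz vectors
b = np.linalg.lstsq(Z, traj[m], rcond=None)[0]         # amplitudes at step m
pred = (Z * lam) @ b                                   # one-step forecast
print("forecast error:", np.linalg.norm(pred.real - traj[m + 1]))
```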
{"title":"A LAPACK implementation of the Dynamic Mode Decomposition","authors":"Zlatko Drmač","doi":"10.1145/3640012","DOIUrl":"https://doi.org/10.1145/3640012","url":null,"abstract":"<p>The Dynamic Mode Decomposition (DMD) is a method for computational analysis of nonlinear dynamical systems in data driven scenarios. Based on high fidelity numerical simulations or experimental data, the DMD can be used to reveal the latent structures in the dynamics or as a forecasting or a model order reduction tool. The theoretical underpinning of the DMD is the Koopman operator on a Hilbert space of observables of the dynamics under study. This paper describes a numerically robust and versatile variant of the DMD and its implementation using the state of the art dense numerical linear algebra software package <sans-serif>LAPACK</sans-serif>. The features of the proposed software solution include residual bounds for the computed eigenpairs of the DMD matrix, eigenvectors refinements and computation of the eigenvectors of the Exact DMD, compressed DMD for efficient analysis of high dimensional problems that can be easily adapted for fast updates in a streaming DMD. Numerical analysis is the bedrock of numerical robustness and reliability of the software, that is tested following the highest standards and practices of <sans-serif>LAPACK</sans-serif>. Important numerical topics are discussed in detail and illustrated using numerous numerical examples.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"12 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139499431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guillermo Alaejos, Adrián Castelló, Pedro Alonso-Jordá, Francisco D. Igual, Héctor Martínez, Enrique S. Quintana-Ortí
We explore the utilization of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS, and OpenBLAS, in order to obtain high-performance blocked formulations of the general matrix multiplication (gemm). In addition, we fully automate the generation process by also leveraging the Apache TVM framework to derive a complete variety of processor-specific micro-kernels for gemm. This is in contrast with the convention in high-performance libraries, which hand-encode a single micro-kernel per architecture using assembly code. Overall, the combination of our TVM-generated blocked algorithms and micro-kernels for gemm
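The TVM-generated code is not shown here. As a plain-Python sketch of the GotoBLAS/BLIS-style blocked formulation the abstract refers to, the loop nest below blocks for the cache hierarchy and calls a tiny micro-kernel on packed panels; the block sizes are illustrative assumptions, whereas real libraries (and the TVM-generated variants) tune them per architecture.

```python
import numpy as np

def blocked_gemm(A, B, NC=64, KC=32, MC=32, NR=4, MR=4):
    """Goto-style gemm: three cache-blocking loops (NC, KC, MC) around
    packed panels, with an MR x NR micro-kernel innermost."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for jc in range(0, n, NC):                 # loop 5: columns of C / B
        for pc in range(0, k, KC):             # loop 4: rank-KC update
            Bp = B[pc:pc + KC, jc:jc + NC].copy()      # "pack" B panel
            for ic in range(0, m, MC):         # loop 3: rows of C / A
                Ap = A[ic:ic + MC, pc:pc + KC].copy()  # "pack" A panel
                for jr in range(0, Bp.shape[1], NR):       # loop 2
                    for ir in range(0, Ap.shape[0], MR):   # loop 1
                        # micro-kernel: MR x NR block accumulation
                        C[ic + ir:ic + ir + MR, jc + jr:jc + jr + NR] += (
                            Ap[ir:ir + MR] @ Bp[:, jr:jr + NR])
    return C

A = np.random.rand(96, 80); B = np.random.rand(80, 112)
assert np.allclose(blocked_gemm(A, B), A @ B)
```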