Enrico Facca, Gabriele Todeschi, Andrea Natale, Michele Benzi
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1397-A1422, June 2024. Abstract. In this paper, we address the numerical solution of the quadratic optimal transport problem in its dynamical form, the so-called Benamou-Brenier formulation. When solved using interior point methods, the main computational bottleneck is the solution of large saddle point linear systems arising from the associated Newton-Raphson scheme. The main purpose of this paper is to design efficient preconditioners to solve these linear systems via iterative methods. Among the proposed preconditioners, we introduce one based on the partial commutation of the operators that compose the dual Schur complement of these saddle point linear systems, which we refer to as the [math]-preconditioner. A series of numerical tests show that the [math]-preconditioner is the most efficient among those presented, despite a performance deterioration in the last steps of the interior point method. It is in fact the only one having a CPU time that scales only slightly worse than linearly with respect to the number of unknowns used to discretize the problem.
{"title":"Efficient Preconditioners for Solving Dynamical Optimal Transport via Interior Point Methods","authors":"Enrico Facca, Gabriele Todeschi, Andrea Natale, Michele Benzi","doi":"10.1137/23m1570430","DOIUrl":"https://doi.org/10.1137/23m1570430","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1397-A1422, June 2024. <br/> Abstract. In this paper, we address the numerical solution of the quadratic optimal transport problem in its dynamical form, the so-called Benamou-Brenier formulation. When solved using interior point methods, the main computational bottleneck is the solution of large saddle point linear systems arising from the associated Newton-Raphson scheme. The main purpose of this paper is to design efficient preconditioners to solve these linear systems via iterative methods. Among the proposed preconditioners, we introduce one based on the partial commutation of the operators that compose the dual Schur complement of these saddle point linear systems, which we refer to as the [math]-preconditioner. A series of numerical tests show that the [math]-preconditioner is the most efficient among those presented, despite a performance deterioration in the last steps of the interior point method. It is in fact the only one having a CPU time that scales only slightly worse than linearly with respect to the number of unknowns used to discretize the problem.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"47 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140830093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Budiša, Xiaozhe Hu, Miroslav Kuchta, Kent-Andre Mardal, Ludmil Zikatanov
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1461-A1486, June 2024. Abstract. We develop multilevel methods for interface-driven multiphysics problems that can be coupled across dimensions and where complexity and strength of the interface coupling deteriorates the performance of standard methods. We focus on aggregation-based algebraic multigrid methods with custom smoothers that preserve the coupling information on each coarse level. We prove that, with the proper choice of subspace splitting, we obtain uniform convergence in discretization and physical parameters in the two-level setting. Additionally, we show parameter robustness and scalability with regard to the number of the degrees of freedom of the system on several numerical examples related to the biophysical processes in the brain, namely, the electric signaling in excitable tissue modeled by bidomain, the extracellular-membrane-intracellular (EMI) model, and reduced EMI equations. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/anabudisa/metric-amg-examples and in the supplementary materials (metric-amg-examples-master.zip [30KB]).
{"title":"Algebraic Multigrid Methods for Metric-Perturbed Coupled Problems","authors":"Ana Budiša, Xiaozhe Hu, Miroslav Kuchta, Kent-Andre Mardal, Ludmil Zikatanov","doi":"10.1137/23m1572076","DOIUrl":"https://doi.org/10.1137/23m1572076","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1461-A1486, June 2024. <br/>Abstract. We develop multilevel methods for interface-driven multiphysics problems that can be coupled across dimensions and where complexity and strength of the interface coupling deteriorates the performance of standard methods. We focus on aggregation-based algebraic multigrid methods with custom smoothers that preserve the coupling information on each coarse level. We prove that, with the proper choice of subspace splitting, we obtain uniform convergence in discretization and physical parameters in the two-level setting. Additionally, we show parameter robustness and scalability with regard to the number of the degrees of freedom of the system on several numerical examples related to the biophysical processes in the brain, namely, the electric signaling in excitable tissue modeled by bidomain, the extracellular-membrane-intracellular (EMI) model, and reduced EMI equations. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/anabudisa/metric-amg-examples and in the supplementary materials (metric-amg-examples-master.zip [30KB]).","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"20 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page B205-B228, June 2024. Abstract. This work introduces a parallel and rank-adaptive matrix integrator for dynamical low-rank approximation. The method is related to the previously proposed rank-adaptive basis update and Galerkin (BUG) integrator but differs significantly in that all arising differential equations, both for the basis and the Galerkin coefficients, are solved in parallel. Moreover, this approach eliminates the need for a potentially costly coefficient update with augmented basis matrices. The integrator also incorporates a new step rejection strategy that enhances the robustness of both the parallel integrator and the BUG integrator. By construction, the parallel integrator inherits the robust error bound of the BUG and projector-splitting integrators. Comparisons of the parallel and BUG integrators are presented by a series of numerical experiments which demonstrate the efficiency of the proposed method, for problems from radiative transfer and radiation therapy.
{"title":"A Parallel Rank-Adaptive Integrator for Dynamical Low-Rank Approximation","authors":"Gianluca Ceruti, Jonas Kusch, Christian Lubich","doi":"10.1137/23m1565103","DOIUrl":"https://doi.org/10.1137/23m1565103","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page B205-B228, June 2024. <br/> Abstract. This work introduces a parallel and rank-adaptive matrix integrator for dynamical low-rank approximation. The method is related to the previously proposed rank-adaptive basis update and Galerkin (BUG) integrator but differs significantly in that all arising differential equations, both for the basis and the Galerkin coefficients, are solved in parallel. Moreover, this approach eliminates the need for a potentially costly coefficient update with augmented basis matrices. The integrator also incorporates a new step rejection strategy that enhances the robustness of both the parallel integrator and the BUG integrator. By construction, the parallel integrator inherits the robust error bound of the BUG and projector-splitting integrators. Comparisons of the parallel and BUG integrators are presented by a series of numerical experiments which demonstrate the efficiency of the proposed method, for problems from radiative transfer and radiation therapy.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"11 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1423-A1440, June 2024. Abstract. A modification of Newton’s method for solving systems of [math] nonlinear equations is presented. The new matrix-free method is exact as opposed to a range of inexact Newton methods in the sense that both the Jacobians and the solutions to the linear Newton systems are computed without truncation. It relies on a given decomposition of a structurally dense invertible Jacobian of the residual into a product of [math] structurally sparse invertible elemental Jacobians according to the chain rule of differentiation. Inspired by the adjoint mode of algorithmic differentiation, explicit accumulation of the Jacobian of the residual is avoided. Prospective, generally applicable implementations of the new method can be based on similar ideas. Sparsity is exploited for the direct solution of the linear Newton systems. Optimal exploitation of sparsity yields various well-known computationally intractable combinatorial optimization problems in sparse linear algebra such as Bandwidth or Directed Elimination Ordering. The method is motivated in the context of a decomposition into elemental Jacobians with bandwidth [math] for [math]. In the likely scenario of [math], the computational cost of the standard Newton algorithm is dominated by the cost of accumulating the Jacobian of the residual. It can be estimated as [math], thus exceeding the cost of [math] for the direct solution of the linear Newton system. The new method reduces this cost to [math], yielding a potential improvement by a factor of [math]. Supporting run time measurements are presented for the tridiagonal case showing a reduction of the computational cost by [math]. Generalization yields the combinatorial Matrix-Free Exact Newton Step problem. We prove NP-completeness, and we present algorithmic components for building methods for the approximate solution. Potential applications of the matrix-free exact Newton method in machine learning of surrogates for computationally expensive nonlinear residuals are touched on briefly as part of various conclusions to be drawn.
{"title":"A Matrix-Free Exact Newton Method","authors":"Uwe Naumann","doi":"10.1137/23m157017x","DOIUrl":"https://doi.org/10.1137/23m157017x","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1423-A1440, June 2024. <br/> Abstract. A modification of Newton’s method for solving systems of [math] nonlinear equations is presented. The new matrix-free method is exact as opposed to a range of inexact Newton methods in the sense that both the Jacobians and the solutions to the linear Newton systems are computed without truncation. It relies on a given decomposition of a structurally dense invertible Jacobian of the residual into a product of [math] structurally sparse invertible elemental Jacobians according to the chain rule of differentiation. Inspired by the adjoint mode of algorithmic differentiation, explicit accumulation of the Jacobian of the residual is avoided. Prospective, generally applicable implementations of the new method can be based on similar ideas. Sparsity is exploited for the direct solution of the linear Newton systems. Optimal exploitation of sparsity yields various well-known computationally intractable combinatorial optimization problems in sparse linear algebra such as Bandwidth or Directed Elimination Ordering. The method is motivated in the context of a decomposition into elemental Jacobians with bandwidth [math] for [math]. In the likely scenario of [math], the computational cost of the standard Newton algorithm is dominated by the cost of accumulating the Jacobian of the residual. It can be estimated as [math], thus exceeding the cost of [math] for the direct solution of the linear Newton system. The new method reduces this cost to [math], yielding a potential improvement by a factor of [math]. Supporting run time measurements are presented for the tridiagonal case showing a reduction of the computational cost by [math]. Generalization yields the combinatorial Matrix-Free Exact Newton Step problem. We prove NP-completeness, and we present algorithmic components for building methods for the approximate solution. Potential applications of the matrix-free exact Newton method in machine learning of surrogates for computationally expensive nonlinear residuals are touched on briefly as part of various conclusions to be drawn.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"52 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ahsan Ali, James J. Brannick, Karsten Kahl, Oliver A. Krzysik, Jacob B. Schroder, Ben S. Southworth
SIAM Journal on Scientific Computing, Ahead of Print. Abstract. This paper focuses on developing a reduction-based algebraic multigrid (AMG) method that is suitable for solving general (non)symmetric linear systems and is naturally robust from pure advection to pure diffusion. Initial motivation comes from a new reduction-based AMG approach, [math] (local approximate ideal restriction), that was developed for solving advection-dominated problems. Though this new solver is very effective in the advection-dominated regime, its performance degrades in cases where diffusion becomes dominant. This is consistent with the fact that in general, reduction-based AMG methods tend to suffer from growth in complexity and/or convergence rates as the problem size is increased, especially for diffusion-dominated problems in two or three dimensions. Motivated by the success of [math] in the advective regime, our aim in this paper is to generalize the AIR framework with the goal of improving the performance of the solver in diffusion-dominated regimes. To do so, we propose a novel way to combine mode constraints as used commonly in energy-minimization AMG methods with the local approximation of ideal operators used in [math]. The resulting constrained [math] algorithm is able to achieve fast scalable convergence on advective and diffusive problems. In addition, it is able to achieve standard low complexity hierarchies in the diffusive regime through aggressive coarsening, something that was previously difficult for reduction-based methods.
{"title":"Constrained Local Approximate Ideal Restriction for Advection-Diffusion Problems","authors":"Ahsan Ali, James J. Brannick, Karsten Kahl, Oliver A. Krzysik, Jacob B. Schroder, Ben S. Southworth","doi":"10.1137/23m1583442","DOIUrl":"https://doi.org/10.1137/23m1583442","url":null,"abstract":"SIAM Journal on Scientific Computing, Ahead of Print. <br/> Abstract. This paper focuses on developing a reduction-based algebraic multigrid (AMG) method that is suitable for solving general (non)symmetric linear systems and is naturally robust from pure advection to pure diffusion. Initial motivation comes from a new reduction-based AMG approach, [math] (local approximate ideal restriction), that was developed for solving advection-dominated problems. Though this new solver is very effective in the advection-dominated regime, its performance degrades in cases where diffusion becomes dominant. This is consistent with the fact that in general, reduction-based AMG methods tend to suffer from growth in complexity and/or convergence rates as the problem size is increased, especially for diffusion-dominated problems in two or three dimensions. Motivated by the success of [math] in the advective regime, our aim in this paper is to generalize the AIR framework with the goal of improving the performance of the solver in diffusion-dominated regimes. To do so, we propose a novel way to combine mode constraints as used commonly in energy-minimization AMG methods with the local approximation of ideal operators used in [math]. The resulting constrained [math] algorithm is able to achieve fast scalable convergence on advective and diffusive problems. In addition, it is able to achieve standard low complexity hierarchies in the diffusive regime through aggressive coarsening, something that was previously difficult for reduction-based methods.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"17 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Konstantin Althaus, Iason Papaioannou, Elisabeth Ullmann
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1487-A1513, June 2024. Abstract. In this paper, we introduce a new algorithm for rare event estimation based on adaptive importance sampling. We consider a smoothed version of the optimal importance sampling density, which is approximated by an ensemble of interacting particles. The particle dynamics is governed by a McKean–Vlasov stochastic differential equation, which was introduced and analyzed in [Carrillo et al., Stud. Appl. Math., 148 (2022), pp. 1069–1140] for consensus-based sampling and optimization of posterior distributions arising in the context of Bayesian inverse problems. We develop automatic updates for the internal parameters of our algorithm. This includes a novel time step size controller for the exponential Euler method, which discretizes the particle dynamics. The behavior of all parameter updates depends on easy to interpret accuracy criteria specified by the user. We show in numerical experiments that our method is competitive to state-of-the-art adaptive importance sampling algorithms for rare event estimation, namely a sequential importance sampling method and the ensemble Kalman filter for rare event estimation. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/AlthausKonstantin/rareeventestimation/tree/master/docs/figures_paper and in the supplementary materials (rareeventestimation-0.3.0.zip [9.66MB]).
{"title":"Consensus-Based Rare Event Estimation","authors":"Konstantin Althaus, Iason Papaioannou, Elisabeth Ullmann","doi":"10.1137/23m1565966","DOIUrl":"https://doi.org/10.1137/23m1565966","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page A1487-A1513, June 2024. <br/> Abstract. In this paper, we introduce a new algorithm for rare event estimation based on adaptive importance sampling. We consider a smoothed version of the optimal importance sampling density, which is approximated by an ensemble of interacting particles. The particle dynamics is governed by a McKean–Vlasov stochastic differential equation, which was introduced and analyzed in [Carrillo et al., Stud. Appl. Math., 148 (2022), pp. 1069–1140] for consensus-based sampling and optimization of posterior distributions arising in the context of Bayesian inverse problems. We develop automatic updates for the internal parameters of our algorithm. This includes a novel time step size controller for the exponential Euler method, which discretizes the particle dynamics. The behavior of all parameter updates depends on easy to interpret accuracy criteria specified by the user. We show in numerical experiments that our method is competitive to state-of-the-art adaptive importance sampling algorithms for rare event estimation, namely a sequential importance sampling method and the ensemble Kalman filter for rare event estimation. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/AlthausKonstantin/rareeventestimation/tree/master/docs/figures_paper and in the supplementary materials (rareeventestimation-0.3.0.zip [9.66MB]).","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"7 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page B159-B178, June 2024. Abstract. Efficient algorithms based on the fast Fourier transform are developed for computing linear convolutions. A hybrid approach is described that combines the conventional practice of explicit dealiasing (explicitly padding the input data with zeros) and implicit dealiasing (mathematically accounting for these zero values). The new approach generalizes implicit dealiasing to arbitrary padding ratios and includes explicit dealiasing as a special case. Unlike existing implementations of implicit dealiasing, hybrid dealiasing tailors its subtransform sizes to the convolution geometry. Multidimensional convolutions are implemented with hybrid dealiasing by decomposing them into lower-dimensional convolutions. Convolutions of complex-valued and Hermitian inputs of equal length are illustrated with pseudocode and implemented in the open-source FFTW++ library. Hybrid dealiasing is shown to outperform explicit dealiasing in one, two, and three dimensions. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and Data Available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available from https://github.com/dealias/fftwpp and in the supplementary materials.
{"title":"Hybrid Dealiasing of Complex Convolutions","authors":"Noel Murasko, John C. Bowman","doi":"10.1137/23m1552073","DOIUrl":"https://doi.org/10.1137/23m1552073","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page B159-B178, June 2024. <br/> Abstract. Efficient algorithms based on the fast Fourier transform are developed for computing linear convolutions. A hybrid approach is described that combines the conventional practice of explicit dealiasing (explicitly padding the input data with zeros) and implicit dealiasing (mathematically accounting for these zero values). The new approach generalizes implicit dealiasing to arbitrary padding ratios and includes explicit dealiasing as a special case. Unlike existing implementations of implicit dealiasing, hybrid dealiasing tailors its subtransform sizes to the convolution geometry. Multidimensional convolutions are implemented with hybrid dealiasing by decomposing them into lower-dimensional convolutions. Convolutions of complex-valued and Hermitian inputs of equal length are illustrated with pseudocode and implemented in the open-source FFTW++ library. Hybrid dealiasing is shown to outperform explicit dealiasing in one, two, and three dimensions. Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and Data Available” as a recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available from https://github.com/dealias/fftwpp and in the supplementary materials.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"13 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page B179-B204, June 2024. Abstract. This work describes the development of matrix-free GPU-accelerated solvers for high-order finite element problems in [math]. The solvers are applicable to grad-div and Darcy problems in saddle-point formulation, and have applications in radiation diffusion and porous media flow problems, among others. Using the interpolation–histopolation basis (cf. [W. Pazner, T. Kolev, and C. R. Dohrmann, SIAM J. Sci. Comput., 45 (2023), pp. A675–A702]), efficient matrix-free preconditioners can be constructed for the [math]-block and Schur complement of the block system. With these approximations, block-preconditioned MINRES converges in a number of iterations that is independent of the mesh size and polynomial degree. The approximate Schur complement takes the form of an M-matrix graph Laplacian and therefore can be well-preconditioned by highly scalable algebraic multigrid methods. High-performance GPU-accelerated algorithms for all components of the solution algorithm are developed, discussed, and benchmarked. Numerical results are presented on a number of challenging test cases, including the “crooked pipe” grad-div problem, the SPE10 reservoir modeling benchmark problem, and a nonlinear radiation diffusion test case.
SIAM 科学计算期刊》,第 46 卷第 3 期,第 B179-B204 页,2024 年 6 月。 摘要这项工作描述了针对[math]中的高阶有限元问题开发的无矩阵 GPU 加速求解器。这些求解器适用于鞍点公式中的梯度二维和达西问题,并可应用于辐射扩散和多孔介质流动问题等。利用插值-组配基础(参见 [W. Pazner, T. Kolev.Pazner, T. Kolev, and C. R. Dohrmann, SIAM J. Sci. Comput., 45 (2023), pp.利用这些近似值,块预处理 MINRES 可以在与网格大小和多项式度无关的迭代次数内收敛。近似舒尔补数采用 M 矩阵图拉普拉奇的形式,因此可以通过高度可扩展的代数多网格方法进行良好预处理。针对求解算法的所有组成部分,我们开发了高性能 GPU 加速算法,并对其进行了讨论和基准测试。在一些具有挑战性的测试案例中展示了数值结果,包括 "弯曲管道 "梯度计算问题、SPE10 储层建模基准问题和非线性辐射扩散测试案例。
{"title":"Matrix-Free High-Performance Saddle-Point Solvers for High-Order Problems in [math]","authors":"Will Pazner, Tzanio Kolev, Panayot S. Vassilevski","doi":"10.1137/23m1568806","DOIUrl":"https://doi.org/10.1137/23m1568806","url":null,"abstract":"SIAM Journal on Scientific Computing, Volume 46, Issue 3, Page B179-B204, June 2024. <br/> Abstract. This work describes the development of matrix-free GPU-accelerated solvers for high-order finite element problems in [math]. The solvers are applicable to grad-div and Darcy problems in saddle-point formulation, and have applications in radiation diffusion and porous media flow problems, among others. Using the interpolation–histopolation basis (cf. [W. Pazner, T. Kolev, and C. R. Dohrmann, SIAM J. Sci. Comput., 45 (2023), pp. A675–A702]), efficient matrix-free preconditioners can be constructed for the [math]-block and Schur complement of the block system. With these approximations, block-preconditioned MINRES converges in a number of iterations that is independent of the mesh size and polynomial degree. The approximate Schur complement takes the form of an M-matrix graph Laplacian and therefore can be well-preconditioned by highly scalable algebraic multigrid methods. High-performance GPU-accelerated algorithms for all components of the solution algorithm are developed, discussed, and benchmarked. Numerical results are presented on a number of challenging test cases, including the “crooked pipe” grad-div problem, the SPE10 reservoir modeling benchmark problem, and a nonlinear radiation diffusion test case.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"59 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140829959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SIAM Journal on Scientific Computing, Ahead of Print. Abstract. Earlier work on rounding-error analysis of multigrid was restricted to cycles that used one relaxation step before coarsening and none afterwards. The present paper extends this analysis to two-grid methods that use one relaxation step both before and after coarsening. The analysis is based on floating point arithmetic and focuses on a two-grid scheme that is perturbed on the coarse grid to allow for an approximate coarse-grid solve. Leveraging previously published results, this two-grid theory can then be extended to general [math]-cycles, as well as full multigrid. It can also be extended to mixed-precision iterative refinement based on these cycles. An added benefit of the theory here over previous work is that it is obtained in a more organized, transparent, and simpler way.
{"title":"Rounding-Error Analysis of Multigrid [math]-Cycles","authors":"Stephen F. McCormick, Rasmus Tamstorf","doi":"10.1137/23m1582898","DOIUrl":"https://doi.org/10.1137/23m1582898","url":null,"abstract":"SIAM Journal on Scientific Computing, Ahead of Print. <br/> Abstract. Earlier work on rounding-error analysis of multigrid was restricted to cycles that used one relaxation step before coarsening and none afterwards. The present paper extends this analysis to two-grid methods that use one relaxation step both before and after coarsening. The analysis is based on floating point arithmetic and focuses on a two-grid scheme that is perturbed on the coarse grid to allow for an approximate coarse-grid solve. Leveraging previously published results, this two-grid theory can then be extended to general [math]-cycles, as well as full multigrid. It can also be extended to mixed-precision iterative refinement based on these cycles. An added benefit of the theory here over previous work is that it is obtained in a more organized, transparent, and simpler way.","PeriodicalId":49526,"journal":{"name":"SIAM Journal on Scientific Computing","volume":"122 1","pages":""},"PeriodicalIF":3.1,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140625892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}