M. Fasi, N. Higham, Florent Lopez, Théo Mary, M. Mikaitis
{"title":"Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores","authors":"M. Fasi, N. Higham, Florent Lopez, Théo Mary, M. Mikaitis","doi":"10.1137/21m1465032","DOIUrl":"https://doi.org/10.1137/21m1465032","url":null,"abstract":"","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"28 1","pages":"1-"},"PeriodicalIF":0.0,"publicationDate":"2023-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86656379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preconditioning Sparse Matrices with Alternating and Multiplicative Operator Splittings","authors":"Christopher J. L. Klein, R. Strzodka","doi":"10.1137/21m1430492","DOIUrl":"https://doi.org/10.1137/21m1430492","url":null,"abstract":"","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"13 1","pages":"25-"},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81056004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonlinear advection-diffusion equations often arise in the modeling of transport processes. We propose for these equations a non-overlapping domain decomposition algorithm of Schwarz waveform-relaxation type. It relies on nonlinear zeroth-order (or Robin) transmission conditions between the sub-domains that ensure the continuity of the converged solution and of its normal flux across the interface. We prove existence of unique iterative solutions and the convergence of the algorithm. We then present a numerical discretization for solving the SWR problems using a forward Euler discretization in time and a finite volume method in space, including a local Newton iteration for solving the nonlinear transmission conditions. Our discrete algorithm is asymptotic preserving, i.e. robust in the vanishing viscosity limit. Finally, we present numerical results that confirm the theoretical findings, in particular the convergence of the algorithm. Moreover, we show that the SWR algorithm can be successfully applied to two-phase flow problems in porous media as paradigms for evolution equations with strongly nonlinear advective and diffusive fluxes.
{"title":"Non-Overlapping Schwarz Waveform-Relaxation for Nonlinear Advection-Diffusion Equations","authors":"M. Gander, S. Lunowa, C. Rohde","doi":"10.1137/21m1415005","DOIUrl":"https://doi.org/10.1137/21m1415005","url":null,"abstract":"Nonlinear advection-diffusion equations often arise in the modeling of transport processes. We propose for these equations a non-overlapping domain decomposition algorithm of Schwarz waveform-relaxation type. It relies on nonlinear zeroth-order (or Robin) transmission conditions between the sub-domains that ensure the continuity of the converged solution and of its normal flux across the interface. We prove existence of unique iterative solutions and the convergence of the algorithm. We then present a numerical discretization for solving the SWR problems using a forward Euler discretization in time and a finite volume method in space, including a local Newton iteration for solving the nonlinear transmission conditions. Our discrete algorithm is asymptotic preserving, i.e. robust in the vanishing viscosity limit. Finally, we present numerical results that confirm the theoretical findings, in particular the convergence of the algorithm. Moreover, we show that the SWR algorithm can be successfully applied to two-phase flow problems in porous media as paradigms for evolution equations with strongly nonlinear advective and diffusive fluxes.","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"221 1","pages":"49-"},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77107249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some Observations on the Interaction Between Linear and Nonlinear Stabilization for Continuous Finite Element Methods Applied to Hyperbolic Conservation Laws","authors":"E. Burman","doi":"10.1137/21m1464154","DOIUrl":"https://doi.org/10.1137/21m1464154","url":null,"abstract":"","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"80 1","pages":"96-"},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84200280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John E. Augustine, W. Moses, Amanda Redlich, E. Upfal
{"title":"Balanced Allocation: Patience Is Not a Virtue","authors":"John E. Augustine, W. Moses, Amanda Redlich, E. Upfal","doi":"10.1137/17m1155375","DOIUrl":"https://doi.org/10.1137/17m1155375","url":null,"abstract":"","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"173 1","pages":"1743-1768"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73197660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-12-07 DOI: 10.48550/arXiv.2212.03735
Zhaonan Dong, Lorenzo Mascotto
We prove $hp$-optimal error estimates for interior penalty discontinuous Galerkin methods (IPDG) for the biharmonic problem with homogeneous essential boundary conditions. We consider tensor product-type meshes in two and three dimensions, and triangular meshes in two dimensions. An essential ingredient in the analysis is the construction of global $H^2$ piecewise polynomial approximants with $hp$-optimal approximation properties over the given meshes. The $hp$-optimality is also discussed for $\mathcal{C}^0$-IPDG in two and three dimensions, and the stream formulation of the Stokes problem in two dimensions. Numerical experiments validate the theoretical predictions and reveal that $p$-suboptimality occurs in the presence of singular essential boundary conditions.
{"title":"hp-optimal interior penalty discontinuous Galerkin methods for the biharmonic problem","authors":"Zhaonan Dong, Lorenzo Mascotto","doi":"10.48550/arXiv.2212.03735","DOIUrl":"https://doi.org/10.48550/arXiv.2212.03735","url":null,"abstract":"We prove $hp$-optimal error estimates for interior penalty discontinuous Galerkin methods (IPDG) for the biharmonic problem with homogeneous essential boundary conditions. We consider tensor product-type meshes in two and three dimensions, and triangular meshes in two dimensions. An essential ingredient in the analysis is the construction of global $H^2$ piecewise polynomial approximants with $hp$-optimal approximation properties over the given meshes. The $hp$-optimality is also discussed for $\\mathcal{C}^0$-IPDG in two and three dimensions, and the stream formulation of the Stokes problem in two dimensions. Numerical experiments validate the theoretical predictions and reveal that $p$-suboptimality occurs in the presence of singular essential boundary conditions.","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"3 1","pages":"30"},"PeriodicalIF":0.0,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85680037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-23 DOI: 10.48550/arXiv.2211.12953
Sara N. Pollock, L. Rebholz
This work introduces, analyzes and demonstrates an efficient and theoretically sound filtering strategy to control the conditioning of the least-squares problem solved at each iteration of Anderson acceleration. The filtering strategy consists of two steps: the first controls the length disparity between columns of the least-squares matrix, and the second enforces a lower bound on the angles between subspaces spanned by the columns of that matrix. The combined strategy is shown to control the condition number of the least-squares matrix at each iteration. The method is shown to be effective on a range of problems based on discretizations of partial differential equations. It is shown to be particularly effective for problems where the initial iterate may lie far from the solution, and which progress through distinct preasymptotic and asymptotic phases.
{"title":"Filtering for Anderson acceleration","authors":"Sara N. Pollock, L. Rebholz","doi":"10.48550/arXiv.2211.12953","DOIUrl":"https://doi.org/10.48550/arXiv.2211.12953","url":null,"abstract":"This work introduces, analyzes and demonstrates an efficient and theoretically sound filtering strategy to control the conditioning of the least-squares problem solved at each iteration of Anderson acceleration. The filtering strategy consists of two steps: the first controls the length disparity between columns of the least-squares matrix, and the second enforces a lower bound on the angles between subspaces spanned by the columns of that matrix. The combined strategy is shown to control the condition number of the least-squares matrix at each iteration. The method is shown to be effective on a range of problems based on discretizations of partial differential equations. It is shown to be particularly effective for problems where the initial iterate may lie far from the solution, and which progress through distinct preasymptotic and asymptotic phases.","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85297952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. F. Celiktug, M. O. Karsavuran, Seher Acer, C. Aykanat
Several successful partitioning models and methods have been proposed and used for computational load balancing of irregularly sparse applications in a distributed-memory setting. However, the literature lacks partitioning models and methods that encode both computational and data load balancing. In this article, we try to close this gap in the literature by proposing two hypergraph partitioning (HP) models which simultaneously encode computational and data load balancing. Both models utilize a two-constraint formulation, where the first constraint encodes the computational loads and the second constraint encodes the data loads. In the first model, we introduce explicit data vertices for encoding data load and we replicate those data vertices at each recursive bipartitioning (RB) step for encoding data replication. In the second model, we introduce a data weight distribution scheme for encoding data load and we update those weights at each RB step. The nice property of both proposed models is that they do not necessitate developing a new partitioner from scratch. Both models can easily be implemented by invoking any HP tool that supports multiconstraint partitioning as a two-way partitioner at each RB step. The validity of the proposed models is tested on two widely used irregularly sparse applications: parallel mesh simulations and parallel sparse matrix-sparse matrix multiplication. Both proposed models achieve significant improvement over a baseline model.
{"title":"Simultaneous Computational and Data Load Balancing in Distributed-Memory Setting","authors":"M. F. Celiktug, M. O. Karsavuran, Seher Acer, C. Aykanat","doi":"10.1137/22m1485772","DOIUrl":"https://doi.org/10.1137/22m1485772","url":null,"abstract":"Several successful partitioning models and methods have been proposed and used for computational load balancing of irregularly sparse applications in a distributed-memory setting. However, the literature lacks partitioning models and methods that encode both computational and data load balancing. In this article, we try to close this gap in the literature by proposing two hypergraph partitioning (HP) models which simultaneously encode computational and data load balancing. Both models utilize a two-constraint formulation, where the first constraint encodes the computational loads and the second constraint encodes the data loads. In the first model, we introduce explicit data vertices for encoding data load and we replicate those data vertices at each recursive bipartitioning (RB) step for encoding data replication. In the second model, we introduce a data weight distribution scheme for encoding data load and we update those weights at each RB step. The nice property of both proposed models is that they do not necessitate developing a new partitioner from scratch. Both models can easily be implemented by invoking any HP tool that supports multiconstraint partitioning as a two-way partitioner at each RB step. The validity of the proposed models is tested on two widely used irregularly sparse applications: parallel mesh simulations and parallel sparse matrix-sparse matrix multiplication. Both proposed models achieve significant improvement over a baseline model.","PeriodicalId":21812,"journal":{"name":"SIAM J. Sci. Comput.","volume":"71 1","pages":"399-"},"PeriodicalIF":0.0,"publicationDate":"2022-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79582755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}