A Geometric Multigrid Method for Space-Time Finite Element Discretizations of the Navier–Stokes Equations and its Application to 3D Flow Simulation
Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3582492
Mathias Anselmann, Markus Bause
We present a parallelized geometric multigrid (GMG) method, based on the cell-based Vanka smoother, for higher order space-time finite element methods (STFEM) applied to the incompressible Navier–Stokes equations. The STFEM is implemented as a time marching scheme. The GMG solver is applied as a preconditioner for generalized minimal residual (GMRES) iterations. Its performance is demonstrated for 2D and 3D benchmarks of flow around a cylinder. The key ingredients of the GMG approach are the construction of the local Vanka smoother over all degrees of freedom in time of the respective subinterval and its efficient application. To this end, we generate data structures that store pre-computed cell inverses of the Jacobian on all hierarchical levels and require only a modest memory overhead. The GMG method is built on the deal.II finite element library. The concepts are flexible and can be transferred to similar software platforms.
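To make the solver structure concrete, here is a minimal sketch of a geometric multigrid V-cycle with pluggable smoothing and transfer operators. The types and names are hypothetical, not deal.II's classes; a Vanka-type smoother such as the paper's would plug in as the `smooth` callback, applying a pre-computed local inverse cell by cell.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Skeleton of a GMG V-cycle preconditioner. Each level carries its operator,
// one smoothing sweep (e.g., a Vanka sweep reusing pre-computed cell-local
// inverses), and grid transfer operators. All members are placeholders.
struct Level {
    std::function<std::vector<double>(const std::vector<double>&)> apply_A;
    std::function<void(std::vector<double>&, const std::vector<double>&)> smooth;
    std::function<std::vector<double>(const std::vector<double>&)> restrict_to_coarse;
    std::function<std::vector<double>(const std::vector<double>&)> prolongate_to_fine;
    std::function<std::vector<double>(const std::vector<double>&)> coarse_solve; // level 0 only
};

std::vector<double> residual(const Level& lvl, const std::vector<double>& x,
                             const std::vector<double>& b) {
    std::vector<double> r = lvl.apply_A(x);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = b[i] - r[i];
    return r;
}

void v_cycle(std::vector<Level>& levels, int l,
             std::vector<double>& x, const std::vector<double>& b,
             int nu1 = 2, int nu2 = 2) {
    Level& lvl = levels[l];
    if (l == 0) { x = lvl.coarse_solve(b); return; }
    for (int s = 0; s < nu1; ++s) lvl.smooth(x, b);           // pre-smoothing
    std::vector<double> r_c = lvl.restrict_to_coarse(residual(lvl, x, b));
    std::vector<double> e_c(r_c.size(), 0.0);
    v_cycle(levels, l - 1, e_c, r_c);                          // coarse-grid correction
    std::vector<double> e_f = lvl.prolongate_to_fine(e_c);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += e_f[i];
    for (int s = 0; s < nu2; ++s) lvl.smooth(x, b);           // post-smoothing
}
```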
{"title":"A Geometric Multigrid Method for Space-Time Finite Element Discretizations of the Navier–Stokes Equations and its Application to 3D Flow Simulation","authors":"Mathias Anselmann, Markus Bause","doi":"https://dl.acm.org/doi/10.1145/3582492","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3582492","url":null,"abstract":"<p>We present a parallelized geometric multigrid (GMG) method, based on the cell-based Vanka smoother, for higher order space-time finite element methods (STFEM) to the incompressible Navier–Stokes equations. The STFEM is implemented as a time marching scheme. The GMG solver is applied as a preconditioner for generalized minimal residual iterations. Its performance properties are demonstrated for 2D and 3D benchmarks of flow around a cylinder. The key ingredients of the GMG approach are the construction of the local Vanka smoother over all degrees of freedom in time of the respective subinterval and its efficient application. For this, data structures that store pre-computed cell inverses of the Jacobian for all hierarchical levels and require only a reasonable amount of memory overhead are generated. The GMG method is built for the <i>deal.II</i> finite element library. The concepts are flexible and can be transferred to similar software platforms.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"70 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures
Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3573383
Gregorio Quintana-Ortí, Fernando Hernando, Francisco D. Igual
The minimum distance of a linear code is a key concept in information theory, so the time required to compute it matters for many problems in this area. In this article, we introduce a family of implementations of the Brouwer–Zimmermann algorithm for computing the minimum distance of a random linear code over 𝔽2 on distributed-memory architectures. Current commercial and public-domain software works only on unicore or shared-memory architectures, which limits the number of cores/processors that can take part in the computation. Our implementations target distributed-memory architectures and can therefore employ hundreds or even thousands of cores to compute the minimum distance. Our experimental results show that our implementations are much faster, by up to several orders of magnitude, than the implementations in wide use today.
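For context, the quantity being computed is the minimum Hamming weight over all nonzero codewords. The brute-force enumeration below (plain C++20 with our own naming, not the article's library) fixes the definitions; the Brouwer–Zimmermann algorithm and the article's distributed implementations exist precisely because this 2^k enumeration is infeasible for realistic code dimensions.

```cpp
#include <algorithm>
#include <bit>        // std::popcount (C++20)
#include <cstdint>
#include <iostream>
#include <vector>

// Exhaustive minimum-distance computation for a small binary [n, k] code,
// n <= 64, with the k rows of the generator matrix packed into 64-bit words.
int min_distance(const std::vector<std::uint64_t>& generator_rows) {
    const int k = static_cast<int>(generator_rows.size());
    int best = 64 + 1;
    for (std::uint64_t msg = 1; msg < (std::uint64_t{1} << k); ++msg) {
        std::uint64_t codeword = 0;
        for (int i = 0; i < k; ++i)
            if ((msg >> i) & 1) codeword ^= generator_rows[i];  // row combination over F2
        best = std::min(best, std::popcount(codeword));         // Hamming weight
    }
    return best;
}

int main() {
    // [7,4] Hamming code (systematic generator matrix); its distance is 3.
    std::vector<std::uint64_t> G = {0b1000110, 0b0100101, 0b0010011, 0b0001111};
    std::cout << "minimum distance: " << min_distance(G) << "\n";  // prints 3
}
```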
{"title":"Algorithm 1033: Parallel Implementations for Computing the Minimum Distance of a Random Linear Code on Distributed-memory Architectures","authors":"Gregorio Quintana-Ortí, Fernando Hernando, Francisco D. Igual","doi":"https://dl.acm.org/doi/10.1145/3573383","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3573383","url":null,"abstract":"<p>The minimum distance of a linear code is a key concept in information theory. Therefore, the time required by its computation is very important to many problems in this area. In this article, we introduce a family of implementations of the Brouwer–Zimmermann algorithm for distributed-memory architectures for computing the minimum distance of a random linear code over 𝔽<sub>2</sub>. Both current commercial and public-domain software only work on either unicore architectures or shared-memory architectures, which are limited in the number of cores/processors employed in the computation. Our implementations focus on distributed-memory architectures, thus being able to employ hundreds or even thousands of cores in the computation of the minimum distance. Our experimental results show that our implementations are much faster, even up to several orders of magnitude, than current implementations widely used nowadays.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"35 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Algorithm 1032: Bi-cubic Splines for Polyhedral Control Nets
Pub Date: 2023-03-21 | DOI: https://dl.acm.org/doi/10.1145/3570158
Jörg Peters, Kyle Lo, Kȩstutis Karčiauskas
For control nets outlining a large class of topological polyhedra, not just tensor-product grids, bi-cubic polyhedral splines form a piecewise polynomial, first-order differentiable space that associates one function with each vertex. Akin to tensor-product splines, the resulting smooth surface approximates the polyhedron. Admissible polyhedral control nets consist of quadrilateral faces in a grid-like layout; star configurations, where n ≠ 4 quadrilateral faces join around an interior vertex; n-gon configurations, where 2n quadrilaterals surround an n-gon; polar configurations, where a cone of n triangles meeting at a vertex is surrounded by a ribbon of n quadrilaterals; and three types of T-junctions, where two quad-strips merge into one.
The bi-cubic pieces of a polyhedral spline have matching derivatives along their break lines, possibly after a known change of variables. The pieces are represented in Bernstein–Bézier form with coefficients depending linearly on the polyhedral control net, so that evaluation, differentiation, integration, moments, and so on are no more costly than for standard tensor-product splines. Bi-cubic polyhedral splines can be used both to model geometry and to compute functions on that geometry. Although polyhedral splines do not offer nested refinement by refinement of the control net, they support engineering analysis of curved smooth objects; coarse nets typically suffice, since the splines efficiently model curved features. Algorithm 1032 is a C++ library with input-output example pairs and an option for IGES output.
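As an illustration of the Bernstein–Bézier representation mentioned above, the following self-contained snippet evaluates one bi-cubic piece from its 4 × 4 coefficient grid by tensor-product de Casteljau reduction; the function names are ours and do not reflect Algorithm 1032's interface.

```cpp
#include <array>
#include <iostream>

// One round of cubic de Casteljau: repeated affine combinations reduce the
// four control values to the curve value at parameter t.
double decasteljau3(std::array<double, 4> c, double t) {
    for (int r = 3; r > 0; --r)
        for (int i = 0; i < r; ++i)
            c[i] = (1.0 - t) * c[i] + t * c[i + 1];
    return c[0];
}

// Tensor-product evaluation: reduce each row in u, then the column in v.
double eval_bicubic(const std::array<std::array<double, 4>, 4>& b,
                    double u, double v) {
    std::array<double, 4> col;
    for (int j = 0; j < 4; ++j)
        col[j] = decasteljau3(b[j], u);
    return decasteljau3(col, v);
}

int main() {
    // BB coefficients b_ij = i/3 + j/3 reproduce f(u,v) = u + v exactly
    // (linear precision of the Bernstein basis at the Greville abscissae).
    std::array<std::array<double, 4>, 4> b;
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            b[j][i] = i / 3.0 + j / 3.0;
    std::cout << eval_bicubic(b, 0.25, 0.5) << "\n";  // prints 0.75
}
```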
{"title":"Algorithm 1032: Bi-cubic Splines for Polyhedral Control Nets","authors":"Jörg Peters, Kyle Lo, Kȩstutis Karčiauskas","doi":"https://dl.acm.org/doi/10.1145/3570158","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3570158","url":null,"abstract":"<p>For control nets outlining a large class of topological polyhedra, not just tensor-product grids, bi-cubic polyhedral splines form a piecewise polynomial, first-order differentiable space that associates one function with each vertex. Akin to tensor-product splines, the resulting smooth surface approximates the polyhedron. Admissible polyhedral control nets consist of quadrilateral faces in a grid-like layout, star-configuration where <i>n</i> ≠ 4 quadrilateral faces join around an interior vertex, <i>n</i>-gon configurations, where <i>2n</i> quadrilaterals surround an <i>n</i>-gon, polar configurations where a cone of <i>n</i> triangles meeting at a vertex is surrounded by a ribbon of <i>n</i> quadrilaterals, and three types of T-junctions where two quad-strips merge into one. </p><p>The bi-cubic pieces of a polyhedral spline have matching derivatives along their break lines, possibly after a known change of variables. The pieces are represented in Bernstein-Bézier form with coefficients depending linearly on the polyhedral control net, so that evaluation, differentiation, integration, moments, and so on, are no more costly than for standard tensor-product splines. Bi-cubic polyhedral splines can be used both to model geometry and for computing functions on the geometry. Although polyhedral splines do not offer nested refinement by refinement of the control net, polyhedral splines support engineering analysis of curved smooth objects. Coarse nets typically suffice since the splines efficiently model curved features. Algorithm 1032 is a C++ library with input-output example pairs and an IGES output choice.</p>","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"65 ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138505949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Enabling Research through the SCIP Optimization Suite 8.0
Pub Date: 2023-03-10 | DOI: https://doi.org/10.1145/3585516
Ksenia Bestuzheva, Mathieu Besançon, Weikun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, L. Eifler, Oliver Gaul, Gerald Gamrath, A. Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, A. Hoen, Christopher Hojny, R. V. D. Hulst, T. Koch, M. Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, M. Pfetsch, D. Rehfeldt, Steffan Schlein, Franziska Schlösser, Felipe Serrano, Y. Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philip A. Wellner, Dieter Weninger, Jakob Witzig
The SCIP Optimization Suite provides a collection of software packages for mathematical optimization centered around the constraint integer programming framework SCIP. The focus of this article is on the role of the SCIP Optimization Suite in supporting research. SCIP’s main design principles are discussed, followed by a presentation of the latest performance improvements and developments in version 8.0, which serve both as examples of SCIP’s application as a research tool and as a platform for further developments. Furthermore, this article gives an overview of interfaces to other programming and modeling languages, new features that expand the possibilities for user interaction with the framework, and the latest developments in several extensions built upon SCIP.
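As a flavor of how the framework is driven, the sketch below solves a toy integer program through SCIP's C API using its basic convenience functions. Treat it as a sketch assembled from the public documentation, not an excerpt from the article; check the version 8.0 docs for exact signatures before relying on it.

```cpp
#include <scip/scip.h>
#include <scip/scipdefplugins.h>
#include <cstdio>

// Toy MIP: maximize x + 2y subject to x + y <= 3, x, y integer in [0, 3].
int main() {
    SCIP* scip = nullptr;
    SCIP_CALL_ABORT(SCIPcreate(&scip));
    SCIP_CALL_ABORT(SCIPincludeDefaultPlugins(scip));
    SCIP_CALL_ABORT(SCIPcreateProbBasic(scip, "toy"));
    SCIP_CALL_ABORT(SCIPsetObjsense(scip, SCIP_OBJSENSE_MAXIMIZE));

    SCIP_VAR* x; SCIP_VAR* y;   // objective coefficients 1 and 2
    SCIP_CALL_ABORT(SCIPcreateVarBasic(scip, &x, "x", 0.0, 3.0, 1.0, SCIP_VARTYPE_INTEGER));
    SCIP_CALL_ABORT(SCIPcreateVarBasic(scip, &y, "y", 0.0, 3.0, 2.0, SCIP_VARTYPE_INTEGER));
    SCIP_CALL_ABORT(SCIPaddVar(scip, x));
    SCIP_CALL_ABORT(SCIPaddVar(scip, y));

    SCIP_VAR* vars[] = {x, y};
    SCIP_Real coefs[] = {1.0, 1.0};
    SCIP_CONS* cons;            // -inf <= x + y <= 3
    SCIP_CALL_ABORT(SCIPcreateConsBasicLinear(scip, &cons, "cap", 2, vars, coefs,
                                              -SCIPinfinity(scip), 3.0));
    SCIP_CALL_ABORT(SCIPaddCons(scip, cons));
    SCIP_CALL_ABORT(SCIPreleaseCons(scip, &cons));

    SCIP_CALL_ABORT(SCIPsolve(scip));
    SCIP_SOL* sol = SCIPgetBestSol(scip);
    std::printf("x = %g, y = %g\n", SCIPgetSolVal(scip, sol, x),
                SCIPgetSolVal(scip, sol, y));  // expect x = 0, y = 3

    SCIP_CALL_ABORT(SCIPreleaseVar(scip, &x));
    SCIP_CALL_ABORT(SCIPreleaseVar(scip, &y));
    SCIP_CALL_ABORT(SCIPfree(&scip));
}
```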
{"title":"Enabling Research through the SCIP Optimization Suite 8.0","authors":"Ksenia Bestuzheva, Mathieu Besançon, Weikun Chen, Antonia Chmiela, Tim Donkiewicz, Jasper van Doornmalen, L. Eifler, Oliver Gaul, Gerald Gamrath, A. Gleixner, Leona Gottwald, Christoph Graczyk, Katrin Halbig, A. Hoen, Christopher Hojny, R. V. D. Hulst, T. Koch, M. Lübbecke, Stephen J. Maher, Frederic Matter, Erik Mühmer, Benjamin Müller, M. Pfetsch, D. Rehfeldt, Steffan Schlein, Franziska SchlÃŰsser, Felipe Serrano, Y. Shinano, Boro Sofranac, Mark Turner, Stefan Vigerske, Fabian Wegscheider, Philip A. Wellner, Dieter Weninger, Jakob Witzig","doi":"10.1145/3585516","DOIUrl":"https://doi.org/10.1145/3585516","url":null,"abstract":"The SCIP Optimization Suite provides a collection of software packages for mathematical optimization centered around the constraint integer programming framework SCIP. The focus of this article is on the role of the SCIP Optimization Suite in supporting research. SCIP’s main design principles are discussed, followed by a presentation of the latest performance improvements and developments in version 8.0, which serve both as examples of SCIP’s application as a research tool and as a platform for further developments. Furthermore, this article gives an overview of interfaces to other programming and modeling languages, new features that expand the possibilities for user interaction with the framework, and the latest developments in several extensions built upon SCIP.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 21"},"PeriodicalIF":2.7,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42394524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Algorithm 1036: ATC, An Advanced Tucker Compression Library for Multidimensional Data
Pub Date: 2023-03-01 | DOI: https://doi.org/10.1145/3585514
Wouter Baert, N. Vannieuwenhoven
We present ATC, a C++ library for advanced Tucker-based lossy compression of dense multidimensional numerical data in a shared-memory parallel setting, based on the sequentially truncated higher-order singular value decomposition (ST-HOSVD) and bit plane truncation. Several techniques are proposed to improve speed, memory usage, error control, and compression rate. First, a hybrid truncation scheme is described which combines Tucker rank truncation and TTHRESH quantization. We derive a novel expression to approximate the error of truncated Tucker decompositions in the case of core and factor perturbations. We parallelize the quantization and encoding scheme and adjust this phase to improve error control. Implementation aspects are described, such as an ST-HOSVD procedure using only a single transposition. We also discuss several usability features of ATC, including the presence of multiple interfaces, extensive data type support, and integrated downsampling of the decompressed data. Numerical results show that ATC maintains state-of-the-art Tucker compression rates while providing average speed-up factors of 2.2 to 3.5 and halving memory usage. Our compressor provides precise error control, deviating only 1.4% from the requested error on average. Finally, ATC often achieves higher compression than non-Tucker-based compressors in the high-error domain.
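To illustrate the bit plane truncation ingredient in isolation: truncating bit planes of a floating-point coefficient amounts to zeroing its low-order fraction bits. The standalone sketch below shows only that bit-level operation; ATC's actual pipeline quantizes the truncated Tucker core and encodes its bit planes TTHRESH-style, which this toy does not attempt.

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// Keep only the leading `kept_bits` fraction bits of an IEEE binary64
// value (0 <= kept_bits <= 52), zeroing the remaining bit planes.
double truncate_bit_planes(double x, int kept_bits) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);                  // type-pun safely
    std::uint64_t mask = ~((std::uint64_t{1} << (52 - kept_bits)) - 1);
    bits &= mask;                                         // drop low bit planes
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

int main() {
    double pi = 3.141592653589793;
    for (int b : {52, 20, 8, 2})
        std::printf("%2d fraction bits kept: %.17g\n", b, truncate_bit_planes(pi, b));
}
```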
{"title":"Algorithm 1036: ATC, An Advanced Tucker Compression Library for Multidimensional Data","authors":"Wouter Baert, N. Vannieuwenhoven","doi":"10.1145/3585514","DOIUrl":"https://doi.org/10.1145/3585514","url":null,"abstract":"We present ATC, a C++ library for advanced Tucker-based lossy compression of dense multidimensional numerical data in a shared-memory parallel setting, based on the sequentially truncated higher-order singular value decomposition (ST-HOSVD) and bit plane truncation. Several techniques are proposed to improve speed, memory usage, error control and compression rate. First, a hybrid truncation scheme is described which combines Tucker rank truncation and TTHRESH quantization. We derive a novel expression to approximate the error of truncated Tucker decompositions in the case of core and factor perturbations. We parallelize the quantization and encoding scheme and adjust this phase to improve error control. Implementation aspects are described, such as an ST-HOSVD procedure using only a single transposition. We also discuss several usability features of ATC, including the presence of multiple interfaces, extensive data type support, and integrated downsampling of the decompressed data. Numerical results show that ATC maintains state-of-the-art Tucker compression rates while providing average speed-up factors of 2.2 to 3.5 and halving memory usage. Our compressor provides precise error control, deviating only 1.4% from the requested error on average. Finally, ATC often achieves higher compression than non-Tucker-based compressors in the high-error domain.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 25"},"PeriodicalIF":2.7,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42418864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

CPFloat: A C Library for Simulating Low-precision Arithmetic
Pub Date: 2023-02-25 | DOI: https://doi.org/10.1145/3585515
M. Fasi, M. Mikaitis
One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetic. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic.
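The core operation such a simulator performs can be sketched in a few lines: round a binary64 value to t significand bits by integer manipulation of its bit pattern. The toy below is our own function, not CPFloat's API, and it deliberately ignores subnormals, underflow, and overflow of the target format, which is exactly the special handling the abstract says makes a robust implementation hard.

```cpp
#include <cstdint>
#include <cstring>
#include <cstdio>

// Round a binary64 value to t significand bits (1 <= t <= 53) with
// round-to-nearest, ties-to-even, operating directly on the bit pattern.
double round_significand(double x, int t) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    const int drop = 53 - t;                        // low significand bits to discard
    if (drop <= 0) return x;
    const std::uint64_t one  = 1;
    const std::uint64_t half = one << (drop - 1);   // 0.5 ulp of the target format
    const std::uint64_t low  = u & ((one << drop) - 1);
    u &= ~((one << drop) - 1);                      // truncate
    if (low > half || (low == half && ((u >> drop) & 1)))
        u += one << drop;                           // round up; carry into the
    std::memcpy(&x, &u, sizeof x);                  // exponent is the correct behavior
    return x;
}

int main() {
    // binary32 has 24 significand bits; this prints 1/3 at binary32 precision.
    std::printf("%.17g\n", round_significand(1.0 / 3.0, 24));
}
```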
{"title":"CPFloat: A C Library for Simulating Low-precision Arithmetic","authors":"M. Fasi, M. Mikaitis","doi":"10.1145/3585515","DOIUrl":"https://doi.org/10.1145/3585515","url":null,"abstract":"One can simulate low-precision floating-point arithmetic via software by executing each arithmetic operation in hardware and then rounding the result to the desired number of significant bits. For IEEE-compliant formats, rounding requires only standard mathematical library functions, but handling subnormals, underflow, and overflow demands special attention, and numerical errors can cause mathematically correct formulae to behave incorrectly in finite arithmetic. Moreover, the ensuing implementations are not necessarily efficient, as the library functions these techniques build upon are typically designed to handle a broad range of cases and may not be optimized for the specific needs of rounding algorithms. CPFloat is a C library for simulating low-precision arithmetics. It offers efficient routines for rounding, performing mathematical computations, and querying properties of the simulated low-precision format. The software exploits the bit-level floating-point representation of the format in which the numbers are stored and replaces costly library calls with low-level bit manipulations and integer arithmetic. In numerical experiments, the new techniques bring a considerable speedup (typically one order of magnitude or more) over existing alternatives in C, C++, and MATLAB. To our knowledge, CPFloat is currently the most efficient and complete library for experimenting with custom low-precision floating-point arithmetic.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 32"},"PeriodicalIF":2.7,"publicationDate":"2023-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46068558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Task-based Parallel Programming for Scalable Matrix Product Algorithms
Pub Date: 2023-02-24 | DOI: https://doi.org/10.1145/3583560
E. Agullo, A. Buttari, A. Guermouche, J. Herrmann, Antoine Jego
Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly large and heterogeneous clusters, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features needed to express, in an elegant and compact way, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense general matrix multiplication (GEMM), the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive with state-of-the-art libraries up to 32,768 CPU cores and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not address this case here, because it raises other issues that are beyond the scope of this work.
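For readers unfamiliar with the Sequential Task Flow style, the shared-memory sketch below conveys its essence with OpenMP task dependences: tasks are submitted in sequential program order with declared data accesses, and the runtime infers the DAG. It only illustrates the programming model; the article's contribution is extending this style to distributed memory (on top of StarPU) with advanced communication patterns.

```cpp
#include <cstdio>

constexpr int NT = 4;    // tiles per dimension
constexpr int TS = 32;   // tile size

static double A[NT][NT][TS * TS], B[NT][NT][TS * TS], C[NT][NT][TS * TS];

// One tile of C += A * B, row-major tiles.
static void gemm_tile(const double* a, const double* b, double* c) {
    for (int i = 0; i < TS; ++i)
        for (int k = 0; k < TS; ++k)
            for (int j = 0; j < TS; ++j)
                c[i * TS + j] += a[i * TS + k] * b[k * TS + j];
}

int main() {
    for (int i = 0; i < NT; ++i)
        for (int j = 0; j < NT; ++j)
            for (int e = 0; e < TS * TS; ++e) {
                A[i][j][e] = 1.0; B[i][j][e] = 1.0; C[i][j][e] = 0.0;
            }

    #pragma omp parallel
    #pragma omp single
    {
        // Tasks submitted in sequential order; the first element of each tile
        // stands in for the whole tile in the dependence declarations, so
        // tasks writing the same C tile serialize while the rest run freely.
        for (int i = 0; i < NT; ++i)
            for (int j = 0; j < NT; ++j)
                for (int k = 0; k < NT; ++k) {
                    #pragma omp task depend(in: A[i][k][0], B[k][j][0]) \
                                     depend(inout: C[i][j][0])
                    gemm_tile(A[i][k], B[k][j], C[i][j]);
                }
    }
    std::printf("C[0][0][0] = %g (expect %d)\n", C[0][0][0], NT * TS);
}
```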
{"title":"Task-based Parallel Programming for Scalable Matrix Product Algorithms","authors":"E. Agullo, A. Buttari, A. Guermouche, J. Herrmann, Antoine Jego","doi":"10.1145/3583560","DOIUrl":"https://doi.org/10.1145/3583560","url":null,"abstract":"Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way.In increasingly larger, more heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features that are necessary to express in an elegant and compact way scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although, this work focuses on dense General Matrix Multiplication, the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive up to 32,768 CPU cores with state-of-the-art libraries and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it implies other issues which are out of the scope of this work.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 23"},"PeriodicalIF":2.7,"publicationDate":"2023-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48323698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Algorithm xxx: Encapsulated error, a direct approach to evaluate floating-point accuracy
Pub Date: 2023-02-17 | DOI: https://doi.org/10.1145/3549205
Nestor Demeure, Cédric Chevalier, Christophe Denis, Pierre Dossantos-Uzarralde
Floating-point numbers represent only a subset of the real numbers. As such, floating-point arithmetic introduces approximations that can compound and have a significant impact on numerical simulations. We introduce encapsulated error, a new way to estimate the numerical error of an application, and provide a reference implementation, the Shaman library. Our method uses dedicated arithmetic over a type that encapsulates both the result the user would have obtained with the original computation and an approximation of its numerical error. We can thus measure the number of significant digits of any result or intermediate result in a simulation. We show that this approach, while simple, gives results competitive with state-of-the-art methods. It has a smaller overhead, and it is compatible with parallelism, making it suitable for the study of large-scale applications.
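The idea can be sketched in a few lines of C++: a number type that carries a value together with a running error estimate, updated via error-free transformations. The sketch below handles only addition, through TwoSum, and is our own illustration of the principle, not Shaman's implementation, which also covers the remaining operations and mathematical functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// A value plus an estimate of its accumulated rounding error; the exact
// result is approximately value + error.
struct EncapsulatedDouble {
    double value;
    double error;
};

EncapsulatedDouble operator+(EncapsulatedDouble a, EncapsulatedDouble b) {
    double s = a.value + b.value;
    double bp = s - a.value;                        // Knuth's TwoSum recovers the
    double delta = (a.value - (s - bp)) + (b.value - bp);  // exact rounding error of s
    return {s, a.error + b.error + delta};
}

// Digits of the result unaffected by the estimated error.
double significant_digits(EncapsulatedDouble x) {
    if (x.error == 0.0) return 16;                  // roughly all of binary64
    return std::max(0.0, std::log10(std::fabs(x.value / x.error)));
}

int main() {
    // Classic cancellation: (1e16 + 1) - 1e16 is 1 in the reals, but the
    // intermediate sum absorbs the 1, so the computed result is 0.
    EncapsulatedDouble big{1e16, 0.0}, one{1.0, 0.0};
    EncapsulatedDouble r = (big + one) + EncapsulatedDouble{-1e16, 0.0};
    std::printf("value = %g, error estimate = %g, ~%.1f significant digits\n",
                r.value, r.error, significant_digits(r));
}
```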
{"title":"Algorithm xxx: Encapsulated error, a direct approach to evaluate floating-point accuracy","authors":"Nestor Demeure, C. Chevalier, C. Denis, P. Dossantos-Uzarralde","doi":"10.1145/3549205","DOIUrl":"https://doi.org/10.1145/3549205","url":null,"abstract":"Floating-point numbers represent only a subset of real numbers. As such, floating-point arithmetic introduces approximations that can compound and have a significant impact on numerical simulations. We introduce Encapsulated error, a new way to estimate the numerical error of an application and provide a reference implementation, the Shaman library. Our method uses dedicated arithmetic over a type that encapsulates both the result the user would have had with the original computation and an approximation of its numerical error. We thus can measure the number of significant digits of any result or intermediate result in a simulation. We show that this approach, while simple, gives results competitive with state of the art methods. It has a smaller overhead, and it is compatible with parallelism, making it suitable for the study of large scale applications.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":" ","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42088619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement
Pub Date: 2023-02-06 | DOI: https://doi.org/10.1145/3582493
P. Amestoy, A. Buttari, N. Higham, J. L’Excellent, Théo Mary, Bastien Vieublé
The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. To do so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and the memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications, showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions in both time and memory consumption.
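The backbone of LU-based iterative refinement is easy to exhibit in miniature: factorize once in low precision, then repeatedly compute the residual in high precision and solve for a correction with the cheap factors. The dense, unpivoted toy below is only a sketch of that loop; the article's setting replaces the factorization with approximate sparse ones (block low-rank, static pivoting) inside MUMPS and also analyzes GMRES-based refinement.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// In-place LU without pivoting (fine here: the test matrix is
// diagonally dominant), computed entirely in binary32.
void lu_inplace(std::vector<std::vector<float>>& a) {
    int n = static_cast<int>(a.size());
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            a[i][k] /= a[k][k];
            for (int j = k + 1; j < n; ++j) a[i][j] -= a[i][k] * a[k][j];
        }
}

// Solve L U x = b with the packed factors (L has unit diagonal).
std::vector<float> lu_solve(const std::vector<std::vector<float>>& lu,
                            std::vector<float> b) {
    int n = static_cast<int>(lu.size());
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j) b[i] -= lu[i][j] * b[j];
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j) b[i] -= lu[i][j] * b[j];
        b[i] /= lu[i][i];
    }
    return b;
}

int main() {
    const int n = 3;
    Mat A = {{4, 1, 0}, {1, 4, 1}, {0, 1, 4}};
    std::vector<double> b = {1, 2, 3}, x(n, 0.0);

    std::vector<std::vector<float>> LU(n, std::vector<float>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) LU[i][j] = static_cast<float>(A[i][j]);
    lu_inplace(LU);                              // low-precision factorization

    for (int it = 0; it < 5; ++it) {
        std::vector<float> r(n);                 // residual in binary64,
        for (int i = 0; i < n; ++i) {            // demoted for the cheap solve
            double ri = b[i];
            for (int j = 0; j < n; ++j) ri -= A[i][j] * x[j];
            r[i] = static_cast<float>(ri);
        }
        std::vector<float> d = lu_solve(LU, r);
        double norm = 0.0;
        for (int i = 0; i < n; ++i) { x[i] += d[i]; norm += std::fabs(d[i]); }
        std::printf("iteration %d: |correction|_1 = %.3e\n", it, norm);
    }
}
```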
{"title":"Combining Sparse Approximate Factorizations with Mixed-precision Iterative Refinement","authors":"P. Amestoy, A. Buttari, N. Higham, J. L’Excellent, Théo Mary, Bastien Vieublé","doi":"10.1145/3582493","DOIUrl":"https://doi.org/10.1145/3582493","url":null,"abstract":"The standard LU factorization-based solution process for linear systems can be enhanced in speed or accuracy by employing mixed-precision iterative refinement. Most recent work has focused on dense systems. We investigate the potential of mixed-precision iterative refinement to enhance methods for sparse systems based on approximate sparse factorizations. In doing so, we first develop a new error analysis for LU- and GMRES-based iterative refinement under a general model of LU factorization that accounts for the approximation methods typically used by modern sparse solvers, such as low-rank approximations or relaxed pivoting strategies. We then provide a detailed performance analysis of both the execution time and memory consumption of different algorithms, based on a selected set of iterative refinement variants and approximate sparse factorizations. Our performance study uses the multifrontal solver MUMPS, which can exploit block low-rank factorization and static pivoting. We evaluate the performance of the algorithms on large, sparse problems coming from a variety of real-life and industrial applications showing that mixed-precision iterative refinement combined with approximate sparse factorization can lead to considerable reductions of both the time and memory consumption.","PeriodicalId":50935,"journal":{"name":"ACM Transactions on Mathematical Software","volume":"49 1","pages":"1 - 29"},"PeriodicalIF":2.7,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43937560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}