Constraint satisfaction or optimisation models, even when formulated in high-level modelling languages, need to be reduced to an equivalent format before they can be solved using quantum computing. In this paper we show how Boolean and integer FlatZinc builtins over finite-domain integer variables can be equivalently reformulated as linear equations, linear inequalities or binary products of those variables, i.e. as finite-domain quadratic integer programs. These quadratic integer programs can be further transformed into equivalent Quadratic Unconstrained Binary Optimisation (QUBO) problem models, a general format for optimisation problems to be solved on quantum computers, especially on quantum annealers.
{"title":"Conversion of Boolean and Integer FlatZinc Builtins to Quadratic or Linear Integer Problems","authors":"Armin Wolf","doi":"arxiv-2404.12797","DOIUrl":"https://doi.org/arxiv-2404.12797","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-19"}
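The reformulation described in the abstract above can be illustrated on the smallest possible case. The following is a sketch that is not taken from the paper: it assumes the Boolean constraint z = x AND y over 0/1 variables, writes it as the binary product z = x*y, and uses a standard quadratic penalty, P(x, y, z) = xy - 2xz - 2yz + 3z, which is zero exactly when z = x*y and positive otherwise. The function names are hypothetical.

```python
from itertools import product


def and_penalty(x: int, y: int, z: int) -> int:
    """Quadratic penalty that is 0 iff z == x * y for binary x, y, z."""
    return x * y - 2 * x * z - 2 * y * z + 3 * z


def check_and_penalty() -> bool:
    """Exhaustively verify the penalty over all 8 binary assignments."""
    for x, y, z in product((0, 1), repeat=3):
        p = and_penalty(x, y, z)
        if z == x * y:
            if p != 0:          # satisfying assignments must cost nothing
                return False
        elif p <= 0:            # violating assignments must be penalised
            return False
    return True


def feasible_assignments() -> list:
    """Assignments with zero penalty, i.e. the models of z = x AND y."""
    return [
        (x, y, z)
        for x, y, z in product((0, 1), repeat=3)
        if and_penalty(x, y, z) == 0
    ]
```

Minimising such a penalty over binary variables recovers exactly the satisfying assignments, which is the property a QUBO encoding of a constraint needs.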
In this article, we propose an accuracy-assuring technique for solving unsymmetric linear systems. Such problems arise in areas such as image processing, computer vision, and computational fluid dynamics. Parallel implementations of Krylov subspace methods speed up finding approximate solutions of linear systems. In this context, the refined approach in pipelined BiCGStab enhances scalability on distributed-memory machines, yielding substantial speed improvements over the standard BiCGStab method. However, the pipelined BiCGStab algorithm sacrifices some accuracy, which is stabilized with the residual replacement technique. This paper aims to address this issue by employing the ExBLAS-based reproducible approach. We validate the idea on a set of matrices from the SuiteSparse Matrix Collection.
{"title":"Robustness and Accuracy in Pipelined Bi-Conjugate Gradient Stabilized Method: A Comparative Study","authors":"Mykhailo Havdiak, Jose I. Aliaga, Roman Iakymchuk","doi":"arxiv-2404.13216","DOIUrl":"https://doi.org/arxiv-2404.13216","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-19"}
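The accuracy and reproducibility concern raised in the abstract above comes from floating-point accumulation depending on the order of operations. The sketch below is only an illustration of that effect and is not the ExBLAS algorithm: it assumes a plain left-to-right dot product as the baseline and uses exact rational arithmetic from Python's standard library as a stand-in for an exact accumulator. Both function names are hypothetical.

```python
from fractions import Fraction


def naive_dot(xs, ys):
    """Left-to-right floating-point dot product; result depends on order."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc


def exact_dot(xs, ys):
    """Dot product accumulated exactly, then rounded once to a float.

    Every float is an exact rational, so the sum is exact and the
    result is independent of the order of the terms.
    """
    total = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    return float(total)


ones = [1.0, 1.0, 1.0]
order_a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by 1e16 before cancellation
order_b = [1e16, -1e16, 1.0]   # same terms, cancellation happens first
```

With these inputs the naive version returns different values for the two orderings, while the exactly accumulated version returns the same value for both.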
Daniel Kempf, Marius Kurz, Marcel Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck
This work presents GALÆXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels with a focus on the element local mappings between volume and surface data due to the unstructured mesh. GALÆXI exhibits excellent strong scaling properties up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify its implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulations of the Taylor-Green-Vortex at Mach numbers of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and that the results match the original CPU implementation. Finally, GALÆXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALÆXI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. This renders GALÆXI a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.
{"title":"GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems","authors":"Daniel Kempf, Marius Kurz, Marcel Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck","doi":"arxiv-2404.12703","DOIUrl":"https://doi.org/arxiv-2404.12703","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-19"}
We discuss what is special about the reproducibility of workflows in computer algebra. We emphasize how the programming language Julia and the new computer algebra system OSCAR support such reproducibility, and how users can benefit in their own work.
{"title":"Confirmable Workflows in OSCAR","authors":"Michael Joswig, Lars Kastner, Benjamin Lorenz","doi":"arxiv-2404.06241","DOIUrl":"https://doi.org/arxiv-2404.06241","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-09"}
Stencil codes are performance-critical in many compute-intensive applications, but suffer from significant address calculation and irregular memory access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect stream registers, achieving significant speedups of 2.72x, near-ideal FPU utilizations of 81%, and energy efficiency improvements of 1.58x over an RV32G baseline on average. Scaling out to a 256-core manycore system, we estimate an average FPU utilization of 64%, an average speedup of 2.14x, and up to 15% higher fractions of peak compute than a leading GPU code generator.
{"title":"SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers","authors":"Paul Scheffler, Luca Colagrande, Luca Benini","doi":"arxiv-2404.05303","DOIUrl":"https://doi.org/arxiv-2404.05303","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-08"}
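The address-calculation overhead mentioned in the abstract above can be pictured with a software analogy. The sketch below is not the SARIS method or its hardware interface: it assumes a 1D three-point stencil and contrasts a loop that computes each input index inline with one that consumes a precomputed stream of indices, which is the role an indirect stream plays conceptually. All function names are hypothetical.

```python
def stencil_direct(u, coeffs):
    """3-point stencil with input addresses computed inside the loop body."""
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = coeffs[0] * u[i - 1] + coeffs[1] * u[i] + coeffs[2] * u[i + 1]
    return out


def build_index_stream(n, offsets):
    """Precompute the flat sequence of input indices the stencil will read."""
    return [i + d for i in range(1, n - 1) for d in offsets]


def stencil_indirect(u, coeffs, index_stream):
    """Same stencil, but operands arrive through a precomputed index stream."""
    out = [0.0] * len(u)
    stream = iter(index_stream)
    for i in range(1, len(u) - 1):
        acc = 0.0
        for c in coeffs:
            acc += c * u[next(stream)]   # no index arithmetic in the hot loop
        out[i] = acc
    return out


u = [0.0, 1.0, 4.0, 9.0, 16.0]           # samples of x**2
coeffs = (1.0, -2.0, 1.0)                # second-difference stencil
indices = build_index_stream(len(u), (-1, 0, 1))
```

Both variants compute the same interior values; only the place where the indices are produced differs.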
Walther Neuper (JKU - Johannes Kepler Universität Linz)
The paper presents the second part of a precise description of the prototype that has been developed in the course of the ISAC project over the last two decades. This part describes the "specify-phase", while the first part, describing the "solve-phase", has already been published. In the specify-phase a student interactively constructs a formal specification. The ISAC prototype implements formal specifications as established in theoretical computer science; however, the input language for the construction does not require users to have knowledge of logic, which makes the system useful for various engineering faculties (and also for high schools). The paper not only discusses ISAC's design of the specify-phase in detail, but also gives a brief introduction to the implementation, with the aim of advertising the re-use of formal frameworks (including their respective front-ends) with their generic tools for language definition and their rich pool of software components for formal mathematics.
{"title":"Interactive Formal Specification for Mathematical Problems of Engineers","authors":"Walther Neuper (JKU - Johannes Kepler Universität Linz)","doi":"arxiv-2404.05462","DOIUrl":"https://doi.org/arxiv-2404.05462","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-08"}
As part of the Mathematical Research Data Initiative (MaRDI), we have developed a way to preserve a software package in an easy-to-deploy-and-use sandbox environment we call a "runtime", via a program we developed called MaPS: MaRDI Packaging System. The program relies on Linux user namespaces to isolate a library environment from the host system, making the sandboxed software reproducible on other systems with minimal effort. Moreover, an overlay filesystem makes local edits persistent. This project will aid reproducibility efforts for research papers, both mathematical and from other disciplines. As a proof of concept, we provide runtimes for the OSCAR Computer Algebra System, the polymake software for research in polyhedral geometry, and VIBRANT (Virus Identification By iteRative ANnoTation). The software is in a prerelease state: the interface for creating, deploying, and executing runtimes is final, and an interface for easily publishing runtimes is under active development. We thus propose publishing predefined, distributable software environment runtimes along with research papers in an effort to make research with software-based results reproducible.
{"title":"Predefined Software Environment Runtimes As A Measure For Reproducibility","authors":"Aaruni Kaushik","doi":"arxiv-2404.05563","DOIUrl":"https://doi.org/arxiv-2404.05563","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-08"}
George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser
Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. The convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers and the resulting inability to benefit from shared infrastructure cause uncertainties around longevity and the adoption of DSLs at scale. By tailoring the broadly adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across their DSLs (e.g. TensorFlow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers: Devito, PSyclone, and the Open Earth Compiler, showing that our framework generates high-performance executables based upon a shared compiler ecosystem.
{"title":"A shared compilation stack for distributed-memory parallelism in stencil DSLs","authors":"George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser","doi":"arxiv-2404.02218","DOIUrl":"https://doi.org/arxiv-2404.02218","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-04-02"}
Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller
Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction, multiplication, division, and square root are correctly rounded based on the user-selected rounding mode. A frustrating fact for implementers is that naive rounding methods will not produce correctly rounded results even when intermediate results with greater accuracy and precision are available. In contrast, our novel algorithm can correct approximations of reciprocal, division and square root, even ones with slightly lower than target precision. In this paper, we present a family of algorithms that can both increase the accuracy (and potentially the precision) of an estimate and correctly round it according to all binary IEEE-754 rounding modes. We explain how it may be efficiently implemented in hardware, and for completeness, we present proofs that it is not necessary to include equality tests associated with round-to-nearest-even mode for reciprocal, division and square root functions, because it is impossible for input(s) in a given precision to have exact answers exactly midway between representable floating-point numbers in that precision. In fact, our simpler proofs are sometimes stronger.
{"title":"Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root","authors":"Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller","doi":"arxiv-2404.00387","DOIUrl":"https://doi.org/arxiv-2404.00387","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-03-30"}
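The midpoint claim in the abstract above can be checked by brute force for the square root at small precisions. The sketch below is an illustration, not the paper's proof, and rests on two stated assumptions: floats have a p-bit significand with unbounded exponent range (no subnormals or overflow), and a midpoint between adjacent p-bit floats is a value M * 2^j with M odd and exactly p+1 bits long. If sqrt(x) were such a midpoint, x would equal M^2 * 2^(2j), so it suffices to test whether M^2 fits in p significand bits. All function names are hypothetical.

```python
def odd_part(n: int) -> int:
    """Strip all factors of two from a positive integer."""
    while n % 2 == 0:
        n //= 2
    return n


def is_representable(n: int, p: int) -> bool:
    """True if n == m * 2**e for some integer m with at most p bits."""
    return odd_part(n).bit_length() <= p


def sqrt_midpoint_cases(p: int) -> list:
    """Midpoint significands M whose square would be a p-bit float.

    M ranges over all odd (p+1)-bit integers, i.e. every possible
    midpoint significand at precision p. An empty result means no
    p-bit input has a square root lying exactly on a midpoint.
    """
    hits = []
    for M in range(2**p + 1, 2 ** (p + 1), 2):
        if is_representable(M * M, p):
            hits.append(M)
    return hits
```

The list comes back empty for every precision tried, which matches the reason the check can never succeed: M^2 is odd and has more than 2p bits, so it cannot fit in a p-bit significand.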
Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad hoc manner and only apply to specific manifolds. This paper establishes a unified framework for Riemannian Batch Normalization (RBN) techniques on Lie groups. Our framework offers the theoretical guarantee of controlling both the Riemannian mean and variance. Empirically, we focus on Symmetric Positive Definite (SPD) manifolds, which possess three distinct types of Lie group structures. Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups. Specific normalization layers induced by these Lie groups are then proposed for SPD neural networks. We demonstrate the effectiveness of our approach through three sets of experiments: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at https://github.com/GitZH-Chen/LieBN.git.
{"title":"A Lie Group Approach to Riemannian Batch Normalization","authors":"Ziheng Chen, Yue Song, Yunmei Liu, Nicu Sebe","doi":"arxiv-2403.11261","DOIUrl":"https://doi.org/arxiv-2403.11261","journal":{"name":"arXiv - CS - Mathematical Software"},"publicationDate":"2024-03-17"}