arXiv - CS - Mathematical Software: Latest Papers

Conversion of Boolean and Integer FlatZinc Builtins to Quadratic or Linear Integer Problems
Pub Date : 2024-04-19 DOI: arxiv-2404.12797
Armin Wolf
Constraint satisfaction or optimisation models, even if they are formulated in high-level modelling languages, need to be reduced to an equivalent format before they can be solved by the use of Quantum Computing. In this paper we show how Boolean and integer FlatZinc builtins over finite-domain integer variables can be equivalently reformulated as linear equations, linear inequalities or binary products of those variables, i.e. as finite-domain quadratic integer programs. Those quadratic integer programs can be further transformed into equivalent Quadratic Unconstrained Binary Optimisation problem models, i.e. a general format for optimisation problems to be solved on Quantum Computers, especially on Quantum Annealers.
Citations: 0
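The kind of reformulation the FlatZinc abstract above describes can be illustrated on three Boolean builtins. The sketch below is a minimal illustration, assuming Boolean variables are represented as 0/1 integers; the encodings are textbook identities and the penalty term is the standard one for a product constraint, not necessarily the formulations used in the paper. All function names are hypothetical.

```python
from itertools import product


def bool_not_holds(a, r):
    # bool_not(a, r) as a linear equation: a + r = 1
    return a + r == 1


def bool_and_holds(a, b, r):
    # bool_and(a, b, r) as a binary product: r = a * b
    return r == a * b


def bool_or_holds(a, b, r):
    # bool_or(a, b, r) as a linear part plus one binary product
    return r == a + b - a * b


def and_penalty(a, b, r):
    # QUBO-style penalty for r = a * b: zero when satisfied, positive otherwise
    return a * b - 2 * a * r - 2 * b * r + 3 * r


def encodings_are_equivalent():
    # Exhaustively compare every encoding with its logical meaning on 0/1 values.
    checks = []
    for a, r in product((0, 1), repeat=2):
        checks.append(bool_not_holds(a, r) == (bool(r) == (not a)))
    for a, b, r in product((0, 1), repeat=3):
        checks.append(bool_and_holds(a, b, r) == (bool(r) == bool(a and b)))
        checks.append(bool_or_holds(a, b, r) == (bool(r) == bool(a or b)))
        checks.append(and_penalty(a, b, r) >= 0)
        checks.append((and_penalty(a, b, r) == 0) == bool_and_holds(a, b, r))
    return all(checks)


print(encodings_are_equivalent())
```

Because the penalty vanishes exactly on the satisfying assignments, adding it (suitably weighted) to an objective is one common way to move from a constrained quadratic program towards an unconstrained binary one.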
Robustness and Accuracy in Pipelined Bi-Conjugate Gradient Stabilized Method: A Comparative Study
Pub Date : 2024-04-19 DOI: arxiv-2404.13216
Mykhailo Havdiak, Jose I. Aliaga, Roman Iakymchuk
In this article, we propose an accuracy-assuring technique for finding a solution for unsymmetric linear systems. Such problems are related to different areas such as image processing, computer vision, and computational fluid dynamics. Parallel implementation of Krylov subspace methods speeds up finding approximate solutions for linear systems. In this context, the refined approach in pipelined BiCGStab enhances scalability on distributed memory machines, yielding substantial speed improvements compared to the standard BiCGStab method. However, it's worth noting that the pipelined BiCGStab algorithm sacrifices some accuracy, which is stabilized with the residual replacement technique. This paper aims to address this issue by employing the ExBLAS-based reproducible approach. We validate the idea on a set of matrices from the SuiteSparse Matrix Collection.
Citations: 0
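The accuracy and reproducibility concern raised in the BiCGStab abstract above comes from floating-point addition not being associative, so a reduction can change with the order of its operands. The sketch below only illustrates that effect; it assumes Python's `math.fsum` as a stand-in for an exactly rounded sum and does not reproduce ExBLAS or the paper's method. The helper `naive_sum` and the sample data are hypothetical.

```python
import math


def naive_sum(values):
    # Left-to-right accumulation, rounding after every addition.
    total = 0.0
    for v in values:
        total += v
    return total


order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]   # same numbers, different order

# The naive results depend on the summation order.
print(naive_sum(order_a), naive_sum(order_b))
# The exactly rounded sum is the same for both orders.
print(math.fsum(order_a), math.fsum(order_b))
```

In a parallel dot product the order of partial sums varies with the process count and reduction tree, which is why an order-independent summation gives bitwise-identical results across runs.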
GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems
Pub Date : 2024-04-19 DOI: arxiv-2404.12703
Daniel Kempf, Marius Kurz, Marcel Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck
This work presents GALÆXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels with a focus on the element local mappings between volume and surface data due to the unstructured mesh. GALÆXI exhibits excellent strong scaling properties up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify its implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulation of the Taylor-Green-Vortex at a Mach number of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and that the results match the original CPU implementation. Finally, GALÆXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALÆXI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. This renders GALÆXI a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.
Citations: 0
Confirmable Workflows in OSCAR
Pub Date : 2024-04-09 DOI: arxiv-2404.06241
Michael Joswig, Lars Kastner, Benjamin Lorenz
We discuss what is special about the reproducibility of workflows in computer algebra. It is emphasized how the programming language Julia and the new computer algebra system OSCAR support such reproducibility, and how users can benefit in their own work.
Citations: 0
SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers
Pub Date : 2024-04-08 DOI: arxiv-2404.05303
Paul Scheffler, Luca Colagrande, Luca Benini
Stencil codes are performance-critical in many compute-intensive applications, but suffer from significant address calculation and irregular memory access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect stream registers, achieving significant speedups of 2.72x, near-ideal FPU utilizations of 81%, and energy efficiency improvements of 1.58x over an RV32G baseline on average. Scaling out to a 256-core manycore system, we estimate an average FPU utilization of 64%, an average speedup of 2.14x, and up to 15% higher fractions of peak compute than a leading GPU code generator.
Citations: 0
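The address-calculation overhead mentioned in the SARIS abstract above is visible even in a very small stencil. The sketch below is a plain software illustration, assuming a 5-point Jacobi stencil on a row-major grid stored as a flat list; it does not model indirect stream registers or any SARIS mechanism, and the function name is hypothetical.

```python
def jacobi_5pt(grid, nx, ny):
    """One 5-point Jacobi sweep over an nx-by-ny row-major grid (flat list)."""
    out = list(grid)                      # boundary values are copied unchanged
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            c = i * ny + j                # flat index of the centre point
            # Four neighbour indices are derived from the centre index.
            out[c] = 0.25 * (grid[c - ny] + grid[c + ny]
                             + grid[c - 1] + grid[c + 1])
    return out


# 3x3 grid: boundary fixed at 1.0, single interior point starts at 0.0
grid = [1.0] * 9
grid[4] = 0.0
print(jacobi_5pt(grid, 3, 3)[4])
```

Each updated point costs four additions and one multiplication, but also one index computation and four offset loads; that bookkeeping is the kind of work a hardware streaming mechanism aims to take off the core.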
Interactive Formal Specification for Mathematical Problems of Engineers
Pub Date : 2024-04-08 DOI: arxiv-2404.05462
Walther Neuper (JKU - Johannes Kepler Universität Linz)
The paper presents the second part of a precise description of the prototype that has been developed in the course of the ISAC project over the last two decades. This part describes the "specify-phase", while the first part describing the "solve-phase" is already published. In the specify-phase a student interactively constructs a formal specification. The ISAC prototype implements formal specifications as established in theoretical computer science; however, the input language for the construction avoids requiring users to have knowledge of logic; this makes the system useful for various engineering faculties (and also for high school). The paper discusses not only ISAC's design of the specify-phase in detail, but also gives a brief introduction to implementation with the aim of advertising the re-use of formal frameworks (including their respective front-ends) with their generic tools for language definition and their rich pool of software components for formal mathematics.
Citations: 0
Predefined Software Environment Runtimes As A Measure For Reproducibility
Pub Date : 2024-04-08 DOI: arxiv-2404.05563
Aaruni Kaushik
As part of the Mathematical Research Data Initiative (MaRDI), we have developed a way to preserve a software package into an easy to deploy and use sandbox environment we call a "runtime", via a program we developed called MaPS: MaRDI Packaging System. The program relies on Linux user namespaces to isolate a library environment from the host system, making the sandboxed software reproducible on other systems, with minimal effort. Moreover, an overlay filesystem makes local edits persistent. This project will aid reproducibility efforts of research papers: both mathematical and from other disciplines. As a proof of concept, we provide runtimes for the OSCAR Computer Algebra System, polymake software for research in polyhedral geometry, and VIBRANT (Virus Identification By iteRative ANnoTation). The software is in a prerelease state: the interface for creating, deploying, and executing runtimes is final, and an interface for easily publishing runtimes is under active development. We thus propose publishing predefined, distributable software environment runtimes along with research papers in an effort to make research with software-based results reproducible.
Citations: 0
A shared compilation stack for distributed-memory parallelism in stencil DSLs
Pub Date : 2024-04-02 DOI: arxiv-2404.02218
George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser
Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. The convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers and the resulting inability to benefit from shared infrastructure cause uncertainties around longevity and the adoption of DSLs at scale. By tailoring the broadly-adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across their DSLs (e.g. Tensorflow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers: Devito, PSyclone, and the Open Earth Compiler, showing that our framework generates high-performance executables based upon a shared compiler ecosystem.
Citations: 0
Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root
Pub Date : 2024-03-30 DOI: arxiv-2404.00387
Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller
Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction, multiplication, division, and square root are correctly rounded based on the user-selected rounding mode. A frustrating fact for implementers is that naive rounding methods will not produce correctly rounded results even when intermediate results with greater accuracy and precision are available. In contrast, our novel algorithm can correct approximations of reciprocal, division and square root, even ones with slightly lower than target precision. In this paper, we present a family of algorithms that can both increase the accuracy (and potentially the precision) of an estimate and correctly round it according to all binary IEEE-754 rounding modes. We explain how it may be efficiently implemented in hardware, and for completeness, we present proofs that it is not necessary to include equality tests associated with round-to-nearest-even mode for reciprocal, division and square root functions, because it is impossible for input(s) in a given precision to have exact answers exactly midway between representable floating-point numbers in that precision. In fact, our simpler proofs are sometimes stronger.
Citations: 0
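The midpoint claim in the floating-point abstract above can be checked by brute force for small significand widths. The sketch below is an elementary check, not the paper's proof. It assumes normalised binary numbers with a p-bit integer significand and an unbounded exponent range (no subnormals, no overflow), and it uses the fact that a midpoint between adjacent p-bit numbers is an odd (p+1)-bit integer times a power of two. The function names are hypothetical.

```python
def odd_part(n):
    # Strip all factors of two from a positive integer.
    while n % 2 == 0:
        n //= 2
    return n


def exact_midpoint_exists(p):
    """Search for a division or square root whose exact result is a midpoint."""
    significands = range(2 ** (p - 1), 2 ** p)        # normalised p-bit values
    midpoints = range(2 ** p + 1, 2 ** (p + 1), 2)    # odd (p+1)-bit integers
    odd_inputs = {odd_part(m) for m in significands}

    # sqrt(x) = y * 2^t  implies  odd_part(x) == y * y
    sqrt_hit = any(y * y in odd_inputs for y in midpoints)

    # a / b = y * 2^t  implies  odd_part(a) == odd_part(b) * y
    # (reciprocal is the special case where a is a power of two)
    div_hit = any(odd_part(b) * y in odd_inputs
                  for b in significands for y in midpoints)
    return sqrt_hit or div_hit


for p in range(2, 9):
    print(p, exact_midpoint_exists(p))
```

The search comes up empty because any midpoint carries an odd factor larger than 2^p, while the odd part of a p-bit input is always smaller than 2^p; this is why no tie-breaking equality test is needed for these three operations under the stated assumptions.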
A Lie Group Approach to Riemannian Batch Normalization
Pub Date : 2024-03-17 DOI: arxiv-2403.11261
Ziheng Chen, Yue Song, Yunmei Liu, Nicu Sebe
Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad hoc manner and only apply to specific manifolds. This paper establishes a unified framework for Riemannian Batch Normalization (RBN) techniques on Lie groups. Our framework offers the theoretical guarantee of controlling both the Riemannian mean and variance. Empirically, we focus on Symmetric Positive Definite (SPD) manifolds, which possess three distinct types of Lie group structures. Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups. Specific normalization layers induced by these Lie groups are then proposed for SPD neural networks. We demonstrate the effectiveness of our approach through three sets of experiments: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at https://github.com/GitZH-Chen/LieBN.git.
Citations: 0