Marius Kurz, Daniel Kempf, Marcel P. Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck
Computer Physics Communications, Volume 306, Article 109388 (published 2024-10-01). DOI: 10.1016/j.cpc.2024.109388. https://www.sciencedirect.com/science/article/pii/S0010465524003114
Citations: 0
GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems
This work presents GALÆXI, a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured hexahedral meshes that leverages the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) and uses a finite-volume subcell shock-capturing approach to ensure the stability of the high-order scheme near shocks. This work details the general code design, the parallelization strategy, and the implementation of the compute kernels, with a focus on the element-local mappings between volume and surface data that the unstructured mesh requires. The scheme uses a pure distributed-memory parallelization based on domain decomposition, in which each GPU handles a distinct region of the computational domain. On each GPU, the computations are assigned to different compute streams, which allows the quantities required for communication to be computed first while local computations on other streams hide the communication latency. This parallelization strategy helps maximize the use of the available computational resources and yields excellent strong scaling of GALÆXI up to 1024 GPUs, provided each GPU is assigned at least one million degrees of freedom. To verify the implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and the compressible formulation of the Taylor–Green vortex, at Mach numbers of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and match those of the original CPU implementation. Finally, GALÆXI is applied to a large-scale, wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the implemented shock-capturing approach captures the supersonic region and the shocks at the leading edge accurately and robustly. It is demonstrated that GALÆXI requires less than half the energy of the reference CPU implementation to carry out this simulation. This makes GALÆXI a potent tool for accurate and efficient simulations of compressible flows in the era of exascale computing and its new HPC architectures.
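The abstract reports that a convergence study recovers the theoretical order of convergence. As an illustration only, the observed (experimental) order of convergence between two successive meshes is commonly computed as EOC = ln(e_k / e_{k+1}) / ln(h_k / h_{k+1}), where h is a characteristic mesh size and e an error norm; for DGSEM with polynomial degree N and a smooth solution, the expected value is N + 1. The following is a minimal Python sketch under these assumptions. The function name `experimental_order_of_convergence`, the synthetic error data, and the choice N = 3 are hypothetical and are not taken from the paper.

```python
import math


def experimental_order_of_convergence(h, err):
    """Return the observed convergence order between successive mesh levels.

    h   -- characteristic mesh sizes, ordered coarse to fine (hypothetical input)
    err -- matching error norms, e.g. an L2 error of the density
    """
    if len(h) != len(err) or len(h) < 2:
        raise ValueError("need at least two (h, err) pairs of equal length")
    if any(v <= 0.0 for v in (*h, *err)):
        raise ValueError("mesh sizes and errors must be positive")
    # EOC_k = ln(e_k / e_{k+1}) / ln(h_k / h_{k+1}) for each pair of levels
    return [
        math.log(err[k] / err[k + 1]) / math.log(h[k] / h[k + 1])
        for k in range(len(h) - 1)
    ]


if __name__ == "__main__":
    # Synthetic errors for an assumed degree-N scheme: err = C * h**(N + 1).
    N = 3
    h = [1.0 / 2**k for k in range(1, 6)]  # uniform refinement by a factor of 2
    err = [0.5 * hk ** (N + 1) for hk in h]
    for rate in experimental_order_of_convergence(h, err):
        print(f"EOC = {rate:.3f}")  # prints 4.000 (= N + 1) for every level pair
```

On real simulation data, the observed rates approach N + 1 only once the meshes are in the asymptotic range; rates that stay near this value under refinement indicate that the scheme reaches its design order.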
About the journal:
The focus of CPC is on contemporary computational methods and techniques and their implementation, the effectiveness of which will normally be evidenced by the author(s) within the context of a substantive problem in physics. Within this setting, CPC publishes two types of paper.
Computer Programs in Physics (CPiP)
These papers describe significant computer programs to be archived in the CPC Program Library which is held in the Mendeley Data repository. The submitted software must be covered by an approved open source licence. Papers and associated computer programs that address a problem of contemporary interest in physics that cannot be solved by current software are particularly encouraged.
Computational Physics Papers (CP)
These are research papers on themes across computational physics and related disciplines, including but not limited to:
mathematical and numerical methods and algorithms;
computational models including those associated with the design, control and analysis of experiments; and
algebraic computation.
Each will normally include software implementation and performance details. The software implementation should, ideally, be available via GitHub, Zenodo or an institutional repository. In addition, research papers on the impact of advanced computer architectures and special-purpose computers on computing in the physical sciences, and on software topics related to and of importance in the physical sciences, may be considered.