Daniel Kempf, Marius Kurz, Marcel Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck
GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems
arXiv - CS - Mathematical Software, published 2024-04-19, arxiv-2404.12703 (https://doi.org/arxiv-2404.12703)
Abstract
This work presents GALÆXI as a novel, energy-efficient flow solver for
the simulation of compressible flows on unstructured meshes leveraging the
parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI
implements the high-order Discontinuous Galerkin Spectral Element Method
(DGSEM) using shock capturing with a finite-volume subcell approach to ensure
the stability of the high-order scheme near shocks. This work provides details
on the general code design, the parallelization strategy, and the
implementation approach for the compute kernels, with a focus on the
element-local mappings between volume and surface data due to the unstructured mesh.
GALÆXI exhibits excellent strong scaling properties up to 1024 GPUs if each
GPU is assigned a minimum of one million degrees of freedom.
To verify its implementation, a convergence study is performed that recovers
the theoretical order of convergence of the implemented numerical schemes.
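
A convergence study of this kind is usually evaluated through the experimental order of convergence (EOC) between successive mesh refinements. The following is a minimal sketch, not the authors' tooling: the function name `experimental_orders` and the error values are hypothetical, and the data are synthetic, constructed to decay like h^(N+1), the order commonly expected for a smooth solution approximated with polynomial degree N.

```python
import math


def experimental_orders(mesh_sizes, errors):
    """Return the experimental order of convergence between successive meshes.

    EOC = log(e_coarse / e_fine) / log(h_coarse / h_fine)
    """
    orders = []
    for i in range(len(errors) - 1):
        error_ratio = errors[i] / errors[i + 1]
        size_ratio = mesh_sizes[i] / mesh_sizes[i + 1]
        orders.append(math.log(error_ratio) / math.log(size_ratio))
    return orders


# Synthetic data (NOT taken from the paper): errors decaying like h**(N + 1).
N = 3  # assumed polynomial degree
mesh_sizes = [1.0 / 2**k for k in range(1, 5)]
errors = [0.5 * h ** (N + 1) for h in mesh_sizes]

orders = experimental_orders(mesh_sizes, errors)
for h, order in zip(mesh_sizes[1:], orders):
    print(f"h = {h:.4f}  EOC = {order:.3f}")
```

With measured errors in place of the synthetic ones, the computed orders should approach the theoretical value as the mesh is refined.
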
Moreover, the solver is validated using both the incompressible and
compressible formulations of the Taylor-Green vortex at Mach numbers of 0.1 and
1.25, respectively. A mesh convergence study shows that the results converge to
the high-fidelity reference solution and that the results match the original
CPU implementation. Finally, GALÆXI is applied to a large-scale
wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37.
Here, the supersonic region and shocks at the leading edge are captured
accurately and robustly by the implemented shock-capturing approach. GALÆXI
requires less than half of the energy of the reference CPU implementation to
carry out this simulation. This makes GALÆXI a potent tool for accurate and
efficient simulations of compressible flows in the realm of exascale computing
and the associated new HPC architectures.
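
The scaling statement in the abstract ties the usable GPU count to the load per GPU. As a rough illustration, the sketch below estimates how many GPUs a given mesh can occupy before the load drops under one million degrees of freedom each, and how a strong-scaling efficiency is typically computed. It assumes hexahedral elements with (N+1)^3 nodes per element and counts degrees of freedom per solution variable; the function names, mesh size, polynomial degree, and timings are hypothetical and not taken from the paper.

```python
def dofs_per_element(poly_degree, dim=3):
    """Nodes in one tensor-product element of the given polynomial degree."""
    return (poly_degree + 1) ** dim


def max_gpus_for_load(n_elements, poly_degree, min_dofs_per_gpu=1_000_000):
    """Return (total DOFs, largest GPU count keeping the minimum load per GPU)."""
    total_dofs = n_elements * dofs_per_element(poly_degree)
    return total_dofs, total_dofs // min_dofs_per_gpu


def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Strong-scaling efficiency of a run on n GPUs relative to n_ref GPUs."""
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical case: 2 million hexahedral elements at polynomial degree 7.
total_dofs, gpu_limit = max_gpus_for_load(n_elements=2_000_000, poly_degree=7)
print(f"total DOFs: {total_dofs}, GPUs at >= 1e6 DOFs each: {gpu_limit}")

# Hypothetical timings: 100 s on 1 GPU versus 0.125 s on 1024 GPUs.
efficiency = parallel_efficiency(t_ref=100.0, n_ref=1, t_n=0.125, n=1024)
print(f"strong-scaling efficiency: {efficiency:.3f}")
```

In this made-up configuration the mesh would just fill 1024 GPUs at the stated minimum load; a smaller mesh or lower polynomial degree would reach the threshold at fewer GPUs.
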