URANOS-2.0: Improved performance, enhanced portability, and model extension towards exascale computing of high-speed engineering flows

Computer Physics Communications · Pub Date: 2024-06-21 · DOI: 10.1016/j.cpc.2024.109285 · IF 7.2 · Q1, Computer Science, Interdisciplinary Applications · CAS Tier 2, Physics and Astronomy
Francesco De Vanna, Giacomo Baldan
We present URANOS-2.0, the second major release of our massively parallel, GPU-accelerated solver for compressible wall-flow applications. This version represents a significant leap forward from our initial tool, launched in 2023 (De Vanna et al. [1]), and has been specifically optimized to take full advantage of the cutting-edge pre-exascale architectures available within the EuroHPC JU. In particular, URANOS-2.0 emphasizes portability and compatibility with the two top-ranked supercomputing systems in Europe: LUMI and Leonardo. These systems use different GPU architectures, AMD and NVIDIA respectively, which necessitates extensive effort to ensure seamless usability across both. To this end, the current release adheres to the OpenACC standard. This choice not only enables efficient use of the full potential of these large GPU-based architectures but also upholds vendor neutrality, a distinctive characteristic of URANOS in the panorama of CFD solvers. Beyond improved usability and portability, URANOS-2.0 introduces performance enhancements and restructures the most demanding computational kernels, yielding a 2× speedup on the same architecture. In addition to its enhanced single-GPU performance, this release demonstrates very good scalability in multi-GPU environments: URANOS-2.0 achieves strong-scaling efficiencies of over 80% across 64 compute nodes (256 GPUs) on both LUMI and Leonardo. Furthermore, its weak-scaling efficiencies reach approximately 95% and 90% on LUMI and Leonardo, respectively, on up to 256 nodes (1024 GPUs).
These significant performance advancements position URANOS-2.0 as a state-of-the-art supercomputing platform tailored for compressible wall-turbulence applications, establishing the solver as an integrated tool for aerospace and energy engineering applications spanning direct numerical simulation, wall-resolved large-eddy simulation, and the most recent wall-modeled large-eddy simulation.
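The scaling figures above follow the usual definitions: strong scaling fixes the total problem size and divides the observed speedup by the ideal linear speedup, while weak scaling fixes the work per node, so the ideal runtime stays flat. A minimal sketch of both metrics (the timings and node counts below are hypothetical, chosen only to illustrate the arithmetic, not taken from the paper):

```python
def strong_scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Observed speedup (t_base / t_n) divided by the ideal speedup (n / n_base)."""
    return (t_base / t_n) / (n / n_base)

def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    """With work per node held constant, the ideal runtime is constant."""
    return t_base / t_n

# Hypothetical run: 100 s on a 4-node baseline, 7.5 s on 64 nodes.
eff = strong_scaling_efficiency(t_base=100.0, n_base=4, t_n=7.5, n=64)
print(f"{eff:.1%}")  # 83.3%, i.e. above the 80% threshold the paper reports
```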

Program summary

Program title: Unsteady Robust All-around Navier-StOkes Solver (URANOS)

CPC Library link to program files: https://doi.org/10.17632/pw5hshn9k6.2

Developer's repository link: https://github.com/uranos-gpu/uranos-gpu, https://github.com/uranos-gpu/uranos-gpu/tree/v2.0

Licensing provisions: BSD License 2.0

Programming language: Modern Fortran, OpenACC, MPI

Nature of problem: Solving the compressible Navier-Stokes equations in a three-dimensional Cartesian framework.

Solution method: Convective terms are treated with high-resolution shock-capturing schemes. The system dynamics are advanced in time with a three-stage Runge-Kutta method. Parallelization adopts MPI+OpenACC.
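The summary does not specify which three-stage Runge-Kutta variant is used; a common choice in shock-capturing compressible solvers is the third-order strong-stability-preserving (SSP) scheme in Shu-Osher form, sketched here on a scalar model problem as an assumption, not as the solver's actual implementation:

```python
import math

def ssp_rk3_step(u, dt, rhs):
    """One step of the three-stage SSP Runge-Kutta scheme (Shu-Osher form):
    two forward-Euler substeps blended with convex combinations of the state."""
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * rhs(u2))

# Model problem: advance dy/dt = -y from y(0) = 1 to t = 1.
y, dt = 1.0, 1e-3
for _ in range(1000):
    y = ssp_rk3_step(y, dt, lambda u: -u)
err = abs(y - math.exp(-1.0))  # third-order global accuracy, so err is tiny
print(y, err)
```

In a PDE solver, `rhs` would evaluate the spatial discretization (e.g. the shock-captured convective fluxes plus viscous terms) on the whole flow field.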
