Design of Neko—A Scalable High-Fidelity Simulation Framework With Extensive Accelerator Support

IF 1.5 | CAS Zone 4, Computer Science | JCR Q3, Computer Science, Software Engineering | Concurrency and Computation-Practice & Experience | Volume 37, Issue 2 | Pub Date: 2024-12-29 | DOI: 10.1002/cpe.8340

Niclas Jansson, Martin Karp, Jacob Wahlgren, Stefano Markidis, Philipp Schlatter


Citations: 0

Abstract

Recent trends toward more diverse and heterogeneous hardware in High-Performance Computing (HPC) are challenging scientific software developers in their pursuit of efficient numerical methods with sustained performance across a diverse set of platforms. As a result, researchers are today forced to refactor their codes to leverage these powerful new heterogeneous systems. We present our design considerations for Neko, a portable framework for high-fidelity spectral element flow simulations. Unlike prior work, Neko adopts a modern object-oriented Fortran 2008 approach, allowing multi-tier abstractions of the solver stack and facilitating various hardware backends, ranging from general-purpose processors and accelerators down to exotic vector processors and Field-Programmable Gate Arrays (FPGAs). Focusing on the performance and portability of Neko, we describe the framework's device abstraction layer, which manages device memory, data transfer, and kernel launches from Fortran, allowing the solver to be written in a hardware-neutral yet performant way. Accelerator-specific optimizations are also discussed, including auto-tuning of key kernels and various communication strategies using device-aware MPI. Finally, we present performance measurements on a wide range of computing platforms, including the EuroHPC pre-exascale system LUMI, where Neko achieves excellent parallel efficiency for a large direct numerical simulation (DNS) of turbulent fluid flow using up to 80% of the entire LUMI supercomputer.




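Source journal

Concurrency and Computation-Practice & Experience (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 5.00

Self-citation rate: 10.00%

Articles published: 664

Review time: 9.6 months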

Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers and authoritative research review papers in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.