Design of Neko—A Scalable High-Fidelity Simulation Framework With Extensive Accelerator Support

Concurrency and Computation: Practice and Experience · Published 2024-12-29 · DOI: 10.1002/cpe.8340
Impact factor 1.5 · JCR Q3 (Computer Science, Software Engineering) · CAS Zone 4 (Computer Science)
Niclas Jansson, Martin Karp, Jacob Wahlgren, Stefano Markidis, Philipp Schlatter
Vol. 37, No. 2 · https://onlinelibrary.wiley.com/doi/10.1002/cpe.8340
Citations: 0

Abstract

Recent trends toward more diverse and heterogeneous hardware in High-Performance Computing (HPC) challenge scientific software developers in their pursuit of efficient numerical methods with sustained performance across a diverse set of platforms. As a result, researchers today are forced to refactor their codes to leverage these powerful new heterogeneous systems. We present the design considerations behind Neko, a portable framework for high-fidelity spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented Fortran 2008 approach, allowing multi-tier abstractions of the solver stack and supporting hardware backends ranging from general-purpose processors and accelerators to exotic vector processors and Field-Programmable Gate Arrays (FPGAs). Focusing on the performance and portability of Neko, we describe the framework's device abstraction layer, which manages device memory, data transfers, and kernel launches from Fortran, so that the solver can be written in a hardware-neutral yet performant way. We also discuss accelerator-specific optimizations, including auto-tuning of key kernels and various communication strategies using device-aware MPI. Finally, we present performance measurements on a wide range of computing platforms, including the EuroHPC pre-exascale system LUMI, where Neko achieves excellent parallel efficiency for a large direct numerical simulation (DNS) of turbulent fluid flow using up to 80% of the entire LUMI supercomputer.
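The device abstraction layer described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical `Backend` interface (allocate, copy, launch a kernel) with one emulated backend standing in for real hardware, plus a hypothetical `DeviceManager` that associates host arrays with device buffers. None of these names come from Neko, which implements its layer in Fortran. The point is the shape of the design: solver code such as `solver_step` calls only the abstract interface, so supporting a new hardware target means adding a new `Backend` subclass rather than changing the solver.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto


class Direction(Enum):
    HOST_TO_DEVICE = auto()
    DEVICE_TO_HOST = auto()


class Backend(ABC):
    """Hypothetical device backend interface; each hardware target implements it."""

    @abstractmethod
    def alloc(self, n):
        """Return a handle to n elements of (device) memory."""

    @abstractmethod
    def memcpy(self, host, dev, direction):
        """Copy data between a host array and its device buffer."""

    @abstractmethod
    def add2(self, a_dev, b_dev):
        """Kernel: a = a + b, element-wise, executed on the device."""


class EmulatedBackend(Backend):
    """Stand-in 'device' that keeps its buffers in plain Python lists."""

    def alloc(self, n):
        return [0.0] * n

    def memcpy(self, host, dev, direction):
        if direction is Direction.HOST_TO_DEVICE:
            dev[:] = host
        else:
            host[:] = dev

    def add2(self, a_dev, b_dev):
        for i, b in enumerate(b_dev):
            a_dev[i] += b


class DeviceManager:
    """Associates host arrays with device buffers and forwards transfers."""

    def __init__(self, backend):
        self.backend = backend
        self._map = {}  # id(host array) -> device buffer (host must stay alive)

    def map(self, host):
        dev = self.backend.alloc(len(host))
        self._map[id(host)] = dev
        return dev

    def get(self, host):
        return self._map[id(host)]

    def to_device(self, host):
        self.backend.memcpy(host, self.get(host), Direction.HOST_TO_DEVICE)

    def to_host(self, host):
        self.backend.memcpy(host, self.get(host), Direction.DEVICE_TO_HOST)


def solver_step(dm, u, f):
    """Hardware-neutral solver code: it never touches backend details."""
    dm.backend.add2(dm.get(u), dm.get(f))


dm = DeviceManager(EmulatedBackend())
u, f = [1.0, 2.0, 3.0], [0.5, 0.5, 0.5]
for x in (u, f):
    dm.map(x)
    dm.to_device(x)
solver_step(dm, u, f)  # runs on the "device"; host u is unchanged until to_host
dm.to_host(u)
print(u)  # [1.5, 2.5, 3.5]
```

A real backend would wrap a vendor runtime (for example CUDA or HIP) and return device pointers rather than Python lists; keying the map on `id(host)` is a simplification that assumes host arrays outlive their mappings.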

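The abstract mentions auto-tuning of key kernels but does not detail the mechanism. One common approach, sketched here in Python as an assumption rather than a description of Neko, is to time each candidate implementation of a kernel on representative input, keep the best-of-N time to damp timing noise, and cache the winner so tuning happens only once per kernel. The kernel variants and the `autotune` and `get_kernel` helpers are hypothetical illustrations, not Neko's API.

```python
import time
from typing import Callable, Dict


def add2_loop(a, b):
    """Variant 1: explicit index loop, updating a in place."""
    for i in range(len(a)):
        a[i] += b[i]


def add2_comp(a, b):
    """Variant 2: list comprehension, then in-place slice assignment."""
    a[:] = [x + y for x, y in zip(a, b)]


def autotune(variants: Dict[str, Callable], make_args: Callable, repeats: int = 5) -> str:
    """Return the name of the variant with the lowest best-of-`repeats` time."""
    if not variants:
        raise ValueError("no kernel variants to tune")
    best_name, best_time = "", float("inf")
    for name, kernel in variants.items():
        t = float("inf")
        for _ in range(repeats):
            args = make_args()  # fresh inputs so in-place updates don't accumulate
            start = time.perf_counter()
            kernel(*args)
            t = min(t, time.perf_counter() - start)
        if t < best_time:
            best_name, best_time = name, t
    return best_name


# Cache of tuned kernels: operation name -> selected implementation.
_tuned: Dict[str, Callable] = {}


def get_kernel(op: str, variants: Dict[str, Callable], make_args: Callable) -> Callable:
    """Tune `op` on first use, then return the cached choice."""
    if op not in _tuned:
        _tuned[op] = variants[autotune(variants, make_args)]
    return _tuned[op]


n = 10_000
variants = {"loop": add2_loop, "comp": add2_comp}
kernel = get_kernel("add2", variants, lambda: ([1.0] * n, [2.0] * n))
a, b = [1.0, 2.0], [3.0, 4.0]
kernel(a, b)
print(a)  # [4.0, 6.0] regardless of which variant won
```

Tuning at startup trades a small one-off cost for a choice matched to the actual hardware; a framework could also key the cache on problem parameters (for instance, the polynomial order of a spectral element discretization) in addition to the operation name.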
Source journal: Concurrency and Computation: Practice and Experience
Subject category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review time: 9.6 months
About the journal: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.