An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX

Romain Pereira, A. Roussel, Miwako Tsuji, Patrick Carribault, Mitsuhisa Sato, Hitoshi Murai, Thierry Gautier
{"title":"An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX","authors":"Romain Pereira, A. Roussel, Miwako Tsuji, Patrick Carribault, Mitsuhisa Sato, Hitoshi Murai, Thierry Gautier","doi":"10.1145/3636480.3637094","DOIUrl":null,"url":null,"abstract":"The adoption of ARM processor architectures is on the rise in the HPC ecosystem. Fugaku supercomputer is a homogeneous ARM-based machine, and is one among the most powerful machine in the world. In the programming world, dependent task-based programming models are gaining tractions due to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting,...MPI and OpenMP are two widespreads programming standards that make possible task-based programming at a distributed memory level. Despite its many advantages, mixed-use of the standard programming models using dependent tasks is still under-evaluated on large-scale machines. In this paper, we provide an overview on mixing OpenMP dependent tasking model with MPI with the state-of-the-art software stack (GCC-13, Clang17, MPC-OMP). We provide the level of performances to expect by porting applications to such mixed-use of the standard on the Fugaku supercomputers, using two benchmarks (Cholesky, HPCCG) and a proxy-application (LULESH). We show that software stack, resource binding and communication progression mechanisms are factors that have a significant impact on performance. On distributed applications, performances reaches up to 80% of effiency for task-based applications like HPCCG. We also point-out a few areas of improvements in OpenMP runtimes.","PeriodicalId":120904,"journal":{"name":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops","volume":"4 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3636480.3637094","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The adoption of ARM processor architectures is on the rise in the HPC ecosystem. Fugaku supercomputer is a homogeneous ARM-based machine, and is one among the most powerful machine in the world. In the programming world, dependent task-based programming models are gaining tractions due to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting,...MPI and OpenMP are two widespreads programming standards that make possible task-based programming at a distributed memory level. Despite its many advantages, mixed-use of the standard programming models using dependent tasks is still under-evaluated on large-scale machines. In this paper, we provide an overview on mixing OpenMP dependent tasking model with MPI with the state-of-the-art software stack (GCC-13, Clang17, MPC-OMP). We provide the level of performances to expect by porting applications to such mixed-use of the standard on the Fugaku supercomputers, using two benchmarks (Cholesky, HPCCG) and a proxy-application (LULESH). We show that software stack, resource binding and communication progression mechanisms are factors that have a significant impact on performance. On distributed applications, performances reaches up to 80% of effiency for task-based applications like HPCCG. We also point-out a few areas of improvements in OpenMP runtimes.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A64FX 上混合使用 MPI 和 OpenMP 任务分配概述
在高性能计算生态系统中,ARM 处理器架构的采用呈上升趋势。Fugaku 超级计算机是基于 ARM 的同构机器,也是世界上最强大的机器之一。在编程领域,基于任务的依赖性编程模型因其诸多优势而越来越受到青睐:动态负载平衡、通信/计算重叠的隐式表达、早起的鸟儿有虫吃......MPI 和 OpenMP 是两种广泛应用的编程标准,它们使分布式内存级别的基于任务的编程成为可能。尽管MPI和OpenMP有很多优点,但在大规模机器上使用依赖任务的标准编程模型的混合使用仍未得到充分评估。本文概述了将 OpenMP 依赖任务模型与 MPI 混合使用的最新软件栈(GCC-13、Clang17、MPC-OMP)。我们使用两个基准测试(Cholesky、HPCCG)和一个代理应用程序(LULESH),介绍了在 Fugaku 超级计算机上将应用程序移植到这种混合使用标准所能达到的性能水平。我们发现,软件栈、资源绑定和通信进展机制是对性能有重大影响的因素。在分布式应用中,HPCCG 等基于任务的应用的性能最高可达效率的 80%。我们还指出了 OpenMP 运行时需要改进的几个方面。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Introducing software pipelining for the A64FX processor into LLVM Performance Evaluation of the Fourth-Generation Xeon with Different Memory Characteristics An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX Optimize Efficiency of Utilizing Systems by Dynamic Core Binding MPI-Adapter2: An Automatic ABI Translation Library Builder for MPI Application Binary Portability
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1