An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops. Pub Date: 2024-01-11. DOI: 10.1145/3636480.3637094

Romain Pereira, A. Roussel, Miwako Tsuji, Patrick Carribault, Mitsuhisa Sato, Hitoshi Murai, Thierry Gautier


Abstract

The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine and one of the most powerful machines in the world. In the programming world, dependent task-based programming models are gaining traction due to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting, and more. MPI and OpenMP are two widespread programming standards that make task-based programming possible at a distributed-memory level. Despite its many advantages, mixed use of the standard programming models using dependent tasks is still under-evaluated on large-scale machines. In this paper, we provide an overview of mixing the OpenMP dependent tasking model with MPI using state-of-the-art software stacks (GCC-13, Clang17, MPC-OMP). We report the level of performance to expect when porting applications to such mixed use of the standards on the Fugaku supercomputer, using two benchmarks (Cholesky, HPCCG) and a proxy application (LULESH). We show that the software stack, resource binding, and communication progression mechanisms all have a significant impact on performance. On distributed applications, performance reaches up to 80% efficiency for task-based applications such as HPCCG. We also point out a few areas for improvement in OpenMP runtimes.
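To make the idea of "implicit expression of communication/computation overlap" concrete, here is a minimal sketch of OpenMP dependent tasking in C. It is not taken from the paper. The function names `exchange_halo` and `step_with_tasks` are hypothetical, and the halo exchange is simulated by a plain copy so the example runs without an MPI installation. In a real mixed MPI+OpenMP code, that task would presumably post and complete MPI point-to-point calls instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MPI halo exchange (ASSUMPTION: a real code
 * would use MPI send/receive calls here). It only copies the neighbor's
 * edge value into the local halo cell. */
static void exchange_halo(const double *neighbor_edge, double *halo)
{
    *halo = *neighbor_edge;
}

/* One illustrative step: the "communication" task and the interior
 * computation task have no dependence on each other, so the runtime is
 * free to overlap them. The final task depends on both results. */
double step_with_tasks(const double *u, size_t n, const double *neighbor_edge)
{
    assert(u != NULL && neighbor_edge != NULL);

    double halo = 0.0;          /* value received from the neighbor   */
    double interior_sum = 0.0;  /* result of the local computation    */
    double result = 0.0;        /* combination of the two             */

    #pragma omp parallel
    #pragma omp single
    {
        /* Task 1: communication (simulated). Produces `halo`. */
        #pragma omp task depend(out: halo) shared(halo)
        exchange_halo(neighbor_edge, &halo);

        /* Task 2: interior computation. Produces `interior_sum`. */
        #pragma omp task depend(out: interior_sum) shared(interior_sum)
        {
            for (size_t i = 0; i < n; i++)
                interior_sum += u[i];
        }

        /* Task 3: runs only after tasks 1 and 2 have completed. */
        #pragma omp task depend(in: halo, interior_sum) \
                         depend(out: result) shared(halo, interior_sum, result)
        result = interior_sum + halo;
    }   /* implicit barrier: all tasks are complete past this point */

    return result;
}
```

Compiled with an OpenMP flag (for example `-fopenmp` with GCC or Clang), the first two tasks may run concurrently on different threads. Without the flag, the pragmas are ignored and the code runs sequentially with the same result. How well real MPI calls progress inside such tasks depends on the runtime and MPI library, which is one of the factors the abstract identifies as affecting performance.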
