Towards a Scalable and Efficient PGAS-based Distributed OpenMP

arXiv - CS - Performance. Pub Date: 2024-09-04. DOI: arxiv-2409.02830

Baodi Shan, Mauricio Araya-Polo, Barbara Chapman


Abstract

MPI+X has been the de facto standard for distributed-memory parallel programming. It is used primarily as an explicit two-sided communication model, which often leads to complex and error-prone code. Alternatively, the PGAS model offers efficient one-sided communication and more intuitive communication primitives. In this paper, we present a novel approach that integrates PGAS concepts into the OpenMP programming model, leveraging the LLVM compiler infrastructure and the GASNet-EX communication library. Our model addresses the complexity associated with traditional MPI+OpenMP programming models while ensuring excellent performance and scalability. We evaluate our approach using a set of micro-benchmarks and application kernels on two distinct platforms: Ookami at Stony Brook University and NERSC Perlmutter. The results demonstrate that our approach, DiOMP, achieves higher bandwidth and lower latency than MPI+OpenMP, with up to 25% higher bandwidth and down to 45% on latency. DiOMP offers a promising alternative to the traditional MPI+OpenMP hybrid programming model, a step towards providing a more productive and efficient way to develop high-performance parallel applications for distributed-memory systems.
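
The contrast between two-sided and one-sided communication drawn in the abstract can be illustrated with a small single-process toy model. This is only a sketch under stated assumptions: it does not use MPI, GASNet-EX, or DiOMP, and every name in it (`TwoSidedRank`, `GlobalArray`, `put`, `get`) is hypothetical and chosen for illustration. It shows only the semantic difference: in the two-sided case the data lands when the receiver posts a matching receive, while in the one-sided case the initiator writes straight into the target's segment of a partitioned global address space.

```python
from collections import deque


class TwoSidedRank:
    """Toy rank for two-sided messaging (hypothetical, not an MPI API)."""

    def __init__(self, size):
        self.memory = [0] * size
        self.inbox = deque()

    def send(self, target, values):
        # The sender only enqueues a message; nothing reaches the
        # target's memory until the target calls recv().
        target.inbox.append(list(values))

    def recv(self, offset):
        # The receiver must take part explicitly to complete the transfer.
        values = self.inbox.popleft()
        self.memory[offset:offset + len(values)] = values


class GlobalArray:
    """Toy partitioned global address space: one segment per rank."""

    def __init__(self, num_ranks, segment_size):
        self.segment_size = segment_size
        self.segments = [[0] * segment_size for _ in range(num_ranks)]

    def _check(self, offset, count):
        if offset < 0 or offset + count > self.segment_size:
            raise IndexError("access outside the target segment")

    def put(self, target_rank, offset, values):
        # One-sided: the initiator writes into the target's segment
        # without any matching call on the target side.
        values = list(values)
        self._check(offset, len(values))
        self.segments[target_rank][offset:offset + len(values)] = values

    def get(self, target_rank, offset, count):
        # One-sided read of a remote segment.
        self._check(offset, count)
        return self.segments[target_rank][offset:offset + count]
```

In this toy model, a `send` leaves the target's memory unchanged until the target calls `recv`, whereas a `put` is visible to a subsequent `get` immediately. A real PGAS runtime additionally needs synchronization (for example fences or barriers) to order remote accesses, which this sketch omits.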


