The OpenMP Cluster Programming Model

Workshop Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-07-12. DOI: 10.1145/3547276.3548444

H. Yviquel, M. Pereira, E. Francesquini, G. Valarini, Gustavo Leite, Pedro Rosso, Rodrigo Ceccato, Carla Cusihualpa, Vitoria Dias, S. Rigo, Alan Souza, G. Araújo


Citations: 9

Abstract

Despite various research initiatives and proposed programming models, efficient solutions for parallel programming on HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). Task parallelism, on the other hand, has been shown to be an efficient and seamless programming model for clusters. This paper introduces OpenMP Cluster (OMPC), a task-parallel model that extends OpenMP for cluster programming. OMPC leverages OpenMP’s offloading standard to distribute annotated regions of code across the nodes of a distributed system. To achieve this, it hides MPI-based data distribution and load-balancing mechanisms behind OpenMP task dependencies. Because it complies with OpenMP, OMPC allows applications to use the same programming model to exploit both intra- and inter-node parallelism, which simplifies development and maintenance. We evaluated OMPC using Task Bench, a synthetic benchmark focused on task parallelism, comparing its performance against other distributed runtimes. Experimental results show that OMPC can deliver up to 1.53x and 2.43x better performance than Charm++ on CCR and scalability experiments, respectively. Experiments also show that OMPC exhibits weak scaling for both Task Bench and a real-world seismic imaging application.
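The abstract contains no code, so the following is only a minimal sketch of the style of program it describes: annotated regions expressed as standard OpenMP offloading tasks whose ordering and data movement are declared through `depend` and `map` clauses. The function names (`init`, `scale`, `sum`, `run_pipeline`) and the array size are hypothetical and are not taken from the paper. The sketch uses only standard OpenMP directives, and how OMPC maps such tasks to cluster nodes is not shown here.

```c
#define N 8

/* Hypothetical task bodies; marked declare target so they can run on a device. */
#pragma omp declare target
static void init(int *v, int n) {
    for (int i = 0; i < n; i++) v[i] = i;          /* v = 0, 1, ..., n-1 */
}

static void scale(int *v, int n, int k) {
    for (int i = 0; i < n; i++) v[i] *= k;         /* multiply each element by k */
}

static int sum(const int *v, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}
#pragma omp end declare target

/* Builds a three-task chain: init -> scale -> sum.
 * Each target region is an asynchronous task (nowait); the depend clauses
 * give the runtime the task graph, and the map clauses describe which data
 * each task reads or writes. */
int run_pipeline(void) {
    int a[N];
    int total = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp target nowait map(from: a[0:N]) depend(out: a)
        init(a, N);

        #pragma omp target nowait map(tofrom: a[0:N]) depend(inout: a)
        scale(a, N, 2);

        #pragma omp target nowait map(to: a[0:N]) map(tofrom: total) \
                depend(in: a) depend(out: total)
        total = sum(a, N);

        /* Wait for all three offloaded tasks before leaving the region. */
        #pragma omp taskwait
    }
    return total;
}
```

Because the dependencies serialize the three tasks, `run_pipeline()` returns 56 (twice the sum of 0 through 7) whether the pragmas are honored, fall back to the host, or are ignored by a compiler without OpenMP support. Under the model the abstract describes, the same annotations would be the only information the programmer supplies, with data distribution and load balancing left to the runtime.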


