In search of the best MPI-OpenMP distribution for optimum Intel-MIC cluster performance

G. Utrera, Marisa Gil, X. Martorell
Published in: 2015 International Conference on High Performance Computing & Simulation (HPCS), 2015-07-20
DOI: 10.1109/HPCSim.2015.7237072 (https://doi.org/10.1109/HPCSim.2015.7237072)
Citations: 6

Abstract

Applications for HPC platforms are mainly based on hybrid programming models: MPI for communication, and OpenMP for task and fork-join parallelism that exploits shared memory inside a node. Much research has built on this scheme to improve performance, for example by overlapping communication with computation, or by exploiting the increased speed and bandwidth of new network fabrics (e.g. InfiniBand and 10 or 40 Gigabit Ethernet). Henceforth, as far as computation and communication are concerned, HPC platforms will be heterogeneous, with high-speed networks. In this context, an important issue is deciding how to distribute the workload among all the nodes so that application execution is balanced, as well as choosing the most appropriate programming model to exploit parallelism inside the node. In this paper we propose a mechanism that dynamically balances the work distribution among the components of a heterogeneous cluster based on their performance characteristics. For our evaluations we run the miniFE mini-application from the Mantevo benchmark suite on a heterogeneous Intel MIC cluster. Experimental results show that making an effort to choose an appropriate number of threads can improve performance significantly over simply using the maximum number of cores available on the Intel MIC.
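The abstract does not spell out how the balancing mechanism works, so the following is only a minimal sketch of one common way to distribute work according to performance characteristics: give each process a share of the rows proportional to its measured throughput. The function name `proportional_split`, the idea of measuring "rows per second", and the example rates are all assumptions made for illustration; they are not taken from the paper.

```python
def proportional_split(total_rows, rates):
    """Split total_rows work items across devices in proportion to rates.

    rates: measured throughput per device (for example rows per second),
    all strictly positive. This is a hypothetical helper, not the paper's
    mechanism. Returns integer shares that sum exactly to total_rows.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty list of positive numbers")

    total_rate = sum(rates)
    ideal = [total_rows * r / total_rate for r in rates]
    shares = [int(x) for x in ideal]  # truncation equals floor because x >= 0

    # Hand out the rows lost to rounding, largest fractional part first.
    leftover = total_rows - sum(shares)
    by_fraction = sorted(
        range(len(rates)), key=lambda i: ideal[i] - shares[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


# Hypothetical rates: one host process three times as fast as each of
# two coprocessor processes.
print(proportional_split(1000, [3.0, 1.0, 1.0]))  # [600, 200, 200]
```

In a dynamic scheme, the rates would be re-measured periodically (for instance after each solver iteration) and the split recomputed, so that a slower device is not left holding up the others.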