Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures

J. Hashmi, Shulei Xu, B. Ramesh, Mohammadreza Bayatpour, H. Subramoni, D. Panda
Published in: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 32-41
Publication date: 2020-05-01
DOI: 10.1109/IPDPS47924.2020.00014 (https://doi.org/10.1109/IPDPS47924.2020.00014)
Citation count: 2

Abstract

Modern multi-/many-core processors offer higher core density, hardware multi-threading, deeper memory hierarchies, and diverse architectural capabilities. While emerging cloud-based HPC systems can deliver near-native performance, they add further diversity to the architectures. The Message Passing Interface (MPI) offers the flexibility to bind application processes to arbitrary CPU cores; however, these binding policies are static and typically do not take the applications' communication patterns or the underlying machine architecture into consideration. This lack of association between the dynamic nature of applications and the architectural diversity of modern processors makes it difficult for application developers and MPI designers to exploit modern multi-/many-core systems to their full potential. In this paper, we propose a set of low-level benchmarking-based approaches and MPI-level designs to infer vendor-specific machine characteristics (e.g., physical-to-virtual machine topologies) and the dynamic communication patterns of applications. Using this information, we propose two novel algorithms to construct efficient MPI mappings for any given architecture and application communication pattern. The proposed designs are implemented in the MVAPICH2 MPI library and are evaluated on three different architectures using various micro-benchmarks and application kernels. We demonstrate up to 2X performance improvement for MPI collectives, and up to 3.5X and 26% improvement for NAS-CG and miniAMR application kernels, respectively.
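To make the idea of a communication-aware mapping concrete, the following is a minimal sketch, not the paper's algorithms (the abstract does not describe them). It assumes two inputs that the abstract only names in general terms: a rank-to-rank traffic matrix `comm` and a core-to-core distance matrix `dist`. The function names `mapping_cost` and `greedy_mapping`, the greedy heuristic itself, and the example values are all hypothetical and chosen purely for illustration.

```python
def mapping_cost(comm, dist, placement):
    """Total traffic-weighted distance: sum of comm[i][j] * dist between the
    cores that ranks i and j are placed on (each pair counted once)."""
    n = len(comm)
    return sum(
        comm[i][j] * dist[placement[i]][placement[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def greedy_mapping(comm, dist):
    """Illustrative greedy heuristic (an assumption, not the paper's method).

    Repeatedly pick the unplaced rank with the most traffic to ranks already
    placed, and put it on the free core that adds the least cost.
    Returns a list where entry r is the core assigned to rank r.
    """
    n = len(comm)
    placement = {}                      # rank -> core
    free_cores = set(range(len(dist)))
    unplaced = set(range(n))

    while unplaced:
        if placement:
            # Traffic between a candidate rank and all placed ranks.
            affinity = lambda r: sum(comm[r][p] for p in placement)
        else:
            # First pick: fall back to the rank's total traffic.
            affinity = lambda r: sum(comm[r])
        # sorted() makes tie-breaking deterministic (lowest index wins).
        rank = max(sorted(unplaced), key=affinity)

        # Cost added by placing `rank` on core c, given earlier placements.
        added_cost = lambda c: sum(
            comm[rank][p] * dist[c][placement[p]] for p in placement
        )
        core = min(sorted(free_cores), key=added_cost)

        placement[rank] = core
        unplaced.remove(rank)
        free_cores.remove(core)

    return [placement[r] for r in range(n)]


# Hypothetical example: 4 ranks on 2 sockets with 2 cores each.
# Ranks 0<->2 and 1<->3 exchange heavy traffic; all other pairs are light.
comm = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]
# Distance 1 within a socket (cores 0-1 and 2-3), 4 across sockets.
dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

identity = [0, 1, 2, 3]                 # static "rank r on core r" binding
mapped = greedy_mapping(comm, dist)
print(mapped, mapping_cost(comm, dist, identity), mapping_cost(comm, dist, mapped))
```

In this toy setup the static binding puts each heavily communicating pair on different sockets, while the greedy placement co-locates them, which lowers the traffic-weighted distance. A real design would obtain `comm` from runtime profiling and `dist` from benchmarking or a topology library rather than hard-coding them.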