Performance Evaluation of Open MPI on Cray XE/XK Systems

S. Gutierrez, N. Hjelm, Manjunath Gorentla Venkata, R. Graham
Published in: 2012 IEEE 20th Annual Symposium on High-Performance Interconnects, 2012-08-22
DOI: 10.1109/HOTI.2012.11 (https://doi.org/10.1109/HOTI.2012.11)
Citations: 18

Abstract

Open MPI is a widely used open-source implementation of the MPI-2 standard that supports a variety of platforms and interconnects. Current versions of Open MPI, however, lack support for the Cray XE6 and XK6 architectures, both of which use the Gemini System Interconnect. In this paper, we present extensions that natively support these architectures within Open MPI, describe performance and scalability bottlenecks and propose solutions for them, and provide an extensive evaluation of our implementation, which is the first completely open-source MPI implementation for the Cray XE/XK system families to be run at 49,152 processes. Application and micro-benchmark results show that the performance and scaling characteristics of our implementation are similar to those of the vendor-supplied MPI. Micro-benchmark results show short-message latencies of 1.20 μs at 1 byte and 4.13 μs at 1,024 bytes, which are 10.00% and 39.71% better than the vendor-supplied MPI's, respectively. Our implementation achieves a bandwidth of 5.32 GB/s at a message size of 8 MB, which is similar to the vendor-supplied MPI's bandwidth at the same message size. Two Sequoia benchmark applications, LAMMPS and AMG2006, were also used to evaluate our implementation at scales up to 49,152 cores, where it exhibited performance and scaling characteristics similar to those of the vendor-supplied MPI implementation. LAMMPS achieved a parallel efficiency of 88.20% at 49,152 cores using Open MPI, which is on par with the parallel efficiency achieved with the vendor-supplied MPI.
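The percentages above can be made concrete with a small sketch. The abstract does not say how "better" or "parallel efficiency" is defined, so the code below assumes that improvement is the latency reduction relative to the vendor-supplied MPI, and that efficiency is the usual strong-scaling ratio against a reference run. The function names are hypothetical, and the timings passed to `parallel_efficiency` are placeholders, not results from the paper.

```python
def relative_improvement(ours_us, baseline_us):
    """Fractional latency reduction relative to the baseline (assumed definition)."""
    return (baseline_us - ours_us) / baseline_us


def implied_baseline(ours_us, improvement):
    """Invert relative_improvement: the baseline latency implied by a reported gain."""
    return ours_us / (1.0 - improvement)


def parallel_efficiency(t_ref, p_ref, t_p, p):
    """Strong-scaling efficiency of a run on p processes relative to a
    reference run on p_ref processes (assumed definition)."""
    return (t_ref * p_ref) / (t_p * p)


if __name__ == "__main__":
    # Latencies and gains as reported in the abstract; the implied baseline
    # is only valid under the assumed definition of "better".
    for ours, gain in [(1.20, 0.1000), (4.13, 0.3971)]:
        base = implied_baseline(ours, gain)
        print(f"{ours:.2f} us at {gain:.2%} gain -> implied baseline {base:.2f} us")

    # Placeholder timings, purely illustrative (not from the paper).
    print(parallel_efficiency(t_ref=100.0, p_ref=1, t_p=0.5, p=250))
```

Under a different convention, for example a gain measured relative to the Open MPI latency rather than the vendor's, the implied baseline values would differ, so these figures should be read as an interpretation of the abstract rather than as measured data.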