Performance Evaluation of Open MPI on Cray XE/XK Systems
S. Gutierrez, N. Hjelm, Manjunath Gorentla Venkata, R. Graham
2012 IEEE 20th Annual Symposium on High-Performance Interconnects
DOI: 10.1109/HOTI.2012.11
Published: 2012-08-22
Citations: 18
Abstract
Open MPI is a widely used open-source implementation of the MPI-2 standard that supports a variety of platforms and interconnects. Current versions of Open MPI, however, lack support for the Cray XE6 and XK6 architectures -- both of which use the Gemini System Interconnect. In this paper, we present extensions to natively support these architectures within Open MPI, describe and propose solutions for performance and scalability bottlenecks, and provide an extensive evaluation of our implementation, which is the first completely open-source MPI implementation for the Cray XE/XK system families used at 49,152 processes. Application and micro-benchmark results show that the performance and scaling characteristics of our implementation are similar to the vendor-supplied MPI's. Micro-benchmark results show short-data 1-byte and 1,024-byte message latencies of 1.20 μs and 4.13 μs, which are 10.00% and 39.71% better than the vendor-supplied MPI's, respectively. Our implementation achieves a bandwidth of 5.32 GB/s at 8 MB, which is similar to the vendor-supplied MPI's bandwidth at the same message size. Two Sequoia benchmark applications, LAMMPS and AMG2006, were also chosen to evaluate our implementation at scales up to 49,152 cores -- where we exhibited similar performance and scaling characteristics when compared to the vendor-supplied MPI implementation. LAMMPS achieved a parallel efficiency of 88.20% at 49,152 cores using Open MPI, which is on par with the vendor-supplied MPI's achieved parallel efficiency.
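The 1-byte and 1,024-byte latency figures quoted above are the kind of numbers produced by a two-process ping-pong micro-benchmark, where one-way latency is taken as half the averaged round-trip time. The sketch below is a minimal illustration of that measurement pattern, not the benchmark actually used in the paper; the iteration counts, message size, and output format are assumptions for illustration only.

/* pingpong.c: minimal MPI ping-pong latency sketch (illustrative, not the paper's benchmark).
 * Rank 0 and rank 1 bounce a small message; one-way latency = round-trip time / 2. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int warmup = 100;          /* untimed warm-up iterations (assumed value) */
    const int iters  = 1000;         /* timed iterations (assumed value) */
    const int msg_size = 1;          /* message size in bytes, e.g. 1 or 1024 */
    char *buf = calloc(msg_size, 1);
    int rank;
    double t_start = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < warmup + iters; i++) {
        if (i == warmup) {           /* start the clock only after warm-up */
            MPI_Barrier(MPI_COMM_WORLD);
            t_start = MPI_Wtime();
        }
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t_start;

    if (rank == 0)
        printf("%d-byte one-way latency: %.2f us\n",
               msg_size, elapsed / iters / 2.0 * 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

Such a test is run with exactly two processes (for example, mpirun -n 2 ./pingpong, names assumed here), typically placing the ranks on separate nodes so the message actually crosses the Gemini interconnect.

The LAMMPS parallel-efficiency figure follows the usual convention of comparing a large run against a reference run. Assuming the standard definitions (the paper's exact baseline is not restated here): for strong scaling, E(N) = (N_ref * T_ref) / (N * T_N), and for weak scaling, E(N) = T_ref / T_N, where T_N is the time to solution on N cores and N_ref, T_ref describe the reference run. The reported 88.20% at 49,152 cores is this ratio expressed as a percentage.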