MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds

Jie Zhang, Xiaoyi Lu, Mark Daniel Arnold, D. Panda
{"title":"MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds","authors":"Jie Zhang, Xiaoyi Lu, Mark Daniel Arnold, D. Panda","doi":"10.1109/CCGrid.2015.166","DOIUrl":null,"url":null,"abstract":"Cloud Computing with Virtualization offers attractive flexibility and elasticity to deliver resources by providing a platform for consolidating complex IT resources in a scalable manner. However, efficiently running HPC applications on Cloud Computing systems is still full of challenges. One of the biggest hurdles in building efficient HPC clouds is the unsatisfactory performance offered by underlying virtualized environments, more specifically, virtualized I/O devices. Recently, Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-performance interconnects such as InfiniBand and 10GigE. Due to its near native performance for inter-node communication, many cloud systems such as Amazon EC2 have been using SR-IOV in their production environments. Nevertheless, recent studies have shown that the SR-IOV scheme lacks locality aware communication support, which leads to performance overheads for inter-VM communication within the same physical node. In this paper, we propose an efficient approach to build HPC clouds based on MVAPICH2 over Open Stack with SR-IOV. We first propose an extension for Open Stack Nova system to enable the IV Shmem channel in deployed virtual machines. We further present and discuss our high-performance design of virtual machine aware MVAPICH2 library over Open Stack-based HPC Clouds. Our design can fully take advantage of high-performance SR-IOV communication for inter-node communication as well as Inter-VM Shmem (IVShmem) for intra-node communication. A comprehensive performance evaluation with micro-benchmarks and HPC applications has been conducted on an experimental Open Stack-based HPC cloud and Amazon EC2. The evaluation results on the experimental HPC cloud show that our design and extension can deliver near bare-metal performance for implementing SR-IOV-based HPC clouds with virtualization. Further, compared with the performance on EC2, our experimental HPC cloud can exhibit up to 160X, 65X, 12X improvement potential in terms of point-to-point, collective and application for future HPC clouds.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"31 1","pages":"71-80"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2015.166","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

Abstract

Cloud Computing with Virtualization offers attractive flexibility and elasticity to deliver resources by providing a platform for consolidating complex IT resources in a scalable manner. However, efficiently running HPC applications on Cloud Computing systems is still full of challenges. One of the biggest hurdles in building efficient HPC clouds is the unsatisfactory performance offered by underlying virtualized environments, more specifically, virtualized I/O devices. Recently, Single Root I/O Virtualization (SR-IOV) technology has been steadily gaining momentum for high-performance interconnects such as InfiniBand and 10GigE. Due to its near native performance for inter-node communication, many cloud systems such as Amazon EC2 have been using SR-IOV in their production environments. Nevertheless, recent studies have shown that the SR-IOV scheme lacks locality aware communication support, which leads to performance overheads for inter-VM communication within the same physical node. In this paper, we propose an efficient approach to build HPC clouds based on MVAPICH2 over Open Stack with SR-IOV. We first propose an extension for Open Stack Nova system to enable the IV Shmem channel in deployed virtual machines. We further present and discuss our high-performance design of virtual machine aware MVAPICH2 library over Open Stack-based HPC Clouds. Our design can fully take advantage of high-performance SR-IOV communication for inter-node communication as well as Inter-VM Shmem (IVShmem) for intra-node communication. A comprehensive performance evaluation with micro-benchmarks and HPC applications has been conducted on an experimental Open Stack-based HPC cloud and Amazon EC2. The evaluation results on the experimental HPC cloud show that our design and extension can deliver near bare-metal performance for implementing SR-IOV-based HPC clouds with virtualization. Further, compared with the performance on EC2, our experimental HPC cloud can exhibit up to 160X, 65X, 12X improvement potential in terms of point-to-point, collective and application for future HPC clouds.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于SR-IOV的MVAPICH2 over OpenStack:构建高性能计算云的有效方法
虚拟化云计算提供了一个平台,以可扩展的方式整合复杂的IT资源,从而为交付资源提供了极具吸引力的灵活性和弹性。然而,在云计算系统上高效运行HPC应用程序仍然充满挑战。构建高效HPC云的最大障碍之一是底层虚拟化环境(更具体地说,是虚拟化I/O设备)提供的令人不满意的性能。最近,单根I/O虚拟化(SR-IOV)技术在InfiniBand和10GigE等高性能互连中得到了稳步发展。由于SR-IOV在节点间通信方面的性能接近原生,许多云系统(如Amazon EC2)已经在其生产环境中使用SR-IOV。然而,最近的研究表明,SR-IOV方案缺乏对位置感知的通信支持,这导致了同一物理节点内vm间通信的性能开销。在本文中,我们提出了一种基于MVAPICH2和SR-IOV在Open Stack上构建高性能计算云的有效方法。我们首先为Open Stack Nova系统提出了一个扩展,以在已部署的虚拟机中启用IV Shmem通道。我们进一步介绍和讨论了基于Open stack的高性能计算云的虚拟机感知MVAPICH2库的高性能设计。我们的设计可以充分利用节点间通信的高性能SR-IOV通信和节点内通信的Inter-VM Shmem (IVShmem)。在基于Open stack的实验HPC云和Amazon EC2上进行了微基准测试和HPC应用的综合性能评估。在实验HPC云上的评估结果表明,我们的设计和扩展可以提供接近裸机的性能,实现基于sr - iov的虚拟化HPC云。此外,与EC2上的性能相比,我们的实验HPC云在点对点、集合体和未来HPC云的应用方面都有高达160倍、65倍、12倍的提升潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Self Protecting Data Sharing Using Generic Policies Partition-Aware Routing to Improve Network Isolation in Infiniband Based Multi-tenant Clusters MIC-Tandem: Parallel X!Tandem Using MIC on Tandem Mass Spectrometry Based Proteomics Data Study of the KVM CPU Performance of Open-Source Cloud Management Platforms Visualizing City Events on Search Engine: Tword the Search Infrustration for Smart City
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1