SHMEMPMI -- Shared Memory Based PMI for Improved Performance and Scalability

S. Chakraborty, H. Subramoni, Jonathan L. Perkins, D. Panda
DOI: 10.1109/CCGrid.2016.99
Published in: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)
Publication date: 2016-05-16
Citations: 5

Abstract

Dense systems with a large number of cores per node are becoming increasingly popular. Existing designs of the Process Management Interface (PMI) show poor scalability in terms of performance and memory consumption on such systems, where a large number of processes concurrently access the PMI interface. Our analysis shows that the local socket-based communication scheme used by PMI is a major bottleneck. While using a shared-memory-based channel can avoid this bottleneck and thus reduce memory consumption and improve performance, such a design poses several challenges. We investigate several alternatives and propose a novel design based on a hybrid socket + shared memory communication protocol that uses multiple shared memory regions. This design can reduce memory usage per node by a factor of Processes per Node. Our evaluations show that memory consumption per node can be reduced by an estimated 1 GB with 1 million MPI processes and 16 processes per node. Additionally, the performance of PMI Get is improved by 1,000 times compared to the existing design. The proposed design is backward compatible, secure, and imposes negligible overhead.
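The core idea in the abstract, replacing the per-request socket round trip to the local PMI server with direct reads from a shared-memory key-value region, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the fixed-size entry layout and the `kv_put`/`kv_get` helpers are hypothetical, and the actual SHMEMPMI design uses a hybrid socket + shared memory protocol with multiple shared memory regions rather than a single table.

```python
# Sketch: a PMI-style key-value table placed in shared memory, so that
# a "Get" is a local memory scan instead of a socket round trip.
# Entry layout and helper names are hypothetical, for illustration only.
from multiprocessing import shared_memory

KEYLEN, VALLEN, KV_MAX = 64, 64, 16   # fixed-size entries for simplicity
ENTRY = KEYLEN + VALLEN

def kv_put(buf, index, key, value):
    """Server side: publish one key-value pair into the shared region."""
    off = index * ENTRY
    buf[off:off + KEYLEN] = key.encode().ljust(KEYLEN, b"\0")
    buf[off + KEYLEN:off + ENTRY] = value.encode().ljust(VALLEN, b"\0")

def kv_get(buf, key):
    """Client side: a plain memory scan, no traffic to the PMI server."""
    target = key.encode().ljust(KEYLEN, b"\0")
    for i in range(KV_MAX):
        off = i * ENTRY
        if bytes(buf[off:off + KEYLEN]) == target:
            return bytes(buf[off + KEYLEN:off + ENTRY]).rstrip(b"\0").decode()
    return None

# The process manager would create and fill the region once during the
# PMI exchange; every local process then maps it read-only and does Gets
# without contending on the local socket.
shm = shared_memory.SharedMemory(create=True, size=KV_MAX * ENTRY)
try:
    kv_put(shm.buf, 0, "rank0-businesscard", "host0:4242")
    card = kv_get(shm.buf, "rank0-businesscard")
    print(card)
finally:
    shm.close()
    shm.unlink()
```

Because every local process maps the same physical pages, the per-node memory cost of the key-value data stays roughly constant as processes per node grow, which is the "factor of Processes per Node" reduction the abstract claims.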