{"title":"Benefits of Cross Memory Attach for MPI libraries on HPC Clusters","authors":"Jérôme Vienne","doi":"10.1145/2616498.2616532","DOIUrl":null,"url":null,"abstract":"With the number of cores per node increasing in modern clusters, an efficient implementation of intra-node communications is critical for application performance. MPI libraries generally use shared memory mechanisms for communication inside the node, unfortunately this approach has some limitations for large messages. The release of Linux kernel 3.2 introduced Cross Memory Attach (CMA) which is a mechanism to improve the communication between MPI processes inside the same node. But, as this feature is not enabled by default inside MPI libraries supporting it, it could be left disabled by HPC administrators which leads to a loss of performance benefits to users. In this paper, we explain how to use CMA and present an evaluation of CMA using micro-benchmarks and NAS parallel benchmarks (NPB) which are a set of applications commonly used to evaluate parallel systems.\n Our performance evaluation reveals that CMA outperforms shared memory performance for large messages. Micro-benchmark level evaluations show that CMA can enhance the performance by as much as a factor of four. With NPB, we see up to 24.75% improvement in total execution time for FT and up to 24.08% for IS.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"41 1","pages":"33:1-33:6"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2616498.2616532","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 19
Abstract
With the number of cores per node increasing in modern clusters, an efficient implementation of intra-node communication is critical for application performance. MPI libraries generally rely on shared-memory mechanisms for communication inside a node, but this approach has limitations for large messages, which must be staged through an intermediate shared buffer and are therefore copied twice. Linux kernel 3.2 introduced Cross Memory Attach (CMA), a mechanism that improves communication between MPI processes on the same node by allowing a single direct copy between their address spaces. However, because this feature is not enabled by default in the MPI libraries that support it, HPC administrators may leave it disabled, depriving users of its performance benefits. In this paper, we explain how to use CMA and present an evaluation of it using micro-benchmarks and the NAS Parallel Benchmarks (NPB), a suite of applications commonly used to evaluate parallel systems.
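To make the mechanism concrete, the sketch below (ours, not from the paper) uses the process_vm_readv() system call that CMA added in Linux 3.2. For a self-contained, runnable example it fork()s a child to stand in for the "remote" process; in a real MPI library the peer would be an independent rank and the buffer address would be exchanged over a shared-memory control channel.

    /* Minimal CMA sketch: copy data out of another process's address
     * space with a single kernel-mediated copy, no shared-memory
     * bounce buffer. Requires Linux >= 3.2 and glibc >= 2.15. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/uio.h>   /* process_vm_readv(), struct iovec */
    #include <sys/wait.h>

    int main(void)
    {
        /* Buffer in the parent's data segment; after fork() the child
         * holds a copy-on-write copy at the same virtual address,
         * standing in for a peer MPI rank's send buffer. */
        static char message[64] = "hello from the remote address space";

        pid_t child = fork();
        if (child < 0)
            return 1;
        if (child == 0) {
            sleep(2);      /* "remote" process: keep the buffer alive */
            _exit(0);
        }

        char local_buf[sizeof message];
        memset(local_buf, 0, sizeof local_buf);

        struct iovec local  = { .iov_base = local_buf, .iov_len = sizeof local_buf };
        struct iovec remote = { .iov_base = message,   .iov_len = sizeof message   };

        /* The CMA call: the kernel copies directly from the child's
         * address space into local_buf -- one copy in total. */
        ssize_t n = process_vm_readv(child, &local, 1, &remote, 1, 0);
        if (n < 0)
            perror("process_vm_readv");
        else
            printf("read %zd bytes: \"%s\"\n", n, local_buf);

        waitpid(child, NULL, 0);
        return 0;
    }

CMA requires ptrace-level permission between the two processes, which cooperating ranks belonging to the same user normally have (a parent always has it for its child). How to enable CMA in a given MPI library varies; for example, MVAPICH2 exposes the MV2_SMP_USE_CMA environment variable and Open MPI the btl_vader_single_copy_mechanism MCA parameter (names as we recall them; verify against your installed version's documentation).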
Our performance evaluation reveals that CMA outperforms the shared-memory approach for large messages. Micro-benchmark evaluations show that CMA can improve performance by as much as a factor of four. With NPB, we see up to a 24.75% improvement in total execution time for FT and up to 24.08% for IS.