Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm

G. Chochia, David G. Solt, Joshua Hursey
Proceedings of the 29th European MPI Users' Group Meeting, September 14, 2022
DOI: 10.1145/3555819.3555821
This paper presents algorithms for the all-to-all and all-to-all(v) MPI collectives, optimized for small-to-medium messages and large per-node task counts to support multicore CPUs in HPC systems. The complexity of these algorithms is analyzed under two metrics: the number of messages and the volume of data exchanged per task. The algorithms achieve logarithmic complexity in the first metric and optimal complexity in the second, which is better by a logarithmic factor than the data volume of algorithms designed for short messages. It is shown that balancing these two metrics is key to achieving optimal performance. The performance advantage of the new algorithm is demonstrated at scale by comparing it against logarithmic algorithm implementations in Open MPI and Spectrum MPI. A two-phase design for the all-to-all(v) algorithm is presented; it combines efficient handling of short and large messages in a single framework, which is a known difficulty for logarithmic all-to-all(v) algorithms.
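The trade-off between the two metrics can be illustrated with a back-of-envelope cost model. The sketch below is not the paper's matrix block aggregation algorithm; it uses only the standard textbook costs for the pairwise-exchange alltoall (linear message count, optimal data volume) and the Bruck-style logarithmic alltoall (logarithmic message count, data volume inflated by a log factor), to show why neither metric alone determines performance. `P` is the task count and `n` the per-pair message size.

```python
from math import ceil, log2

def pairwise_cost(P: int, n: int) -> tuple[int, int]:
    """Pairwise-exchange alltoall: each task sends P-1 messages
    of n bytes each, so data volume per task is n*(P-1) (optimal)."""
    return P - 1, n * (P - 1)

def bruck_cost(P: int, n: int) -> tuple[int, int]:
    """Bruck-style alltoall: ceil(log2 P) rounds; in each round a task
    forwards roughly half of its n*P buffer, so the total volume per
    task is ~ (n*P/2) * log2(P) -- a log factor worse than optimal."""
    rounds = ceil(log2(P))
    return rounds, rounds * (n * P) // 2

# With many tasks and small messages, Bruck wins on message count
# but pays a logarithmic penalty in bytes moved per task.
for P in (8, 64, 1024):
    n = 16  # bytes per pair, illustrative
    pw_msgs, pw_vol = pairwise_cost(P, n)
    br_msgs, br_vol = bruck_cost(P, n)
    print(f"P={P:5d}  pairwise: {pw_msgs} msgs / {pw_vol} B   "
          f"bruck: {br_msgs} msgs / {br_vol} B")
```

Running this shows pairwise at P-1 messages with optimal volume versus Bruck at ceil(log2 P) messages with log-inflated volume; the abstract's point is that an algorithm (such as the one presented in the paper) can keep the volume metric optimal while still reducing message count via on-node aggregation.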