Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm

G. Chochia, David G. Solt, Joshua Hursey
Proceedings of the 29th European MPI Users' Group Meeting, September 14, 2022
DOI: 10.1145/3555819.3555821
This paper presents algorithms for the all-to-all and all-to-all(v) MPI collectives, optimized for small-to-medium messages and large per-node task counts to support multicore CPUs in HPC systems. The complexity of these algorithms is analyzed under two metrics: the number of messages and the volume of data exchanged per task. The algorithms achieve logarithmic complexity in the first metric and optimal complexity in the second, which is better by a logarithmic factor than the data volume of algorithms designed for short messages. It is shown that balancing these two metrics is key to achieving optimal performance. The performance advantage of the new algorithm is demonstrated at scale by comparing it against logarithmic algorithm implementations in Open MPI and Spectrum MPI. A two-phase design for the all-to-all(v) algorithm is presented; it combines efficient handling of short and large messages in a single framework, which is a known difficulty for logarithmic all-to-all(v) algorithms.
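The trade-off between the two metrics can be illustrated with a back-of-envelope cost model. The sketch below is not the paper's matrix block aggregation algorithm; it uses only the standard textbook costs for the pairwise-exchange alltoall (linear message count, optimal data volume) and the Bruck-style logarithmic alltoall (logarithmic message count, data volume inflated by a log factor), to show why neither metric alone determines performance. `P` is the task count and `n` the per-pair message size.

```python
from math import ceil, log2

def pairwise_cost(P: int, n: int) -> tuple[int, int]:
    """Pairwise-exchange alltoall: each task sends P-1 messages
    of n bytes each, so data volume per task is n*(P-1) (optimal)."""
    return P - 1, n * (P - 1)

def bruck_cost(P: int, n: int) -> tuple[int, int]:
    """Bruck-style alltoall: ceil(log2 P) rounds; in each round a task
    forwards roughly half of its n*P buffer, so the total volume per
    task is ~ (n*P/2) * log2(P) -- a log factor worse than optimal."""
    rounds = ceil(log2(P))
    return rounds, rounds * (n * P) // 2

# With many tasks and small messages, Bruck wins on message count
# but pays a logarithmic penalty in bytes moved per task.
for P in (8, 64, 1024):
    n = 16  # bytes per pair, illustrative
    pw_msgs, pw_vol = pairwise_cost(P, n)
    br_msgs, br_vol = bruck_cost(P, n)
    print(f"P={P:5d}  pairwise: {pw_msgs} msgs / {pw_vol} B   "
          f"bruck: {br_msgs} msgs / {br_vol} B")
```

Running this shows pairwise at P-1 messages with optimal volume versus Bruck at ceil(log2 P) messages with log-inflated volume; the abstract's point is that an algorithm (such as the one presented in the paper) can keep the volume metric optimal while still reducing message count via on-node aggregation.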