{"title":"MPI performance on the SGI Power Challenge","authors":"T. Loos, R. Bramley","doi":"10.1109/MPIDC.1996.534116","DOIUrl":null,"url":null,"abstract":"The widely implemented MPI standard defines primitives for point-to-point and collective inter-processor communication (IPC), and synchronization based on message passing. The main reason to use a message passing standard is to ease the development, porting, and execution of applications on the variety of parallel computers that can support the paradigm, including shared memory, distributed memory, and shared memory array multiprocessors. The paper concentrates on the SGI Power Challenge, a shared memory multiprocessor with comparison results provided for the distributed memory Intel Paragon. Memory and communications tests written in C++ using messages of double precision arrays show both memory and MPI blocking IPC performance on the Power Challenge degrade once total message sizes grow larger than the second level cache. Comparing the MPI and memory performance curves indicate Power Challenge native MPI point-to-point communication is implemented using memory copying. A model of blocking IPC for the SGI Power Challenge was developed and validated with performance results for use as part of a cost function in a graph partitioning algorithm. A new measure of communications efficiency and overhead, the ratio of IPC time to memory copy time, is used to compare relative IPC performance. Comparison between the Power Challenge and the Paragon show that the Paragon is more efficient for small messages, but the Power Challenge is better on large messages. Power Challenge observations do not correspond well with Paragon results, indicating shared memory multiprocessor results should not be used to predict distributed memory multiprocessor performance. This suggests that relative performance of parallel algorithms should not judged based on one type of machine.","PeriodicalId":432081,"journal":{"name":"Proceedings. Second MPI Developer's Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Second MPI Developer's Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MPIDC.1996.534116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
The widely implemented MPI standard defines primitives for point-to-point and collective inter-processor communication (IPC), and for synchronization based on message passing. The main reason to use a message passing standard is to ease the development, porting, and execution of applications on the variety of parallel computers that can support the paradigm, including shared memory, distributed memory, and shared memory array multiprocessors. The paper concentrates on the SGI Power Challenge, a shared memory multiprocessor, with comparison results provided for the distributed memory Intel Paragon. Memory and communications tests written in C++ using messages of double precision arrays show that both memory and MPI blocking IPC performance on the Power Challenge degrade once total message sizes grow larger than the second level cache. Comparing the MPI and memory performance curves indicates that Power Challenge native MPI point-to-point communication is implemented using memory copying. A model of blocking IPC for the SGI Power Challenge was developed and validated against performance results, for use as part of a cost function in a graph partitioning algorithm. A new measure of communications efficiency and overhead, the ratio of IPC time to memory copy time, is used to compare relative IPC performance. Comparison between the Power Challenge and the Paragon shows that the Paragon is more efficient for small messages, but the Power Challenge is better on large messages. Power Challenge observations do not correspond well with Paragon results, indicating that shared memory multiprocessor results should not be used to predict distributed memory multiprocessor performance. This suggests that the relative performance of parallel algorithms should not be judged based on one type of machine.
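
The abstract describes timing blocking MPI point-to-point transfers of double precision arrays and comparing them against plain memory copies, with the ratio of IPC time to memory copy time as the efficiency measure. The sketch below is only an illustration of that kind of measurement, not the authors' actual test harness: it times a blocking MPI_Send/MPI_Recv ping-pong between two ranks and a memcpy of the same payload, then prints the ratio. The message size and repetition count are arbitrary assumptions.

```cpp
// Illustrative sketch of an IPC-vs-memcpy timing test (not the paper's harness).
// Run with two MPI ranks, e.g.: mpirun -np 2 ./pingpong
#include <mpi.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;   // doubles per message (8 MiB payload; assumed size)
    const int reps = 100;    // repetitions to smooth out timer resolution
    std::vector<double> buf(n, 1.0), copy(n);

    // Blocking point-to-point IPC: round trips between ranks 0 and 1.
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; ++i) {
        if (rank == 0) {
            MPI_Send(buf.data(), n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf.data(), n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf.data(), n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf.data(), n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double ipc_time = (MPI_Wtime() - t0) / (2.0 * reps);  // one-way time per message

    // Plain memory copy of the same payload, timed on rank 0 only.
    if (rank == 0) {
        double t1 = MPI_Wtime();
        for (int i = 0; i < reps; ++i)
            std::memcpy(copy.data(), buf.data(), n * sizeof(double));
        double copy_time = (MPI_Wtime() - t1) / reps;
        std::printf("IPC %.6f s  memcpy %.6f s  ratio %.2f\n",
                    ipc_time, copy_time, ipc_time / copy_time);
    }

    MPI_Finalize();
    return 0;
}
```

Sweeping the message size in such a test from below to above the second level cache capacity is what exposes the performance drop-off the abstract reports, and a ratio near 1 would be consistent with its conclusion that the native MPI point-to-point path is implemented as a memory copy.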