{"title":"MPI performance on the SGI Power Challenge","authors":"T. Loos, R. Bramley","doi":"10.1109/MPIDC.1996.534116","DOIUrl":null,"url":null,"abstract":"The widely implemented MPI standard defines primitives for point-to-point and collective inter-processor communication (IPC), and synchronization based on message passing. The main reason to use a message passing standard is to ease the development, porting, and execution of applications on the variety of parallel computers that can support the paradigm, including shared memory, distributed memory, and shared memory array multiprocessors. The paper concentrates on the SGI Power Challenge, a shared memory multiprocessor with comparison results provided for the distributed memory Intel Paragon. Memory and communications tests written in C++ using messages of double precision arrays show both memory and MPI blocking IPC performance on the Power Challenge degrade once total message sizes grow larger than the second level cache. Comparing the MPI and memory performance curves indicate Power Challenge native MPI point-to-point communication is implemented using memory copying. A model of blocking IPC for the SGI Power Challenge was developed and validated with performance results for use as part of a cost function in a graph partitioning algorithm. A new measure of communications efficiency and overhead, the ratio of IPC time to memory copy time, is used to compare relative IPC performance. Comparison between the Power Challenge and the Paragon show that the Paragon is more efficient for small messages, but the Power Challenge is better on large messages. Power Challenge observations do not correspond well with Paragon results, indicating shared memory multiprocessor results should not be used to predict distributed memory multiprocessor performance. This suggests that relative performance of parallel algorithms should not judged based on one type of machine.","PeriodicalId":432081,"journal":{"name":"Proceedings. Second MPI Developer's Conference","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1996-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Second MPI Developer's Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MPIDC.1996.534116","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
The widely implemented MPI standard defines primitives for point-to-point and collective inter-processor communication (IPC), and for synchronization based on message passing. The main reason to use a message passing standard is to ease the development, porting, and execution of applications on the variety of parallel computers that can support the paradigm, including shared memory, distributed memory, and shared memory array multiprocessors. The paper concentrates on the SGI Power Challenge, a shared memory multiprocessor, with comparison results provided for the distributed memory Intel Paragon. Memory and communications tests written in C++ using messages of double precision arrays show that both memory and MPI blocking IPC performance on the Power Challenge degrade once total message sizes grow larger than the second level cache. Comparing the MPI and memory performance curves indicates that Power Challenge native MPI point-to-point communication is implemented using memory copying. A model of blocking IPC for the SGI Power Challenge was developed and validated against performance results, for use as part of a cost function in a graph partitioning algorithm. A new measure of communications efficiency and overhead, the ratio of IPC time to memory copy time, is used to compare relative IPC performance. Comparison between the Power Challenge and the Paragon shows that the Paragon is more efficient for small messages, but the Power Challenge is better on large messages. Power Challenge observations do not correspond well with Paragon results, indicating that shared memory multiprocessor results should not be used to predict distributed memory multiprocessor performance. This suggests that the relative performance of parallel algorithms should not be judged based on one type of machine.
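
The abstract describes timing blocking MPI point-to-point transfers of double precision arrays and comparing them against plain memory copies, with the ratio of IPC time to memory copy time as the efficiency measure. The sketch below is only an illustration of that kind of measurement, not the authors' actual test harness: it times a blocking MPI_Send/MPI_Recv ping-pong between two ranks and a memcpy of the same payload, then prints the ratio. The message size and repetition count are arbitrary assumptions.

```cpp
// Illustrative sketch of an IPC-vs-memcpy timing test (not the paper's harness).
// Run with two MPI ranks, e.g.: mpirun -np 2 ./pingpong
#include <mpi.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;   // doubles per message (8 MiB payload; assumed size)
    const int reps = 100;    // repetitions to smooth out timer resolution
    std::vector<double> buf(n, 1.0), copy(n);

    // Blocking point-to-point IPC: round trips between ranks 0 and 1.
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; ++i) {
        if (rank == 0) {
            MPI_Send(buf.data(), n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf.data(), n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf.data(), n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf.data(), n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double ipc_time = (MPI_Wtime() - t0) / (2.0 * reps);  // one-way time per message

    // Plain memory copy of the same payload, timed on rank 0 only.
    if (rank == 0) {
        double t1 = MPI_Wtime();
        for (int i = 0; i < reps; ++i)
            std::memcpy(copy.data(), buf.data(), n * sizeof(double));
        double copy_time = (MPI_Wtime() - t1) / reps;
        std::printf("IPC %.6f s  memcpy %.6f s  ratio %.2f\n",
                    ipc_time, copy_time, ipc_time / copy_time);
    }

    MPI_Finalize();
    return 0;
}
```

Sweeping the message size in such a test from below to above the second level cache capacity is what exposes the performance drop-off the abstract reports, and a ratio near 1 would be consistent with its conclusion that the native MPI point-to-point path is implemented as a memory copy.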