{"title":"Mpi集体通信操作的实验结果","authors":"M. Bernaschi, G. Iannello, S. Crea","doi":"10.1142/S0129626405002179","DOIUrl":null,"url":null,"abstract":"Collective communication performance is critical in a number of MPI applications, yet relatively few results are available to assess the performance of mainstream MPI implementations. In this paper we focus on two widely used primitives, broadcast and reduce, and present experimental results for the Cray T3E and the IBM SP2. We compare the performance of the existing MPI primitives with our implementation based on a new algorithm. Our tests show that existing all-software implementations can be improved and highlight the advantages of the Cray hardware-assisted implementation.","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"32 1","pages":"774-783"},"PeriodicalIF":0.5000,"publicationDate":"1999-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Experimental Results about Mpi Collective Communication Operations\",\"authors\":\"M. Bernaschi, G. Iannello, S. Crea\",\"doi\":\"10.1142/S0129626405002179\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Collective communication performance is critical in a number of MPI applications, yet relatively few results are available to assess the performance of mainstream MPI implementations. In this paper we focus on two widely used primitives, broadcast and reduce, and present experimental results for the Cray T3E and the IBM SP2. We compare the performance of the existing MPI primitives with our implementation based on a new algorithm. Our tests show that existing all-software implementations can be improved and highlight the advantages of the Cray hardware-assisted implementation.\",\"PeriodicalId\":44742,\"journal\":{\"name\":\"Parallel Processing Letters\",\"volume\":\"32 1\",\"pages\":\"774-783\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"1999-04-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Parallel Processing Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1142/S0129626405002179\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Parallel Processing Letters","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1142/S0129626405002179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Experimental Results about Mpi Collective Communication Operations
Collective communication performance is critical in a number of MPI applications, yet relatively few results are available to assess the performance of mainstream MPI implementations. In this paper we focus on two widely used primitives, broadcast and reduce, and present experimental results for the Cray T3E and the IBM SP2. We compare the performance of the existing MPI primitives with that of our own implementation, which is based on a new algorithm. Our tests show that existing all-software implementations can be improved, and they highlight the advantages of the Cray hardware-assisted implementation.
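For readers less familiar with the two primitives being benchmarked, the sketch below (not taken from the paper) shows a typical invocation of MPI_Bcast and MPI_Reduce from the standard MPI interface. The buffer length, datatype, reduction operator, and root rank are illustrative choices, not parameters from the study.

/* Minimal sketch of the two collectives benchmarked in the paper:
   a broadcast from a root rank followed by an element-wise sum reduce. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int N = 1024;                        /* illustrative message length */
    double *buf = malloc(N * sizeof(double));  /* data broadcast from the root */
    double *sum = malloc(N * sizeof(double));  /* reduction result at the root */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        for (int i = 0; i < N; i++)
            buf[i] = (double)i;

    /* Broadcast: the root (rank 0) sends buf to every process. */
    MPI_Bcast(buf, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Reduce: element-wise sum of buf across all processes, result at the root. */
    MPI_Reduce(buf, sum, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum[%d] = %g (expected %d)\n", N - 1, sum[N - 1], size * (N - 1));

    free(buf);
    free(sum);
    MPI_Finalize();
    return 0;
}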
About the journal:
Parallel Processing Letters (PPL) aims to rapidly disseminate results in the field of parallel processing on a worldwide basis, in the form of short papers. It fills the need for an information vehicle that conveys recent achievements and furthers the exchange of scientific information in the field. The journal has a wide scope; topics covered include:
- design and analysis of parallel and distributed algorithms
- theory of parallel computation
- parallel programming languages
- parallel programming environments
- parallel architectures and VLSI circuits