Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer

Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing. Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472196

Z. Leyk, M. Dow


Citations: 0

Abstract



We are implementing iterative methods on the Fujitsu VPP500 parallel computer, and in the process we have encountered several kinds of problems. Performance on the VPP500 depends critically on the type of matrix used in the computation. In sparse computations it is important to exploit the structure of the matrix: performance can differ greatly between a matrix stored in the diagonal format and one stored in a more general format, so an appropriate storage format must be chosen. Preliminary tests show that the implementation of the package scales with the number of processors, especially for large problems. It is becoming clear to us that traditional efficient preconditioning techniques yield a speedup of at most a factor of 2. We need to look for new preconditioners better suited to parallel computation. Polynomial preconditioning is attractive because its preprocessing cost is negligible. We favour a reverse communication interface for the added flexibility needed to test different storage formats and preconditioners. We conclude that it is crucial to experiment with existing parallel machines to better understand effects that are difficult to derive from theory, such as the impact of communication costs or of data storage schemes.
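The three ideas named in the abstract (diagonal storage, polynomial preconditioning, and a reverse communication interface) can be illustrated together in a small sketch. This is not the authors' package: the abstract does not name the iterative methods or the language used, so conjugate gradients is assumed here as a stand-in, a truncated Neumann series is assumed as the polynomial preconditioner, and every name (`dia_matvec`, `neumann_preconditioner`, `cg_reverse`, `solve`) is hypothetical. The point is that the solver never sees the matrix. It only asks the caller for a matrix-vector product or a preconditioner application, so the storage format and the preconditioner can be swapped without touching the solver.

```python
from typing import Callable, Dict, Generator, List, Optional, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def dia_matvec(diags: Dict[int, Vector], x: Vector) -> Vector:
    """y = A x for A stored by diagonals: diags[k][i] holds A[i][i + k].

    Entries whose column index i + k falls outside the matrix are ignored.
    """
    n = len(x)
    y = [0.0] * n
    for offset, values in diags.items():
        for i in range(max(0, -offset), min(n, n - offset)):
            y[i] += values[i] * x[i + offset]
    return y


def neumann_preconditioner(diags: Dict[int, Vector], degree: int) -> Callable[[Vector], Vector]:
    """Return r -> sum_{j=0..degree} (I - D^-1 A)^j D^-1 r, with D = diag(A).

    Setup only inverts the main diagonal; there is no factorization step.
    """
    inv_d = [1.0 / d for d in diags[0]]

    def apply(r: Vector) -> Vector:
        base = [di * ri for di, ri in zip(inv_d, r)]
        s = list(base)
        for _ in range(degree):
            a_s = dia_matvec(diags, s)
            # s <- D^-1 r + (I - D^-1 A) s
            s = [b + si - di * ai for b, si, di, ai in zip(base, s, inv_d, a_s)]
        return s

    return apply


def cg_reverse(b: Vector, tol: float = 1e-10,
               max_iter: int = 200) -> Generator[Tuple[str, Vector], Vector, Vector]:
    """Preconditioned CG with reverse communication.

    Yields ("precond", v) or ("matvec", v); the caller sends back the result.
    The solution is delivered as the generator's return value.
    """
    x = [0.0] * len(b)
    r = list(b)
    z = yield ("precond", r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        q = yield ("matvec", p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = yield ("precond", r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


def solve(diags: Dict[int, Vector], b: Vector,
          precond: Optional[Callable[[Vector], Vector]] = None) -> Vector:
    """Driver loop: answers the solver's requests using diagonal storage."""
    solver = cg_reverse(b)
    try:
        request, vec = next(solver)
        while True:
            if request == "matvec":
                reply = dia_matvec(diags, vec)
            else:
                reply = precond(vec) if precond else list(vec)
            request, vec = solver.send(reply)
    except StopIteration as done:
        return done.value


# Usage: 1D Laplacian tridiag(-1, 2, -1), right-hand side chosen so x = ones.
n = 8
laplacian = {0: [2.0] * n, 1: [-1.0] * n, -1: [-1.0] * n}
rhs = dia_matvec(laplacian, [1.0] * n)
print(solve(laplacian, rhs, neumann_preconditioner(laplacian, degree=2)))
```

Swapping `dia_matvec` for a routine over a more general sparse format, or passing a different `precond`, requires changes only in `solve`, which is the flexibility the abstract attributes to reverse communication.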
