{"title":"富士通VPP500并行超级计算机的迭代求解器包","authors":"Z. Leyk, M. Dow","doi":"10.1109/ICAPP.1995.472196","DOIUrl":null,"url":null,"abstract":"We are implementing iterative methods on the VPP500 parallel computer. During this process we have met with different kind of problems. It is easy to notice that performance on the VPP500 depends critically on the type of matrices taken for computations. In sparse computations, it is important to take advantage of the structure of the matrix. There can be a big difference between the performance obtained from a matrix stored in the diagonal format and one stored in a more general format. Therefore it is necessary to choose an appropriate format for a matrix used in computations. Preliminary tests show that implementation of the package is scalable with respect to the number of processors, especially for large problems. It is becoming clear for us that the traditional efficient preconditioning techniques result only in a speedup of factor 2 at best. We need to look for new preconditioners more adjusted for parallel computations. The polynomial preconditioning approach is attractive because of the negligible preprocessing cost involved. We favour the reverse communication interface for added flexibility necessary for doing tests with different storage formats and preconditioners. We can conclude that it is crucial to experiment with existing parallel machines to better understand the effects that are difficult to derive from theory, such as impact of communication costs or ways of storing data.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer\",\"authors\":\"Z. Leyk, M. Dow\",\"doi\":\"10.1109/ICAPP.1995.472196\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We are implementing iterative methods on the VPP500 parallel computer. During this process we have met with different kind of problems. It is easy to notice that performance on the VPP500 depends critically on the type of matrices taken for computations. In sparse computations, it is important to take advantage of the structure of the matrix. There can be a big difference between the performance obtained from a matrix stored in the diagonal format and one stored in a more general format. Therefore it is necessary to choose an appropriate format for a matrix used in computations. Preliminary tests show that implementation of the package is scalable with respect to the number of processors, especially for large problems. It is becoming clear for us that the traditional efficient preconditioning techniques result only in a speedup of factor 2 at best. We need to look for new preconditioners more adjusted for parallel computations. The polynomial preconditioning approach is attractive because of the negligible preprocessing cost involved. We favour the reverse communication interface for added flexibility necessary for doing tests with different storage formats and preconditioners. 
We can conclude that it is crucial to experiment with existing parallel machines to better understand the effects that are difficult to derive from theory, such as impact of communication costs or ways of storing data.<<ETX>>\",\"PeriodicalId\":448130,\"journal\":{\"name\":\"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAPP.1995.472196\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPP.1995.472196","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer
We are implementing iterative methods on the VPP500 parallel computer and have encountered various problems in the process. Performance on the VPP500 depends critically on the type of matrices used in the computations. In sparse computations it is important to exploit the structure of the matrix: there can be a large difference between the performance obtained from a matrix stored in the diagonal format and one stored in a more general format, so an appropriate storage format must be chosen. Preliminary tests show that the package scales with the number of processors, especially for large problems. It is becoming clear to us that traditional efficient preconditioning techniques yield a speedup of at most a factor of two, so new preconditioners better suited to parallel computation are needed. The polynomial preconditioning approach is attractive because of its negligible preprocessing cost. We favour a reverse communication interface for the flexibility needed to test different storage formats and preconditioners. We conclude that it is crucial to experiment with existing parallel machines in order to understand effects that are difficult to derive from theory, such as the impact of communication costs or of the way data are stored.
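
The abstract contrasts the diagonal storage format with more general formats. Below is a minimal sketch of a matrix-vector product over the diagonal (DIA) format; the type and function names (dia_matrix, dia_matvec) are illustrative assumptions, not the package's actual interface.

```c
#include <stddef.h>

/* Diagonal (DIA) storage: each stored diagonal of an n x n matrix is kept
 * as a dense column of length n together with its offset from the main
 * diagonal.  The long, stride-1 inner loops this produces vectorise well
 * on machines such as the VPP500. */
typedef struct {
    size_t n;          /* matrix dimension                        */
    size_t ndiag;      /* number of stored diagonals              */
    const int *offs;   /* offsets: 0 = main, +k upper, -k lower   */
    const double *val; /* ndiag columns of length n, column-major */
} dia_matrix;

/* y = A * x for a matrix stored in DIA format. */
void dia_matvec(const dia_matrix *A, const double *x, double *y)
{
    for (size_t i = 0; i < A->n; ++i)
        y[i] = 0.0;

    for (size_t d = 0; d < A->ndiag; ++d) {
        long off = A->offs[d];
        const double *col = A->val + d * A->n;
        /* rows for which the column index i + off stays inside [0, n) */
        long lo = off < 0 ? -off : 0;
        long hi = off > 0 ? (long)A->n - off : (long)A->n;
        for (long i = lo; i < hi; ++i)            /* stride-1, vectorisable */
            y[i] += col[i] * x[i + off];
    }
}
```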
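To illustrate why polynomial preconditioning involves negligible preprocessing, here is a sketch of a truncated Neumann-series preconditioner built only from matrix-vector products (reusing the hypothetical dia_matvec above). The particular polynomial is an assumption made for illustration, not necessarily the one used in the package.

```c
/* Approximate z = M^{-1} r with the truncated Neumann series
 *   M^{-1} ~= I + (I - A) + (I - A)^2 + ... + (I - A)^degree,
 * which needs only matrix-vector products and hence almost no setup.
 * Assumes A has been scaled (e.g. by its diagonal) so the series
 * converges; tmp is a caller-supplied work vector of length n. */
void neumann_precond(const dia_matrix *A, int degree,
                     const double *r, double *z, double *tmp)
{
    size_t n = A->n;
    for (size_t i = 0; i < n; ++i)
        z[i] = r[i];                      /* z = r  (k = 0 term)  */

    for (int k = 0; k < degree; ++k) {
        dia_matvec(A, z, tmp);            /* tmp = A z            */
        for (size_t i = 0; i < n; ++i)
            z[i] = r[i] + z[i] - tmp[i];  /* z = r + (I - A) z    */
    }
}
```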
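The reverse communication interface mentioned above keeps the matrix format and preconditioner entirely on the caller's side. The following is a minimal sketch of that calling pattern, using a simple Richardson iteration in place of the package's actual solvers; all names are hypothetical, and dia_matvec refers to the earlier sketch.

```c
#include <stddef.h>
#include <math.h>

/* Requests a reverse-communication solver can hand back to its caller. */
typedef enum { RC_MATVEC, RC_DONE } rc_request;

typedef struct {
    size_t n;
    const double *b;   /* right-hand side                              */
    double *x;         /* current iterate (caller sets x, iter = 0)    */
    const double *u;   /* when RC_MATVEC is returned: multiply this... */
    double *v;         /* ...and store the product A*u here            */
    double tol;
    int iter, maxit;
} rc_state;

/* One stage of the Richardson iteration x <- x + (b - A x), written in
 * reverse-communication style: the routine never calls a matvec itself,
 * it asks the caller for v = A*u and resumes on the next call. */
rc_request richardson_step(rc_state *s)
{
    if (s->iter > 0) {                  /* consume v = A*x from the caller */
        double res2 = 0.0;
        for (size_t i = 0; i < s->n; ++i) {
            double r = s->b[i] - s->v[i];
            s->x[i] += r;
            res2 += r * r;
        }
        if (sqrt(res2) < s->tol || s->iter >= s->maxit)
            return RC_DONE;
    }
    s->iter++;
    s->u = s->x;                        /* request v = A*x next */
    return RC_MATVEC;
}

/* Caller-side driving loop: the matrix format stays on this side. */
void solve_rc(const dia_matrix *A, rc_state *s)
{
    while (richardson_step(s) == RC_MATVEC)
        dia_matvec(A, s->u, s->v);
}
```

Because the solver only ever asks the caller for v = A*u, storage formats and preconditioners can be swapped without touching the solver code, which is the flexibility the abstract attributes to the reverse communication interface.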