Author: L. Yang. Published in: Proceedings. Advances in Parallel and Distributed Computing, 1997-03-19. DOI: 10.1109/APDC.1997.574029 (https://doi.org/10.1109/APDC.1997.574029).
Solving sparse least squares problems on massively distributed memory computers
This paper studies the parallel aspects of PCGLS, a basic iterative method that applies the preconditioned conjugate gradient method to the normal equations, together with the incomplete modified Gram-Schmidt (IMGS) preconditioner, for solving sparse least squares problems on massively parallel distributed memory computers. On this kind of architecture, the performance of these methods is limited by the global communication required for the inner products. We describe two improvements to the parallelization of PCGLS and the IMGS preconditioner: the first assembles the results of several inner products collectively, and the second creates situations in which communication can be overlapped with computation. We present a theoretical model of the computation and communication phases that allows us to determine the number of processors that minimizes the runtime. Several numerical experiments on the Parsytec GC/PowerPlus are presented.
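To make the structure of the method concrete, the following is a minimal serial sketch of a textbook PCGLS iteration in Python with NumPy. It is not the paper's parallel implementation. The function name `pcgls` and the parameters `d`, `tol`, and `max_iter` are hypothetical, and a simple diagonal column-scaling preconditioner stands in for IMGS, which the abstract does not specify in detail. The comments mark the two inner products per iteration that would each require a global reduction on a distributed memory machine.

```python
import numpy as np


def pcgls(A, b, d, tol=1e-10, max_iter=200):
    """Sketch of PCGLS for min ||A x - b||_2 with a diagonal preconditioner.

    Assumptions (not from the paper): A is a dense array with full column
    rank, and the preconditioner is M = diag(d), so applying M^{-1} or
    M^{-T} is an elementwise division by d.
    """
    n = A.shape[1]
    x = np.zeros(n)
    r = np.array(b, dtype=float)      # residual b - A x for x = 0
    s = (A.T @ r) / d                 # preconditioned normal-equations residual
    p = s.copy()
    gamma = s @ s                     # inner product: global reduction in parallel
    gamma0 = gamma
    for _ in range(max_iter):
        if gamma <= (tol ** 2) * gamma0:
            break
        t = p / d                     # t = M^{-1} p
        q = A @ t
        alpha = gamma / (q @ q)       # inner product: global reduction in parallel
        x += alpha * t
        r -= alpha * q
        s = (A.T @ r) / d
        gamma_new = s @ s             # inner product: global reduction in parallel
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 8))
    b = rng.standard_normal(50)
    d = np.linalg.norm(A, axis=0)     # column norms as the diagonal of M
    x = pcgls(A, b, d)
    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
    print("difference from lstsq:", np.linalg.norm(x - x_ref))
```

In this form the two reductions in each iteration depend on one another, so they cannot simply be merged. Combining several inner products into one collective step, or hiding their latency behind local work, requires reorganizing the recurrences, which is the kind of restructuring the abstract refers to.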