{"title":"Multi-level parallelism in the block-Jacobi SVD algorithm","authors":"G. Okša, M. Vajtersic","doi":"10.1109/EMPDP.2001.905057","DOIUrl":null,"url":null,"abstract":"We analyse the fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of matrix A/spl isin/R/sup m/spl times/n/, m/spl ges/n. The algorithm involves the class CO of parallel orderings on the two-dimensional toroidal mesh with p processors. The mathematical background is based on the QR decomposition (QRD) of local data matrices and on the triangular Kogbetliantz algorithm (TKA) for local SVDs in the diagonal mesh processors. Subsequent updates of local matrices in the diagonal as well as nondiagonal mesh processors are required. WE show that all updates can be realized by orthogonal modified Givens rotations. These rotations can be efficiently pipelined in parallel in the horizontal and vertical rings of /spl radic/p processors through the toroidal mesh. For one mesh processor our solution requires O[(m+n)/sup 2///sub p/] systolic processing elements (PEs). O(m/sup 2//p) local memory registers and O[(m+n)/sup 2//p] additional delay elements. 
The time complexity of our solution is O[(m+n/sup 3/2//p/sup 3/4/)/spl Delta/] time steps per one global iteration where /spl Delta/ is the length of the global synchronization time step that is given by evaluation and application of two modified Givens rotations in TKA.","PeriodicalId":262971,"journal":{"name":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EMPDP.2001.905057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
We analyse the fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of a matrix $A \in \mathbb{R}^{m \times n}$, $m \geq n$. The algorithm involves the class CO of parallel orderings on the two-dimensional toroidal mesh with $p$ processors. The mathematical background is based on the QR decomposition (QRD) of local data matrices and on the triangular Kogbetliantz algorithm (TKA) for local SVDs in the diagonal mesh processors. Subsequent updates of local matrices in the diagonal as well as nondiagonal mesh processors are required. We show that all updates can be realized by orthogonal modified Givens rotations. These rotations can be efficiently pipelined in parallel in the horizontal and vertical rings of $\sqrt{p}$ processors through the toroidal mesh. For one mesh processor, our solution requires $O[(m+n)^2/p]$ systolic processing elements (PEs), $O(m^2/p)$ local memory registers, and $O[(m+n)^2/p]$ additional delay elements. The time complexity of our solution is $O[(m + n^{3/2}/p^{3/4})\Delta]$ time steps per global iteration, where $\Delta$ is the length of the global synchronization time step, given by the evaluation and application of two modified Givens rotations in TKA.
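As a rough illustration of the building block the abstract refers to, the sketch below reduces a small matrix to upper triangular form with a sequence of plane rotations. It makes several assumptions: it uses the standard (textbook) Givens rotation rather than the modified variant named in the abstract, it runs sequentially rather than on a systolic array or toroidal mesh, and the function names `givens`, `apply_left`, and `qr_by_givens` are hypothetical and do not come from the paper.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        # Nothing to eliminate: use the identity rotation.
        return 1.0, 0.0
    return a / r, b / r


def apply_left(c, s, row_i, row_k):
    """Apply the rotation to two rows and return the rotated rows."""
    new_i = [c * x + s * y for x, y in zip(row_i, row_k)]
    new_k = [-s * x + c * y for x, y in zip(row_i, row_k)]
    return new_i, new_k


def qr_by_givens(A):
    """Reduce an m x n matrix (list of rows, m >= n) to upper triangular R.

    Only R is returned; the orthogonal factor is not accumulated.
    """
    R = [[float(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    for j in range(n):
        # Zero column j from the bottom up, rotating adjacent row pairs.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            R[i - 1], R[i] = apply_left(c, s, R[i - 1], R[i])
    return R


if __name__ == "__main__":
    for row in qr_by_givens([[3, 1], [4, 2], [0, 5]]):
        print(row)
```

Each rotation touches only two rows, which is the property that makes rotation sequences of this kind natural candidates for pipelining. How the paper actually schedules its modified rotations across the processor rings is not reproduced here.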