{"title":"Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures","authors":"Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson","doi":"arxiv-2408.05238","DOIUrl":null,"url":null,"abstract":"Solving very large linear systems of equations is a key computational task in\nscience and technology. In many cases, the coefficient matrix of the linear\nsystem is rank-deficient, leading to systems that may be underdetermined,\ninconsistent, or both. In such cases, one generally seeks to compute the least\nsquares solution that minimizes the residual of the problem, which can be\nfurther defined as the solution with smallest norm in cases where the\ncoefficient matrix has a nontrivial nullspace. This work presents several new\ntechniques for solving least squares problems involving coefficient matrices\nthat are so large that they do not fit in main memory. The implementations\ninclude both CPU and GPU variants. All techniques rely on complete orthogonal\ndecompositions that guarantee that both conditions of a least squares solution\nare met, regardless of the rank properties of the matrix. Specifically, they\nrely on the recently proposed \"randUTV\" algorithm that is particularly\neffective in strongly communication-constrained environments. A detailed\nprecision and performance study reveals that the new methods, that operate on\ndata stored on disk, are competitive with state-of-the-art methods that store\nall data in main memory.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Performance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.05238","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems that may be underdetermined, inconsistent, or both. In such cases, one generally seeks to compute the least squares solution that minimizes the residual of the problem; when the coefficient matrix has a nontrivial nullspace, this solution is further defined as the residual minimizer of smallest norm.
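For concreteness, the minimum-norm least squares solution described above solves a nested minimization (the notation is ours, not the paper's; A is the coefficient matrix and b the right-hand side):

\[
x_\star \;=\; \operatorname*{arg\,min} \Bigl\{ \, \|x\|_2 \; : \; x \in \mathbb{R}^n, \ \|Ax - b\|_2 = \min_{y \in \mathbb{R}^n} \|Ay - b\|_2 \, \Bigr\}.
\]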
This work presents several new techniques for solving least squares problems involving coefficient matrices that are so large that they do not fit in main memory. The implementations include both CPU and GPU variants. All techniques rely on complete orthogonal decompositions, which guarantee that both conditions of a least squares solution are met regardless of the rank properties of the matrix. Specifically, they rely on the recently proposed "randUTV" algorithm, which is particularly effective in strongly communication-constrained environments. A detailed precision and performance study reveals that the new methods, which operate on data stored on disk, are competitive with state-of-the-art methods that store all data in main memory.
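As a small in-memory illustration of these two conditions (not the paper's out-of-core randUTV implementation), the sketch below uses NumPy's SVD-based lstsq, which also returns the minimum-norm residual minimizer when the matrix is rank-deficient. The matrix dimensions, right-hand side, and seed are arbitrary choices for the example.

    import numpy as np

    # Build a deliberately rank-deficient 6x4 system: column 3 duplicates
    # column 0, so A has rank 3 and a nontrivial nullspace.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    A[:, 3] = A[:, 0]
    b = rng.standard_normal(6)

    # SVD-based least squares solve: among all residual minimizers,
    # it returns the one with the smallest 2-norm.
    x, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print("rank(A) =", rank)  # 3, not 4

    # The same minimum-norm solution via the Moore-Penrose pseudoinverse.
    x_pinv = np.linalg.pinv(A) @ b
    print("solutions agree:", np.allclose(x, x_pinv))

    # Any residual minimizer has the form x + z with z in the nullspace of A;
    # adding such a z leaves the residual unchanged but increases the norm.
    z = np.array([1.0, 0.0, 0.0, -1.0])  # A @ z = 0 by construction
    assert np.allclose(A @ z, 0)
    print("same residual:", np.isclose(np.linalg.norm(A @ (x + z) - b),
                                       np.linalg.norm(A @ x - b)))
    print("larger norm:", np.linalg.norm(x + z) > np.linalg.norm(x))

Because any nullspace vector can be added to a minimizer without changing the residual, the smallest-norm condition is what makes the solution unique; this is the property that a complete orthogonal (UTV-type) decomposition enforces by construction.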