Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures

Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson
arXiv - CS - Performance. Published 2024-08-05. DOI: https://doi.org/arxiv-2408.05238

Abstract

Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems that may be underdetermined, inconsistent, or both. In such cases, one generally seeks the least-squares solution that minimizes the residual of the problem; when the coefficient matrix has a nontrivial nullspace, this solution is made unique by also requiring that it have the smallest norm. This work presents several new techniques for solving least-squares problems whose coefficient matrices are so large that they do not fit in main memory. The implementations include both CPU and GPU variants. All techniques rely on complete orthogonal decompositions, which guarantee that both conditions of a least-squares solution (minimal residual and minimal norm) are met regardless of the rank properties of the matrix. Specifically, they rely on the recently proposed "randUTV" algorithm, which is particularly effective in strongly communication-constrained environments. A detailed precision and performance study reveals that the new methods, which operate on data stored on disk, are competitive with state-of-the-art methods that store all data in main memory.
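To make the two conditions concrete, the following is a minimal in-memory sketch of how a UTV-type factorization A = U T V^T yields the minimum-norm least-squares solution. It is not the authors' implementation and it is not randUTV: as an assumption for illustration, it uses NumPy's SVD, which is the special case of a UTV factorization in which T is diagonal, and it keeps the whole matrix in main memory. The function name `min_norm_lstsq_utv` and the tolerance `rtol` are hypothetical.

```python
import numpy as np

def min_norm_lstsq_utv(A, b, rtol=1e-12):
    """Minimum-norm least-squares solution of A x ~ b.

    Assumption: the SVD (A = U diag(s) V^T) stands in for randUTV,
    since it is a UTV factorization whose middle factor is diagonal.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Numerical rank: singular values above a relative threshold.
    k = int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0
    # Project b onto the range of A and invert the leading k x k block.
    y = (U[:, :k].T @ b) / s[:k]
    # Map back through V; nullspace components stay zero, which is
    # what selects the solution of smallest norm.
    return Vt[:k, :].T @ y

# Rank-deficient, inconsistent example: a 6 x 4 matrix of rank 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

x = min_norm_lstsq_utv(A, b)
# Reference solution from NumPy's own minimum-norm least-squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=1e-12)[0]
```

Here `x` should agree with `x_ref` up to rounding error. The residual A x - b is orthogonal to the range of A, which is the first condition, and x has no component in the nullspace of A, which is the second.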