LU decomposition of banded matrices and the solution of linear systems on hypercubes

D. Walker, T. Aldcroft, A. Cisneros, G. Fox, W. Furmanski
Published in: Conference on Hypercube Concurrent Computers and Applications
Publication date: 1989-01-03
DOI: 10.1145/63047.63124 (https://doi.org/10.1145/63047.63124)
Citations: 18

Abstract

We describe the solution of linear systems of equations, Ax = b, on distributed-memory concurrent computers whose interconnect topology contains a two-dimensional mesh. A is assumed to be an M×M banded matrix. The problem is generalized to the case in which there are nb distinct right-hand sides, b, and can thus be expressed as AX = B, where X and B are both M×nb matrices. The solution is obtained by the LU decomposition method, which proceeds in three stages: (1) LU decomposition of the matrix A, (2) forward reduction, and (3) back substitution. Because the matrix A is banded, a simple rectangular subblock decomposition of the matrices A, X, and B over the nodes of the ensemble results in excessive load imbalance. A scattered decomposition of the data is therefore used. The sequential and concurrent algorithms are described in detail, and models of the performance of the concurrent algorithm are presented for each of the three stages. To ensure numerical stability, the algorithm is extended to include partial pivoting, and performance models for the pivoting case are also given. Results from a 128-node Caltech/JPL Mark II hypercube are presented, and the performance models are found to be in good agreement with these data. Indexing overhead was found to contribute significantly to the total concurrent overhead.
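The three stages named in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' code: the function name `banded_lu_solve`, the parameters `lower` and `upper` (the number of sub- and super-diagonals), and the dense list-of-lists storage are assumptions made for illustration. The sketch omits partial pivoting and says nothing about the scattered decomposition over hypercube nodes.

```python
def banded_lu_solve(A, B, lower, upper):
    """Solve A X = B for a banded M x M matrix A, without pivoting.

    A     : M x M list of lists (dense storage, chosen for clarity only)
    B     : M x nb list of lists, one column per right-hand side
    lower : number of sub-diagonals inside the band (assumed parameter)
    upper : number of super-diagonals inside the band (assumed parameter)
    """
    M = len(A)
    nb = len(B[0])
    LU = [row[:] for row in A]   # work on copies so the inputs are untouched
    X = [row[:] for row in B]

    # Stage 1: LU decomposition, updating only entries inside the band.
    for k in range(M - 1):
        row_end = min(k + lower, M - 1)
        col_end = min(k + upper, M - 1)
        for i in range(k + 1, row_end + 1):
            LU[i][k] /= LU[k][k]             # multiplier, stored in place of L
            for j in range(k + 1, col_end + 1):
                LU[i][j] -= LU[i][k] * LU[k][j]

    # Stage 2: forward reduction, L Y = B (L has a unit diagonal).
    for k in range(M - 1):
        for i in range(k + 1, min(k + lower, M - 1) + 1):
            for c in range(nb):
                X[i][c] -= LU[i][k] * X[k][c]

    # Stage 3: back substitution, U X = Y.
    for k in range(M - 1, -1, -1):
        for c in range(nb):
            X[k][c] /= LU[k][k]
        for i in range(max(k - upper, 0), k):
            for c in range(nb):
                X[i][c] -= LU[i][k] * X[k][c]

    return X


# Tridiagonal example (lower = upper = 1) with nb = 2 right-hand sides.
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
B = [[0.0, 1.0],
     [0.0, 0.0],
     [0.0, 0.0],
     [5.0, 1.0]]
X = banded_lu_solve(A, B, lower=1, upper=1)
```

Because no pivoting is performed, L keeps the lower bandwidth and U keeps the upper bandwidth of A, so each stage only visits entries inside the band. With row interchanges, as in the pivoting extension the abstract mentions, the band of U can widen, which this sketch does not handle.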