Implementation of a divide and conquer cyclic reduction algorithm on the FPS T-20 hypercube

Conference on Hypercube Concurrent Computers and Applications. Pub Date: 1989-01-03. DOI: 10.1145/63047.63111

C. Cox

Citations: 3

Abstract

A simple variant of the odd-even cyclic reduction algorithm for solving tridiagonal linear systems is presented. The target architecture is a parallel computer whose nodes are vector processors, such as the Floating Point Systems T-Series hypercube. Of particular interest is the case where the number of equations is much larger than the number of processors. The matrix system is partitioned into local subsystems, and a parameter governing the partitioning determines the amount of redundant computation. Once the local systems have been distributed, the algorithm proceeds in stages: independent local computations, an all-to-all broadcast of a small number of equations from each processor, solution of the resulting small subsystem, further independent computations, and output of the solution. Accepting some redundant computation between neighboring processors minimizes communication costs. The computations are also well balanced, since every processor executes an identical algebraic routine.

A brief description of the standard cyclic reduction algorithm is given. The divide and conquer strategy is then presented, along with estimates of speedup and efficiency. Finally, an Occam program implementing the algorithm on the FPS T-20 computer is discussed, together with experimental results.
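
For reference, the standard odd-even cyclic reduction that the paper builds on can be sketched serially in Python. This is a minimal sketch, not the author's Occam program: the function name `cyclic_reduction` and the coefficient layout (row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]) are assumptions made for illustration, and it assumes no zero pivots occur, for example because the matrix is diagonally dominant. Each level combines every odd-indexed equation with its two neighbors, roughly halving the system; the even-indexed unknowns are then recovered by back-substitution.

```python
from typing import List


def cyclic_reduction(a: List[float], b: List[float], c: List[float],
                     d: List[float]) -> List[float]:
    """Solve a tridiagonal system by recursive odd-even cyclic reduction.

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[n-1] are ignored. Assumes no zero pivots arise
    (e.g. the matrix is diagonally dominant).
    """
    n = len(b)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]
    a = [0.0] + list(a[1:])   # no x[-1] term in the first row
    c = list(c[:-1]) + [0.0]  # no x[n] term in the last row

    # Reduction: add multiples of rows i-1 and i+1 to each odd row i so that
    # the even-indexed unknowns x[i-1] and x[i+1] drop out.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        if i + 1 < n:
            gamma = -c[i] / b[i + 1]
            a_nxt, c_nxt, d_nxt = a[i + 1], c[i + 1], d[i + 1]
        else:  # the last row has no right neighbor
            gamma, a_nxt, c_nxt, d_nxt = 0.0, 0.0, 0.0, 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a_nxt)
        rc.append(gamma * c_nxt)
        rd.append(d[i] + alpha * d[i - 1] + gamma * d_nxt)

    # The reduced system couples only the odd-indexed unknowns; recurse.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: each even unknown depends only on its odd neighbors.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


if __name__ == "__main__":
    # Hypothetical example: 7 equations, diagonal 4, off-diagonals 1.
    n = 7
    a, b, c = [1.0] * n, [4.0] * n, [1.0] * n
    x_true = [float(k) for k in range(1, n + 1)]
    d = [4.0 * x_true[i]
         + (x_true[i - 1] if i > 0 else 0.0)
         + (x_true[i + 1] if i + 1 < n else 0.0) for i in range(n)]
    print(cyclic_reduction(a, b, c, d))  # approximately [1.0, ..., 7.0]
```

Within one reduction level the loop iterations are independent of each other, which is what makes cyclic reduction attractive on vector and parallel hardware.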

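The phase structure described in the abstract (independent local work, exchange of a few equations per processor, a small global solve, then more independent work) can be illustrated with a generic partition method, simulated serially in Python. This is a hypothetical sketch and not the paper's scheme: it does not model the redundancy parameter, it swaps in the Thomas algorithm for the local and reduced solves instead of cyclic reduction, and the names `thomas` and `partitioned_solve` are invented here. Each of the p contiguous blocks reduces itself to two equations in its first and last unknowns, so the gathered system has only 2p equations regardless of n. As before, it assumes no zero pivots (e.g. a diagonally dominant matrix).

```python
from typing import List, Optional, Tuple

Vec = List[float]


def thomas(a: Vec, b: Vec, c: Vec, d: Vec) -> Vec:
    """Serial tridiagonal solve (Thomas algorithm); a[0] and c[-1] are ignored."""
    n = len(b)
    if n == 0:
        return []
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def partitioned_solve(a: Vec, b: Vec, c: Vec, d: Vec, p: int) -> Vec:
    """Hypothetical partition solver: p contiguous blocks of >= 2 rows each."""
    n = len(b)
    a = [0.0] + list(a[1:])   # no x[-1] term in the first row
    c = list(c[:-1]) + [0.0]  # no x[n] term in the last row
    size = n // p
    if size < 2:
        raise ValueError("each block needs at least two equations")
    bounds = [(k * size, n - 1 if k == p - 1 else (k + 1) * size - 1)
              for k in range(p)]

    # Phase 1 (independent per block): write the interior unknowns as
    # y0 + x[s]*yl + x[e]*yr and fold that into the block's first/last rows.
    local: List[Optional[Tuple[Vec, Vec, Vec]]] = []
    rows: List[Tuple[float, float, float, float]] = []
    for s, e in bounds:
        m = e - s - 1
        if m == 0:  # two-row block: its rows are already boundary rows
            local.append(None)
            rows += [(a[s], b[s], c[s], d[s]), (a[e], b[e], c[e], d[e])]
            continue
        ia, ib, ic = a[s + 1:e], b[s + 1:e], c[s + 1:e]
        e_left, e_right = [0.0] * m, [0.0] * m
        e_left[0], e_right[-1] = -a[s + 1], -c[e - 1]
        y0 = thomas(ia, ib, ic, d[s + 1:e])
        yl = thomas(ia, ib, ic, e_left)
        yr = thomas(ia, ib, ic, e_right)
        local.append((y0, yl, yr))
        rows.append((a[s], b[s] + c[s] * yl[0], c[s] * yr[0], d[s] - c[s] * y0[0]))
        rows.append((a[e] * yl[-1], b[e] + a[e] * yr[-1], c[e], d[e] - a[e] * y0[-1]))

    # Phase 2: gather the 2p boundary rows (a distributed code would
    # exchange these between processors) and solve the small system.
    ra, rb, rc, rd = (list(col) for col in zip(*rows))
    z = thomas(ra, rb, rc, rd)

    # Phase 3 (independent per block): recover the interior unknowns.
    x = [0.0] * n
    for k, (s, e) in enumerate(bounds):
        x[s], x[e] = z[2 * k], z[2 * k + 1]
        if local[k] is not None:
            y0, yl, yr = local[k]
            for j in range(e - s - 1):
                x[s + 1 + j] = y0[j] + x[s] * yl[j] + x[e] * yr[j]
    return x
```

Phases 1 and 3 read only a block's own rows plus the solved boundary values, so on a real machine each block could run on its own processor; only the 2p boundary rows would need to be communicated.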