Conference on Hypercube Concurrent Computers and Applications: latest publications

Intrinsically parallel multiscale algorithms for hypercubes

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63131

P. Frederickson, O. McBryan

Most algorithms implemented on parallel computers have been optimal serial algorithms, slightly modified or parallelized. An exciting possibility is the search for intrinsically parallel algorithms. These are algorithms which do not have a sensible serial equivalent: any serial equivalent is so inefficient as to be of little use. We describe a multiscale algorithm for the solution of PDE systems that is designed specifically for massively parallel supercomputers. Unlike conventional multigrid algorithms, the new algorithm utilizes the same number of processors at all times. Convergence rates are much faster than for standard multigrid methods: the solution error decreases by up to three digits per iteration. The basic idea is to solve many coarse scale problems simultaneously, combining the results in an optimal way to provide an improved fine scale solution. On massively parallel machines the improved convergence rate is attained at no extra computational cost, since processors that would otherwise be sitting idle are utilized to provide the better convergence. Furthermore, the algorithm is ideally suited to SIMD computers as well as MIMD computers. On serial machines the algorithm is much slower than standard multigrid because of the extra time spent on multiple coarse scales, though in certain cases the improved convergence rate may justify this, primarily in cases where other methods do not converge. The algorithm provides an extremely fast solution of various standard elliptic equations on machines such as the 65,536 processor Connection Machine, and uses only O(log(N)) parallel machine instructions to solve such equations. The discovery of this algorithm was motivated entirely by new hardware. It was a surprise to the authors to find that developments in computer architecture might lead to new mathematics. Undoubtedly further intrinsically parallel algorithms await discovery.
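
The processor-utilization argument can be illustrated with a little counting. The sketch below assumes a 1D grid with one processor per fine-grid point and coarsening by a factor of two; neither detail is stated in the abstract, and the function names are hypothetical. It compares the number of busy processors on each level of a standard multigrid cycle with a scheme that works on every offset coarse grid of a level at once, which is one possible reading of "many coarse scale problems simultaneously".

```python
def standard_active(n_fine: int, levels: int) -> list[int]:
    """Processors busy per level when only one coarse grid is kept per level."""
    return [n_fine >> level for level in range(levels + 1)]


def coarse_grids(n_fine: int, level: int) -> list[list[int]]:
    """All offset coarse grids of stride 2**level over points 0..n_fine-1."""
    stride = 1 << level
    return [list(range(offset, n_fine, stride)) for offset in range(stride)]


def multiscale_active(n_fine: int, levels: int) -> list[int]:
    """Processors busy per level when every offset coarse grid is used at once."""
    return [sum(len(grid) for grid in coarse_grids(n_fine, level))
            for level in range(levels + 1)]


if __name__ == "__main__":
    n, depth = 16, 4
    print(standard_active(n, depth))    # [16, 8, 4, 2, 1]
    print(multiscale_active(n, depth))  # [16, 16, 16, 16, 16]
```

Under these assumptions the standard cycle leaves half of the remaining processors idle at each coarser level, while the multiscale variant keeps all of them occupied. How the coarse results are combined is not described in the abstract and is not modelled here.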

Citations: 5

Large-grain pipelining on hypercube multiprocessors

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63119

C. King, L.M. Ni

A new paradigm, called large-grain pipelining, for developing efficient parallel algorithms on distributed-memory multiprocessors, e.g., hypercube machines, is introduced. Large-grain pipelining attempts to maximize the degree of overlapping and minimize the effect of communication overhead in a multiprocessor system through macro-pipelining between the nodes. Algorithms developed through large-grain pipelining to perform matrix multiplication are presented. To model the pipelined computations, an analytic model is introduced, which takes into account both underlying architecture and algorithm behavior. Through the analytic model, important design parameters, such as data partition sizes, can be determined. Experiments were conducted on a 64-node NCUBE multiprocessor. The measured results match closely with the analyzed results, which establishes the analytic model as an integral part of algorithm design. Comparison with an algorithm which does not use large-grain pipelining also shows that large-grain pipelining is an efficient scheme for achieving greater parallelism.

Citations: 8

Piriform (Olfactory) cortex model on the hypercube

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63052

J. Bower, M. Nelson, M. Wilson, G. Fox, W. Furmanski

We present a concurrent hypercube implementation of a neurophysiological model for the piriform (olfactory) cortex. The project was undertaken as the first step towards constructing a general neural network simulator on the hypercube, suitable both for applied and biological nets. The method presented here is expected to be useful for a class of complex and computationally expensive network models with long range connectivity and non-homogeneous activity patterns. The hypercube communication for the fully interconnected case is efficiently realized by the fold algorithm, constructed previously for problems in concurrent matrix algebra, whereas the patchy activity is successfully load balanced by the scattered decomposition. We also briefly discuss other communication strategies, relevant for sparse and variable connectivities. Sample numerical results presented here were derived on the NCUBE hypercube at Caltech.

Citations: 12

Implementing Gauss Jordan on a hypercube multicomputer

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63117

A. Gerasoulis, Nikolaos Missirlis, I. Nelken, R. Peskin

We consider the solution of dense algebraic systems on the NCUBE hypercube via the Gauss Jordan method. Advanced loop interchange techniques are used to determine the appropriate algorithm for MIMD architectures. For a computer with p = n processors, we show that Gauss Jordan is competitive with Gaussian elimination when pivoting is not used. We experiment with three mappings of columns to processors: block, wrap and reflection. We demonstrate that load balancing the processors results in a considerable reduction of execution time.
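
The three column mappings and the elimination itself can be sketched in a few lines. The abstract does not define the mappings, so the code below makes assumptions: block means contiguous chunks of columns, wrap means round-robin assignment, and reflection is taken to mean round-robin that reverses direction on every pass. The function names are hypothetical, and `gauss_jordan` is a plain serial version without pivoting, not the authors' parallel code.

```python
def block_map(n: int, p: int) -> list[int]:
    """Contiguous chunks of columns per processor (assumes p divides n)."""
    size = n // p
    return [j // size for j in range(n)]


def wrap_map(n: int, p: int) -> list[int]:
    """Round-robin assignment: column j goes to processor j mod p."""
    return [j % p for j in range(n)]


def reflection_map(n: int, p: int) -> list[int]:
    """Round-robin that reverses direction on every pass (assumed meaning)."""
    owners = []
    for j in range(n):
        sweep, pos = divmod(j, p)
        owners.append(pos if sweep % 2 == 0 else p - 1 - pos)
    return owners


def gauss_jordan(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination without pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for k in range(n):
        pivot = m[k][k]  # assumed nonzero, since no pivoting is done
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k:
                factor = m[i][k]
                m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    return [m[i][n] for i in range(n)]


if __name__ == "__main__":
    print(block_map(8, 4))       # [0, 0, 1, 1, 2, 2, 3, 3]
    print(wrap_map(8, 4))        # [0, 1, 2, 3, 0, 1, 2, 3]
    print(reflection_map(8, 4))  # [0, 1, 2, 3, 3, 2, 1, 0]
```

At step k only the columns to the right of k still need updating, so under the block mapping the processors that own the leftmost columns run out of work first. Interleaved mappings such as wrap spread the remaining columns more evenly, which is consistent with the load-balancing result the abstract reports.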

Citations: 5

Process and workload migration for a parallel branch-and-bound algorithm on a hypercube multicomputer

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63110

K. Schwan, J. Gawkowski, Sen Blake

This paper describes the design and experimental evaluation of a novel parallel implementation of a branch-and-bound algorithm for solving the Traveling Salesperson Problem on a 32 node Intel hypercube. Issues studied experimentally are trade-offs in speed, memory, and communication costs as well as the effects of workload balancing and node utilization on speedup. Since the actual distribution of work among the parallel tasks of the TSP application cannot be predicted in advance, strategies and tradeoffs regarding the migration of processes from heavily loaded processors or the migration of work from heavily loaded processes can be studied. Toward this end, we have implemented operating system constructs for work and for process migration as extensions to the Intel iPSC hypercube's operating system. Furthermore, operating system support is provided for the rapid sharing of intermediate values of the global objective function being optimized (i.e., 'tour' values in TSP).

Citations: 12

Finding eigenvalues and eigenvectors of unsymmetric matrices using a hypercube multiprocessor

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63118

A. Geist, R. Ward, G. J. Davis, R. Funderlic

Distributed-memory algorithms for finding the eigenvalues and eigenvectors of a dense unsymmetric matrix are given. While several parallel algorithms have been developed for symmetric systems, little work has been done on the unsymmetric case. Our parallel implementation proceeds in three major steps: reduction of the original matrix to Hessenberg form, application of the implicit double-shift QR algorithm to compute the eigenvalues, and back transformations to compute the eigenvectors. Several modifications to our parallel QR algorithm, including ring communication and pipelining, are discussed and compared. Results and timings are given.

Citations: 20

Best-first branch-and-bound on a hypercube

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63107

E. Felten

The branch-and-bound technique is a common method for finding exact solutions to difficult problems in combinatorial optimization. This paper will discuss issues surrounding implementation of a particular branch-and-bound algorithm for the traveling-salesman problem on a hypercube multi-computer. The natural parallel algorithm is based on a number of asynchronous processes which interact through a globally shared list of unfinished work. In a distributed-memory environment we must find a non-centralized version of this shared data structure. In addition, detecting termination of the computation is tricky; an algorithm will be presented which ensures proper termination. Finally, issues affecting performance will be discussed.
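
For orientation, the serial core of best-first branch-and-bound can be sketched as follows. This is not the paper's distributed algorithm: a single priority queue stands in for the shared list of unfinished work, and the lower bound (cheapest outgoing edge of every city that still has to be left) is a simple choice made for this example. The function name `tsp_best_first` is hypothetical.

```python
import heapq


def tsp_best_first(dist: list[list[float]]) -> tuple[float, list[int]]:
    """Serial best-first branch-and-bound for a small TSP instance."""
    n = len(dist)
    # Cheapest edge leaving each city; used for a simple lower bound.
    min_out = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]

    def bound(cost: float, path: list[int]) -> float:
        # Every city not yet left (the current one and all unvisited ones)
        # must still be departed once, at no less than its cheapest edge.
        visited = set(path)
        rest = sum(min_out[c] for c in range(n) if c not in visited)
        return cost + min_out[path[-1]] + rest

    # The priority queue plays the role of the list of unfinished work.
    queue = [(bound(0.0, [0]), 0.0, [0])]
    while queue:
        _, cost, path = heapq.heappop(queue)
        if len(path) == n + 1:
            return cost, path  # closed tour; its bound equals its true cost
        if len(path) == n:
            # All cities visited: close the tour back to the start.
            total = cost + dist[path[-1]][0]
            heapq.heappush(queue, (total, total, path + [0]))
            continue
        for city in range(n):
            if city not in path:
                new_cost = cost + dist[path[-1]][city]
                new_path = path + [city]
                heapq.heappush(
                    queue, (bound(new_cost, new_path), new_cost, new_path))
    raise ValueError("no tour found")
```

Because the node with the smallest bound is always expanded first, the first closed tour removed from the queue is optimal. In a distributed-memory setting the queue cannot simply be shared, which is the data-structure and termination-detection problem the abstract refers to.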

Citations: 21

Implementing a distributed combat simulation on the Time Warp operating system

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63080

F. Wieland, L. Hawley, A. Feinberg

Utilizing the Time Warp Operating System, the CTLS project at JPL has produced a distributed combat simulation called STB-87 and measured its performance on the JPL Mark III Hypercube. By applying the spiral model of software development, the CTLS project will produce a series of software test beds, to culminate in the completion of a working prototype theater level simulation three to five years hence. STB-87, the first software test bed, is a ground-based combat simulation decomposed into objects which communicate via time-stamped messages. The use of incremental object-based design, coding, and testing has been helpful when developing a parallel simulation. The performance measurements show that, with the appropriate choice of object granularity, STB-87 is able to achieve a speedup factor of 12 running on a 32-node Mark III Hypercube.

Citations: 4

Finite difference time domain solution of electromagnetic scattering on the hypercube

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63062

Ruel H. Calalo, J. Lyons, W. Imbriale

Electromagnetic fields interacting with a dielectric or conducting structure produce scattered electromagnetic fields. To model the fields produced by complicated, volumetric structures, the finite difference time domain (FDTD) method employs an iterative solution to Maxwell's time dependent curl equations. Implementations of the FDTD method intensively use memory and perform numerous calculations per time step iteration. We implemented an FDTD code on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. This code allows us to solve problems requiring as many as 2,048,000 unit cells on a 32 node Hypercube. For smaller problems, the code produces solutions in a fraction of the time needed to solve the same problems on sequential computers.
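
The time-stepping structure of FDTD is easiest to see in one dimension. The sketch below is a generic 1D free-space leapfrog update in normalized units, not the authors' 3D hypercube code; the grid size, Courant factor, and Gaussian source parameters are all assumptions chosen for illustration, and `fdtd_1d` is a hypothetical name.

```python
import math


def fdtd_1d(cells: int = 200, steps: int = 150) -> list[float]:
    """Leapfrog update of Ez and Hy on a 1D staggered grid in free space."""
    ez = [0.0] * cells
    hy = [0.0] * cells
    courant = 0.5  # normalized c*dt/dx, below the 1D stability limit of 1
    source = cells // 2
    for t in range(steps):
        # Update E from the spatial difference (curl) of H.
        for k in range(1, cells):
            ez[k] += courant * (hy[k - 1] - hy[k])
        # Soft Gaussian pulse injected at the centre of the grid.
        ez[source] += math.exp(-0.5 * ((t - 40) / 12.0) ** 2)
        # Update H from the spatial difference (curl) of E.
        for k in range(cells - 1):
            hy[k] += courant * (ez[k] - ez[k + 1])
    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    print(max(abs(v) for v in field))
```

Each cell update needs only its nearest neighbours, so a domain decomposition across nodes only has to exchange boundary values every time step. In 3D the same pattern holds with six field components per cell, which accounts for the memory and computation demands the abstract mentions.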

Citations: 2

Shift-register sequence random number generators on the hypercube concurrent computers

Conference on Hypercube Concurrent Computers and Applications

Pub Date : 1989-01-03 DOI: 10.1145/63047.63098

T. Chiu

We discuss the design of a class of shift-register sequence random number generators for MIMD parallel computers, and particularly for hypercube concurrent computers. The simplest implementation is to have each processor generate its own sequence, provided that the initial seeds are linearly independent. We generate these initial seeds by using distinct linear congruential generators and then taking the bit-by-bit exclusive-or with the system time in microseconds. Our shift-register sequence random number generators are coded in C and run under CUBIX. The statistical tests are performed on each sequence generated by every single processor as well as on the combined sequence produced by all processors. The tests include chi-square, Kolmogorov-Smirnov, auto-correlation, run-length and n-tuple distribution tests. A statistical test has been devised for testing the sequences of random numbers generated by a MIMD parallel computer. Our test results indicate that our generators do provide independent sequences of random numbers with extremely long periods.
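
The per-processor scheme described above can be sketched as a generalized feedback shift register seeded from a linear congruential generator. The abstract gives neither the lags nor the LCG constants, so this example assumes the lags (250, 103) of the well-known R250 generator and a commonly used 32-bit LCG; a different LCG starting value per node stands in for the "distinct" generators. It is written in Python rather than the authors' C, does not verify that the seeds are linearly independent, and all names are hypothetical.

```python
import time

MASK32 = 0xFFFFFFFF
P, Q = 250, 103  # lags of the R250 generator (an assumption here)


def lcg_stream(seed: int):
    """32-bit linear congruential generator used only to fill the seed table."""
    state = seed & MASK32
    while True:
        state = (1664525 * state + 1013904223) & MASK32
        yield state


def make_seed_table(node: int, clock_us: int) -> list[int]:
    """Seed table for one node: LCG output XORed with a clock reading."""
    lcg = lcg_stream(2 * node + 1)  # a different LCG start on every node
    return [next(lcg) ^ (clock_us & MASK32) for _ in range(P)]


def gfsr(table: list[int]):
    """Shift-register sequence x[n] = x[n-P] XOR x[n-Q] on 32-bit words."""
    buf = list(table)
    i = 0
    while True:
        # buf[i] currently holds x[n-P]; x[n-Q] sits P-Q slots ahead of it.
        buf[i] ^= buf[(i + P - Q) % P]
        yield buf[i]
        i = (i + 1) % P


if __name__ == "__main__":
    clock_us = time.time_ns() // 1000  # system time in microseconds
    gen = gfsr(make_seed_table(node=3, clock_us=clock_us))
    print([next(gen) for _ in range(3)])
```

Each node runs its own generator with no communication, so the per-node cost is one XOR and one index update per number. Whether the streams are statistically independent has to be checked empirically, which is the purpose of the tests listed in the abstract.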

Citations: 6
