
The Sixth Distributed Memory Computing Conference, 1991. Proceedings: Latest Publications

Distributed particle based fluid flow simulation
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633164
T. Gilman, T. Huntsberger, P. Sharma
Many attempts have been made to simulate the motion of non-rigid objects. While there have been many successes in this area, the animation of fluids is still a relatively unconquered frontier. This paper describes a distributed model for fluid flow study based on behavioral simulation of atom-like particles. These particles define the size and shape of the fluid. In addition, these particles have inertia and respond to attraction, repulsion and gravitation. Unlike previous fluid flow systems, inter-particle forces are explicitly included in the model. A distributed mapping of the particle database similar to recent load-balanced PIC studies [5, 6] allows large numbers of particles to be included in the model. We also present the results of some experimental studies performed on the NCUBE/10 system at the University of South Carolina.
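To make the particle model concrete, the following is a minimal sketch of the per-step force accumulation such a system might use. The pair force law (short-range repulsion plus weaker attraction) and all names are illustrative assumptions, not the authors' actual formulation.

```c
#include <math.h>

typedef struct { double x, y, vx, vy, fx, fy; } Particle;

/* Illustrative pair law: strong short-range repulsion, weaker attraction.
   The constants and functional form are assumptions, not the paper's model. */
static void pair_force(const Particle *a, const Particle *b,
                       double *fx, double *fy)
{
    double dx = b->x - a->x, dy = b->y - a->y;
    double r  = sqrt(dx * dx + dy * dy) + 1e-12;     /* avoid divide by zero */
    double mag = 1.0 / (r * r * r) - 0.1 / (r * r);  /* repulsion - attraction */
    *fx = -mag * dx / r;                             /* force acting on particle a */
    *fy = -mag * dy / r;
}

/* Accumulate inter-particle forces plus gravity for one time step. */
void accumulate_forces(Particle *p, int n, double gravity)
{
    for (int i = 0; i < n; i++) { p[i].fx = 0.0; p[i].fy = -gravity; }
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double fx, fy;
            pair_force(&p[i], &p[j], &fx, &fy);
            p[i].fx += fx;  p[i].fy += fy;   /* Newton's third law */
            p[j].fx -= fx;  p[j].fy -= fy;
        }
}
```

In a distributed setting, the O(n²) pair loop is what the particle-database mapping must partition across processors, which is why load balancing dominates the design.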
Citations: 0
Fault Tolerant Communication in the C.NET High Level Programming Environment
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633361
J. Adamo, J. Benneville, C. Bonello, L. Trejo
This work is part of a high-level environment we are developing for a reconfigurable transputer-based machine. It deals with the design of a virtual channel monitor. A protocol is described which, among other things, allows pre-emption of communications and possible failure of the links to be handled consistently.
Citations: 0
On Implementing Agenda Parallelism in Production Systems
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633218
G. A. Sawyer, G. Lamont
Parallel rule execution (agenda parallelism) represents a relatively unexplored method for increasing the execution speed of production systems on parallel computer architectures. Agenda parallelism possesses the potential for increasing the execution speed of parallel production systems by an order of magnitude. However, agenda parallelism also introduces a number of significant overhead factors that must be contended with. This paper presents an overview of AFIT's initial research on agenda parallelism; it includes a discussion of the advantages and liabilities associated with this decomposition approach based on formal proofs, problem analysis and actual implementation.
Citations: 0
Linear Speedup of Winograd's Matrix Multiplication Algorithm Using an Array Processor
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633203
De-Lei Lee, M. A. Aboelaze
Winograd's matrix multiplication algorithm halves the number of multiplication operations required of the conventional O(N³) matrix multiplication algorithm by slightly increasing the number of addition operations. Such a technique can be computationally advantageous when the machine performing the matrix computation takes much more time for multiplication than for addition operations. This is overwhelmingly the case in the massively parallel computing paradigm, where each processor is extremely simple by itself and the computing power is obtained by the use of a large number of such processors. In this paper, we describe a parallel version of Winograd's matrix multiplication algorithm using an array processor and show how to achieve nearly linear speedup over its sequential counterpart.
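For reference, the multiplication-halving idea in Winograd's algorithm can be seen in the sketch below; it is a minimal serial illustration assuming even N, not the paper's parallel array-processor version.

```c
/* Winograd's matrix multiplication: C = A * B for N x N matrices, N even.
   Row/column pre-terms cost N*N/2 multiplications each; every C[i][j]
   then needs only N/2 multiplications, roughly halving the total. */
void winograd_matmul(int N, const double A[N][N], const double B[N][N],
                     double C[N][N])
{
    double rowTerm[N], colTerm[N];

    for (int i = 0; i < N; i++) {            /* row term: sum A[i][2k]*A[i][2k+1] */
        rowTerm[i] = 0.0;
        for (int k = 0; k < N / 2; k++)
            rowTerm[i] += A[i][2 * k] * A[i][2 * k + 1];
    }
    for (int j = 0; j < N; j++) {            /* column term: sum B[2k][j]*B[2k+1][j] */
        colTerm[j] = 0.0;
        for (int k = 0; k < N / 2; k++)
            colTerm[j] += B[2 * k][j] * B[2 * k + 1][j];
    }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N / 2; k++)  /* only N/2 multiplications per entry */
                s += (A[i][2 * k] + B[2 * k + 1][j]) *
                     (A[i][2 * k + 1] + B[2 * k][j]);
            C[i][j] = s - rowTerm[i] - colTerm[j];
        }
}
```

Expanding each product term and subtracting the two pre-computed terms leaves exactly the ordinary dot-product contributions, so the result equals conventional matrix multiplication while trading multiplications for additions.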
Citations: 3
Using Parallel Programming Paradigms for Structuring Programs on Distributed Memory Computers
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633127
A. W. Kwan, L. Bic
Programming paradigms have been advocated as a method of abstraction for viewing parallel algorithms. By viewing such paradigms as a method of algorithm classification, we have used paradigms as a technique for structuring certain types of algorithm on distributed memory computers, allowing for separation of computation and synchronization. The structuring technique assists the parallel programmer with synchronization, allowing the programmer to concentrate more on developing code for computation. Experiments with the compute-aggregate-broadcast paradigm indicate that such a structuring technique can be utilized for different programs, and can be efficient.
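The compute-aggregate-broadcast structure mentioned above can be sketched as follows, using MPI purely as a modern stand-in (an assumption; the paper itself predates MPI and used its own message-passing primitives): each node computes locally, local results are aggregated globally, and the aggregate is broadcast back for the next step.

```c
#include <mpi.h>
#include <stdio.h>

/* Skeleton of the compute-aggregate-broadcast paradigm, expressed with MPI
   as a modern stand-in.  Each iteration: local compute, global aggregation,
   broadcast of the aggregate. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0, global = 0.0;
    for (int step = 0; step < 10; step++) {
        /* compute: purely local work (placeholder expression) */
        local = (double)(rank + 1) * (step + 1);

        /* aggregate: combine all local results; the broadcast back to every
           node is implicit in MPI_Allreduce (MPI_Reduce + MPI_Bcast would be
           the explicit two-phase form). */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }
    if (rank == 0)
        printf("final aggregate = %g\n", global);
    MPI_Finalize();
    return 0;
}
```

The point of the paradigm is that all synchronization lives in the aggregate/broadcast step, so the programmer's attention stays on the compute phase.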
Citations: 0
A Parallel-Vector Algorithm for Solving Periodic Tridiagonal Linear Systems of Equations
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633307
T. Taha
Periodic tridiagonal linear systems of equations typically arise from discretizing second order differential equations with periodic boundary conditions. In this paper a parallel-vector algorithm is introduced to solve such systems. Implementation of the new algorithm is carried out on an Intel iPSC/2 hypercube with vector processor boards attached to each node processor. It is to be noted that this algorithm can be extended to solve other periodic banded linear systems.
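For readers unfamiliar with the term, a periodic tridiagonal system has the usual tridiagonal band plus two corner entries coupling the first and last unknowns; the generic form below is illustrative notation, not taken from the paper.

```latex
\begin{pmatrix}
b_1 & c_1 &         &         & a_1 \\
a_2 & b_2 & c_2     &         &     \\
    & \ddots & \ddots & \ddots &     \\
    &     & a_{n-1} & b_{n-1} & c_{n-1} \\
c_n &     &         & a_n     & b_n
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix}
=
\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_{n-1} \\ d_n \end{pmatrix}
```

The corner entries a_1 and c_n are what distinguish the periodic case from an ordinary tridiagonal solve and prevent a straightforward Thomas-algorithm sweep.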
Citations: 0
Helmholtz Finite Elements Performance On Mark III and Intel iPSC/860 Hypercubes
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633158
J. Parker, T. Cwik, R. Ferraro, P. Liewer, P. Lyster, J. Patterson
The large distributed memory capacities of hypercube computers are exploited by a finite element application which computes the scattered electromagnetic field from heterogeneous objects with size large compared to a wavelength. Such problems scale well with hypercube dimension for large objects: by using the Recursive Inertial Partitioning algorithm and an iterative solver, the work done by each processor is nearly equal and communication overhead for the system set-up and solution is low. The application has been integrated into a user-friendly environment on a graphics workstation in a local area network including hypercube host machines. Users need never know their solutions are obtained via a parallel computer. Scaling is shown by computing solutions for a series of models which double the number of variables for each increment of hypercube dimension. Timings are compared for the JPL/Caltech Mark IIIfp Hypercube and the Intel iPSC/860 hypercube. Acceptable quality of solutions is obtained for object domains of hundreds of square wavelengths and resulting sparse matrix systems with order of 100,000 complex unknowns.
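The Recursive Inertial Partitioning step mentioned above balances work by repeatedly bisecting the mesh along the principal axis of its node coordinates. The 2-D sketch below illustrates one bisection step; all names and details are illustrative assumptions, not the paper's implementation.

```c
#include <math.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

/* One step of inertial bisection in 2-D: project points onto the principal
   axis of their inertia (covariance) matrix and split at the median.
   part[i] is set to 0 or 1.  Illustrative only. */
void inertial_bisect(const double *x, const double *y, int n, int *part)
{
    double cx = 0.0, cy = 0.0;
    for (int i = 0; i < n; i++) { cx += x[i]; cy += y[i]; }
    cx /= n; cy /= n;

    /* 2x2 inertia matrix [[sxx, sxy], [sxy, syy]] about the centroid */
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (int i = 0; i < n; i++) {
        double dx = x[i] - cx, dy = y[i] - cy;
        sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
    }

    /* principal eigenvector of the symmetric 2x2 matrix */
    double lambda = 0.5 * (sxx + syy)
                  + sqrt(0.25 * (sxx - syy) * (sxx - syy) + sxy * sxy);
    double ax = sxy, ay = lambda - sxx;
    if (fabs(ax) + fabs(ay) < 1e-30) { ax = 1.0; ay = 0.0; }  /* degenerate case */
    double norm = sqrt(ax * ax + ay * ay);
    ax /= norm; ay /= norm;

    /* project onto the axis and split at the median projection */
    double *proj = malloc(n * sizeof *proj);
    double *sorted = malloc(n * sizeof *sorted);
    for (int i = 0; i < n; i++)
        sorted[i] = proj[i] = (x[i] - cx) * ax + (y[i] - cy) * ay;
    qsort(sorted, n, sizeof *sorted, cmp_double);
    double median = sorted[n / 2];
    for (int i = 0; i < n; i++)
        part[i] = (proj[i] < median) ? 0 : 1;
    free(proj); free(sorted);
}
```

Applying this recursively d times yields 2^d nearly equal-sized, spatially compact pieces, one per hypercube node, which is what keeps the per-processor work balanced.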
Citations: 2
Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633174
D. Scott
Some application programs on distributed memory parallel computers occasionally require an "all-to-all" communication pattern, where each compute node must send a distinct message to each other compute node. Assuming that each node can send and receive only one message at a time, the all-to-all pattern must be implemented as a sequence of phases in which certain nodes send and receive messages. If there are p compute nodes, then at least p-1 phases are needed to complete the operation. A proof of a schedule achieving this lower bound on a circuit-switched hypercube with fixed routing is given. This lower bound cannot be achieved on a 2-dimensional mesh. On an a×a mesh, a³/4 is shown to be a lower bound and a schedule with this number of phases is given. Whether hypercubes or meshes are better for this algorithm depends on the relative bandwidths of the communication channels.
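As an illustration of a p-1 phase all-to-all on a hypercube, the standard pairwise-exchange ordering pairs node i with node i XOR k in phase k. Whether this particular ordering is the contention-free schedule proved in the paper is an assumption here; the sketch only shows why p-1 phases suffice when every node sends and receives one message per phase.

```c
#include <stdio.h>

/* Print a p-1 phase all-to-all schedule for p = 2^d nodes: in phase k
   (k = 1 .. p-1) node i exchanges its message for node i^k with that node.
   Each phase is a perfect matching, so every node sends and receives exactly
   one message per phase.  Contention-freedom under a particular
   circuit-switched routing is not checked here. */
void print_pairwise_schedule(int d)
{
    int p = 1 << d;
    for (int k = 1; k < p; k++) {
        printf("phase %d:", k);
        for (int i = 0; i < p; i++) {
            int partner = i ^ k;
            if (i < partner)                  /* print each pair only once */
                printf("  %d<->%d", i, partner);
        }
        printf("\n");
    }
}

int main(void)
{
    print_pairwise_schedule(3);   /* 8-node hypercube: 7 phases */
    return 0;
}
```

The a³/4 mesh bound, by contrast, follows from a bisection argument: roughly a⁴/2 messages must cross the bisection of an a×a mesh, while only on the order of 2a can cross per phase.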
Citations: 146
Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633200
S. Breit
The TC2000 is a MIMD parallel processor with memory that is physically distributed, but logically shared. Interprocessor communication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes of the data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC2000 Fortran language. This approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only because of the TC2000's high-speed interprocessor communications network. References to shared memory take about 25% of the total execution time for the parallel version of ARC2D, an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up
Citations: 1
Software Engineering Aspects of the ProSolver-SES Skyline Solver
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633170
E. Castro-Leon, M. L. Barton, E. Kushner
The ProSolver-SES software is one of the direct equation solvers available for the iPSC/860. It uses skyline storage of matrix elements, and is applicable to linear systems that do not require pivoting. The product is available as a library that includes additional operations to support Finite Element Method applications. This paper discusses the software architecture and some of the high performance algorithms.
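As background on the storage scheme, skyline (profile) storage packs each column of a symmetric matrix from its first nonzero row down to the diagonal into one contiguous array, indexed by a vector of diagonal positions. The layout below is a generic sketch under that assumption, not the ProSolver-SES library's actual data structures.

```c
/* Column-wise skyline (profile) storage of a symmetric matrix, sketched
   generically.  For each column j, entries from the first stored row top[j]
   through the diagonal are kept contiguously in `val`; dptr[j] is the index
   of the diagonal entry A(j,j) in `val`. */
typedef struct {
    int     n;      /* matrix order                            */
    int    *top;    /* top[j]  = first stored row of column j  */
    int    *dptr;   /* dptr[j] = position of A(j,j) in val     */
    double *val;    /* packed column segments                  */
} SkylineMatrix;

/* Return A(i,j) for i <= j; entries above the skyline are structurally zero. */
double skyline_get(const SkylineMatrix *A, int i, int j)
{
    if (i < A->top[j])
        return 0.0;                       /* above the column's skyline      */
    return A->val[A->dptr[j] - (j - i)];  /* diagonal is last in the segment */
}
```

Because factorization fill-in stays within each column's skyline, this layout lets a direct solver without pivoting work in place on the packed array.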
Citations: 3