Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555377
R. Daniel, K. Teague
Filtering data to remove noise is an important operation in image processing. While linear filters are common, they have a serious drawback: they cannot discriminate between large and small discontinuities. This matters because large discontinuities are frequently important edges in the scene, yet if the smoothing action is reduced to preserve them, very little noise is removed from the data. This paper discusses the parallel implementation of a connectionist network that attempts to smooth data without blurring edges. The network operates by iteratively minimizing a non-linear error measure that explicitly models image edges. We discuss the origin of the network and its simulation on an iPSC/2. We also discuss its performance as a function of the number of nodes and the SNR of the data, and compare it with a linear Gaussian filter and a median filter.
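The abstract does not give the error measure itself. One classic instance of "smoothing with an explicit edge model" is the weak-membrane / line-process formulation; the 1-D sketch below (plain Python, hypothetical parameters `lam` and `alpha`) alternates a closed-form edge update with gradient steps on the smoothed signal. It is an illustration of the general idea, not the authors' network.

```python
def weak_membrane_smooth(data, lam=1.0, alpha=0.5, iters=400, step=0.05):
    """Edge-preserving smoothing sketch with an explicit line process.

    Minimizes  E(u, l) = sum_i (u_i - d_i)^2
                       + lam * sum_i (u_{i+1} - u_i)^2 * (1 - l_i)
                       + alpha * sum_i l_i
    where l_i in {0, 1} marks an edge between samples i and i+1.
    """
    u = list(data)
    n = len(u)
    for _ in range(iters):
        # Closed-form line-process update: cut the smoothness link wherever
        # the smoothness penalty would exceed the fixed edge penalty alpha.
        l = [1 if lam * (u[i + 1] - u[i]) ** 2 > alpha else 0
             for i in range(n - 1)]
        # Gradient step on u; edges marked in l carry no smoothing force.
        for i in range(n):
            g = 2.0 * (u[i] - data[i])
            if i > 0 and not l[i - 1]:
                g += 2.0 * lam * (u[i] - u[i - 1])
            if i < n - 1 and not l[i]:
                g += 2.0 * lam * (u[i] - u[i + 1])
            u[i] -= step * g
    return u
```

On a noisy step signal the interior is smoothed while the large discontinuity is preserved, which is exactly the behavior a linear filter cannot deliver.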
Title: A Connectionist Technique for Data Smoothing
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555416
A. Mirin
The package FPPAC [1,2], which computes the nonlinear multispecies Fokker-Planck collision operator for a plasma in two-dimensional velocity space, has been rewritten for the Connection Machine 2. This has involved allocating variables to either the front end or the CM-2, minimizing data flow, and replacing Cray-optimized algorithms with ones suited to a massively parallel architecture. Coding was done in Connection Machine Fortran. Calculations have been carried out on various Connection Machines throughout the country. Results and timings on these machines have been compared to each other and to those on the static-memory Cray-2 at the National Magnetic Fusion Energy Computer Center. For large problem sizes, the Connection Machine 2 is found to be cost-efficient.
Title: Massively Parallel Fokker-Planck Calculations
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556398
Ching-Tien Ho, S. Johnsson
The embedding of arrays in Boolean cubes, when there are more array elements than nodes in the cube, can always be made with optimal load-factor by reshaping the array into a one-dimensional array. We give the dilation of such an embedding of an l0 x l1 x ... x l(d-1) array in an n-cube. Dilation-one embeddings can be obtained by splitting each axis into segments and assigning the segments to nodes in the cube by a Gray code. The load-factor is optimal if the axis lengths contain sufficiently many powers of two. The congestion is minimized if the segment lengths along the different axes are as equal as possible, for a cube configured with at most as many axes as the array. A further decrease in the congestion is possible if the array is partitioned into subarrays, and corresponding axes of different subarrays make use of edge-disjoint Hamiltonian cycles within subcubes. The congestion can also be reduced by using multiple paths between pairs of cube nodes, i.e., by using "fat" edges.
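The segment-to-node assignment described above is concrete enough to sketch. The snippet below (hypothetical helper names, assuming the segment count divides the axis length) splits one axis into 2^n segments and places consecutive segments at binary-reflected Gray-code addresses, so neighboring segments land in adjacent cube nodes, i.e., a dilation-one embedding along that axis:

```python
def gray(i):
    # Binary-reflected Gray code of i.
    return i ^ (i >> 1)

def assign_segments(axis_len, n_bits):
    """Split an array axis of length axis_len into 2**n_bits segments and
    assign segment i to the cube node whose address is gray(i).
    Returns {segment_index: (node_address, element_indices)}.
    Assumes 2**n_bits divides axis_len."""
    p = 1 << n_bits
    seg = axis_len // p
    return {i: (gray(i), list(range(i * seg, (i + 1) * seg)))
            for i in range(p)}
```

Because successive Gray codes differ in exactly one bit, every pair of adjacent segments is mapped to neighboring hypercube nodes.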
Title: Embedding Meshes into Small Boolean Cubes
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556294
H. Embrechts, J.P. Jones
Hypercube-topology concurrent multicomputers owe at least part of their popularity to the fact that it is relatively simple to decompose rectangularly-shaped M-dimensional domains into subdomains and assign these subdomains to processors (PEs) in a manner that preserves the adjacencies of the subdomains. However, this decomposition involves some rearrangement of the data during input/output operations to (linear-memory) data acquisition, display, or mass storage devices. We show that this rearrangement can be done efficiently, in parallel. The main consequence of this algorithm is that M-dimensional data can be stored in a simple, general format and yet be communicated efficiently, independent of the dimension of the hypercube or the number of those dimensions assigned to the dimensions of the domain. This algorithm is also relevant to applications with mixed domain decompositions, and to parallel mass storage media such as disk farms.
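The rearrangement being solved is the mismatch between the device's row-major linear order and the subdomain order held by the PEs. A small illustration of that index translation (hypothetical helper, not the paper's parallel algorithm) maps a linear index into (owning block, local offset) under a rectangular block decomposition:

```python
def owner_and_offset(index, shape, blocks):
    """Map a row-major linear index into an M-dimensional array of the given
    shape to (block_id, local_offset) under a rectangular decomposition with
    blocks[d] equal-size blocks along axis d. Assumes blocks[d] divides shape[d]."""
    # Unravel the linear index into M-dimensional coordinates.
    coords = []
    for s in reversed(shape):
        coords.append(index % s)
        index //= s
    coords.reverse()
    # Split each coordinate into (block coordinate, coordinate within block).
    block_coord = [c // (s // b) for c, s, b in zip(coords, shape, blocks)]
    local = [c % (s // b) for c, s, b in zip(coords, shape, blocks)]
    # Flatten both, row-major.
    bid = 0
    for bc, b in zip(block_coord, blocks):
        bid = bid * b + bc
    off = 0
    for lc, s, b in zip(local, shape, blocks):
        off = off * (s // b) + lc
    return bid, off
```

The paper's contribution is doing this permutation efficiently in parallel on the hypercube; the function above only states what the permutation is.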
Title: An Input/Output Algorithm for M-Dimensional Rectangular Domain Decompositions on N-Dimensional Hypercube Multicomputers
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556298
S. Horiike
This paper presents a new algorithm for mapping tasks onto a hypercube. Given a weighted task graph, the algorithm finds a good mapping in a reasonable computation time. When the target computer is an n-dimensional cube (n-cube), the proposed algorithm is composed of n stages. The algorithm starts from an initial state in which the tasks are mapped onto 2^n 0-cubes. At each stage k (k = 1, 2, ..., n), the task graph is mapped onto 2^(n-k) k-cubes. At the beginning of stage k, the tasks have already been mapped onto 2^(n-(k-1)) (k-1)-cubes. The mapping onto k-cubes is obtained by combining pairs of (k-1)-cubes: 2^(n-k) pairs are determined among the 2^(n-(k-1)) (k-1)-cubes, and they are combined so that the mapping onto the k-cubes makes the communication cost as low as possible.
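The abstract does not say how the pairs are chosen at each stage; one natural (assumed, not the paper's) criterion is to greedily combine the subcube pairs that communicate most heavily, so their traffic becomes intra-k-cube traffic:

```python
def pair_subcubes(weights):
    """One stage of the mapping: pair up (k-1)-cubes, heaviest-communicating
    pair first, so that combined pairs share a k-cube.
    weights[i][j] = symmetric communication weight between subcubes i and j.
    Returns a list of (i, j) pairs covering all subcubes."""
    m = len(weights)
    unpaired = set(range(m))
    pairs = []
    edges = sorted(((weights[i][j], i, j)
                    for i in range(m) for j in range(i + 1, m)),
                   reverse=True)
    for _, i, j in edges:
        if i in unpaired and j in unpaired:
            pairs.append((i, j))
            unpaired -= {i, j}
    return pairs
```

Running this n times, halving the number of subcubes each stage, mirrors the n-stage structure described above.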
Title: A Task Mapping Method for a Hypercube by Combining Subcubes
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555367
W. Bain
This paper describes a new algorithm for the synchronization of a class of parallel discrete event simulations on distributed-memory parallel computers. Unlike previous algorithms, which synchronize on a per-process basis, this algorithm synchronizes on a per-processor basis. The algorithm allows full generality in the simulation model by permitting dynamic process creation and destruction and arbitrary inter-process interconnections, and it is shown to be deadlock- and livelock-free. It has been used to simulate very large parallel computer architectures.
Title: Parallel Discrete Event Simulation Using Synchronized Event Schedulers
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556327
D. Grit
SISAL is a general-purpose applicative language intended for use on both conventional and novel multiprocessor systems. In this paper we describe the port of a shared-memory implementation to a distributed-memory environment. A number of issues are specifically addressed: the evaluation strategy, memory management, scheduling, stream handling, and task synchronization.
Title: A Distributed Memory Implementation of SISAL
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556337
M. Heath
In this talk we show how graphical animation of the behavior of parallel algorithms can facilitate the design and performance enhancement of algorithms for matrix computations on parallel computer architectures. Using a portable instrumented communication library and a graphical animation package developed at Oak Ridge National Laboratory, we illustrate the effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation. In this talk we focus on distributed-memory parallel architectures in which the processors communicate by passing messages. The linear algebra problems we consider include matrix factorization and the solution of triangular systems.
Title: Visual Animation of Parallel Algorithms for Matrix Computations
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555399
A. Elster
Parallel systems are in general complicated to utilize efficiently. As they evolve in complexity, it becomes increasingly important to provide libraries and language features that can spare users the knowledge of low-level system details. Our effort in this direction is to develop a set of basic matrix algorithms for distributed-memory systems such as the hypercube. The goal is to provide for distributed-memory systems an environment similar to that which the Level-3 Basic Linear Algebra Subprograms (BLAS3) provide for the sequential and shared-memory environments. These subprograms facilitate the development of efficient and portable algorithms that are rich in matrix-matrix multiplication, on which major software efforts such as LAPACK have been built. To demonstrate the concept, some of these Level-3 algorithms are being developed on the Intel iPSC/2 hypercube. Central to this effort is the General Matrix-Matrix Multiplication routine PGEMM. The symmetric and triangular multiplications, rank-k updates (symmetric case), and the solution of triangular systems with multiple right-hand sides are also discussed.
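The sequential semantics that PGEMM distributes are the standard Level-3 GEMM update C <- alpha*A*B + beta*C. A plain-Python reference of that operation (the distributed routine partitions these loops across the hypercube nodes, which is omitted here) is:

```python
def gemm_reference(alpha, A, B, beta, C):
    """Reference semantics of GEMM: C <- alpha*A*B + beta*C, in place.
    A is n x k, B is k x m, C is n x m, given as lists of rows."""
    n, m, k = len(A), len(B[0]), len(B)
    for i in range(n):
        for j in range(m):
            acc = sum(A[i][t] * B[t][j] for t in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]
    return C
```

Expressing higher-level algorithms in terms of this one kernel is what makes them both portable and efficient: only the kernel needs to be retuned per machine.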
Title: Basic Matrix Subprograms for Distributed Memory Systems
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556395
P. Fraigniaud, S. Miguet, Y. Robert
In this paper, we prove that the complexity of scattering in an oriented ring of p processors is (p-1) * (β + L * τ), where L is the length of the messages, β the communication startup, and τ the elemental propagation time.
1. SCATTERING
In a recent paper, Saad and Schultz [SS] study various basic communication kernels in parallel architectures. They point out that interprocessor communication is often one of the main obstacles to increasing the performance of parallel algorithms on multiprocessors. They consider the following data exchange operations: (1) One-to-one: moving data from one processor to another.
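The bound can be checked against a simple schedule: under a store-and-forward model (an assumption of this sketch, with each hop costing β + Lτ), the source injects the message for the most distant node first, and intermediate nodes forward immediately. Every message then arrives at exactly (p-1)(β + Lτ):

```python
def scatter_arrivals(p, L, beta, tau):
    """Farthest-first scattering schedule on an oriented ring of p nodes.
    The source (node 0) injects one message per slot of length beta + L*tau,
    most distant destination first; each forwarding hop costs one slot.
    Returns {destination: arrival_time} for destinations 1..p-1."""
    t_msg = beta + L * tau
    arrivals = {}
    for k, dest in enumerate(range(p - 1, 1 - 1, -1)):
        # Message k is injected at time k*t_msg and travels dest hops.
        arrivals[dest] = k * t_msg + dest * t_msg
    return arrivals
```

Note that message k occupies link (j, j+1) during slot k+j, so consecutive messages pipeline behind one another without contention, which is why all arrivals coincide at the optimal time.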
Title: Complexity Of Scattering On A Ring Of Processors