
Proceedings of the Fifth Distributed Memory Computing Conference, 1990.

A Connectionist Technique for Data Smoothing
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555377
R. Daniel, K. Teague
Filtering data to remove noise is an important operation in image processing. While linear filters are common, they have serious drawbacks since they cannot discriminate between large and small discontinuities. This is especially serious since large discontinuities are frequently important edges in the scene. However, if the smoothing action is reduced to preserve the large discontinuities, very little noise will be removed from the data. This paper discusses the parallel implementation of a connectionist network that attempts to smooth data without blurring edges. The network operates by iteratively minimizing a non-linear error measure which explicitly models image edges. We discuss the origin of the network and its simulation on an iPSC/2. We also discuss its performance as a function of the number of nodes and the SNR of the data, and compare it with a linear Gaussian filter and a median filter.
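The iterative minimization the abstract describes can be sketched, very roughly, as gradient descent on a robust smoothing energy. The Lorentzian penalty, the function name, and all parameters below are illustrative assumptions, not the paper's actual error measure:

```python
# Hypothetical sketch of edge-preserving smoothing by iterative energy
# minimization. Energy (assumed, not the paper's):
#   E(u) = sum_i (u_i - d_i)^2 + lam * sum_i rho(u_{i+1} - u_i)
# where rho saturates for large differences, so big edges survive.

def smooth(data, lam=1.0, sigma=1.0, steps=200, lr=0.1):
    u = list(data)
    n = len(u)
    for _ in range(steps):
        g = [0.0] * n
        for i in range(n):
            g[i] += 2.0 * (u[i] - data[i])          # data-fidelity gradient
        for i in range(n - 1):
            d = u[i + 1] - u[i]
            # Lorentzian penalty rho(d) = log(1 + (d/sigma)^2); its
            # derivative tends to 0 for large |d|, preserving strong edges.
            gd = 2.0 * d / (sigma * sigma + d * d)
            g[i] -= lam * gd
            g[i + 1] += lam * gd
        for i in range(n):
            u[i] -= lr * g[i]
    return u
```

On a noisy step signal this removes small fluctuations while leaving the large discontinuity essentially intact, which is the behavior a linear filter cannot provide.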
Citations: 0
Massively Parallel Fokker-Planck Calculations
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555416
A. Mirin
The package FPPAC [1,2], which computes the nonlinear multispecies Fokker-Planck collision operator for a plasma in two-dimensional velocity space, has been rewritten for the Connection Machine 2. This has involved allocating variables either to the front end or to the CM-2, minimizing data flow, and replacing Cray-optimized algorithms with ones suitable for a massively parallel architecture. Coding has been done using Connection Machine Fortran. Calculations have been carried out on various Connection Machines throughout the country. Results and timings on these machines have been compared to each other and to those on the static-memory Cray-2 at the National Magnetic Fusion Energy Computer Center. For large problem sizes, the Connection Machine 2 is found to be cost-efficient.
Citations: 1
Embedding Meshes into Small Boolean Cubes
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556398
Ching-Tien Ho, S. Johnsson
The embedding of arrays in Boolean cubes, when there are more array elements than nodes in the cube, can always be made with optimal load-factor by reshaping the array to a one-dimensional array. We show the dilation required for such an embedding of an l0 x l1 x ... x l(d-1) array in an n-cube. Dilation-one embeddings can be obtained by splitting each axis into segments and assigning segments to nodes in the cube by a Gray code. The load-factor is optimal if the axis lengths contain sufficiently many powers of two. The congestion is minimized, if the segment lengths along the different axes are as equal as possible, for a cube configured with at most as many axes as the array. A further decrease in the congestion is possible if the array is partitioned into subarrays, and corresponding axes of different subarrays make use of edge-disjoint Hamiltonian cycles within subcubes. The congestion can also be reduced by using multiple paths between pairs of cube nodes, i.e., by using "fat" edges.
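The Gray-code assignment of axis segments to cube nodes can be illustrated with a minimal one-axis sketch; segment sizing and the multi-axis case follow the paper, not this toy, and the function names are made up here:

```python
def gray(i):
    """Reflected binary Gray code: consecutive integers differ in one bit."""
    return i ^ (i >> 1)

def embed_axis(length, segments):
    """Split an axis of `length` elements into `segments` contiguous pieces
    and assign piece j to cube node gray(j). Adjacent segments then land on
    cube nodes at Hamming distance 1, which is what gives dilation one."""
    base, extra = divmod(length, segments)
    mapping, start = [], 0
    for j in range(segments):
        size = base + (1 if j < extra else 0)
        mapping.append((range(start, start + size), gray(j)))
        start += size
    return mapping
```

Because gray(j) and gray(j+1) differ in exactly one bit, a mesh edge that crosses a segment boundary maps onto a single cube edge.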
Citations: 5
An Input/Output Algorithm for M-Dimensional Rectangular Domain Decompositions on N-Dimensional Hypercube Multicomputers
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556294
H. Embrechts, J.P. Jones
Hypercube-topology concurrent multicomputers owe at least part of their popularity to the fact that it is relatively simple to decompose rectangularly-shaped M-dimensional domains into subdomains and assign these subdomains to processors (PEs) in a manner which preserves the adjacencies of the subdomains. However, this decomposition involves some rearrangement of the data during input/output operations to (linear memory) data acquisition, display, or mass storage devices. We show that this rearrangement can be done efficiently, in parallel. The main consequence of this algorithm is that M-dimensional data can be stored in a simple, general format and yet be communicated efficiently, independent of the dimension of the hypercube or the number of these dimensions assigned to the dimensions of the domain. This algorithm is also relevant to applications with mixed domain decompositions, and to parallel mass storage media such as disk farms.
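The index rearrangement the abstract refers to is the translation between a global row-major (linear-memory) ordering and the per-processor ordering of a rectangular decomposition. A minimal sketch of that mapping, with illustrative function names and equal-sized blocks assumed:

```python
def to_linear(coords, shape):
    """Row-major linear index of an M-dimensional coordinate, i.e. the
    position a linear-memory I/O device would use."""
    idx = 0
    for c, s in zip(coords, shape):
        idx = idx * s + c
    return idx

def owner_and_local(coords, shape, procs):
    """For a rectangular decomposition of `shape` into `procs[d]` equal
    blocks per dimension, return (processor coords, local coords) for a
    global coordinate: the mapping an I/O algorithm must apply (and invert)
    when moving data between processors and a linear-memory device."""
    block = [s // p for s, p in zip(shape, procs)]
    pe = [c // b for c, b in zip(coords, block)]
    local = [c % b for c, b in zip(coords, block)]
    return pe, local
```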
Citations: 4
A Task Mapping Method for a Hypercube by Combining Subcubes
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556298
S. Horiike
This paper presents a new algorithm for mapping tasks onto a hypercube. Given a weighted task graph, the algorithm finds a good mapping in a reasonable computation time. When the target computer is an n-dimensional cube (n-cube), the proposed algorithm is composed of n stages. The algorithm starts with an initial state in which the tasks are mapped onto 2^n 0-cubes. At each stage k (k = 1, 2, ..., n), the task graph is mapped onto 2^(n-k) k-cubes. At the beginning of stage k, the tasks have already been mapped onto 2^(n-(k-1)) (k-1)-cubes. The tasks are mapped onto k-cubes by combining pairs of (k-1)-cubes: 2^(n-k) pairs of (k-1)-cubes are determined, and they are combined so that the mapping onto the k-cubes makes the communication cost as low as possible.
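The bottom-up combining idea can be sketched with a greedy pairing rule; the paper's actual pairing criterion may differ, and the names and the max-weight heuristic below are assumptions:

```python
# Hypothetical sketch: tasks start one per 0-cube; at each of n stages,
# subcube groups are paired greedily by heaviest mutual communication,
# so heavily-communicating tasks end up in the same subcube early.

def combine_stages(tasks, weight, n):
    """tasks: list of 2**n task ids; weight(a, b): communication weight
    between tasks a and b. Returns the groups after n combining stages."""
    groups = [[t] for t in tasks]
    for _ in range(n):
        paired, used = [], set()
        while len(used) < len(groups):
            i = next(x for x in range(len(groups)) if x not in used)
            used.add(i)
            # pick the unused partner with maximum inter-group weight
            best = max((j for j in range(len(groups)) if j not in used),
                       key=lambda j: sum(weight(a, b)
                                         for a in groups[i]
                                         for b in groups[j]))
            used.add(best)
            paired.append(groups[i] + groups[best])
        groups = paired
    return groups
```

Greedy pairing is only a stand-in for the cost minimization the abstract describes, but it shows the stage structure: group count halves at every stage until one n-cube remains.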
Citations: 2
Parallel Discrete Event Simulation Using Synchronized Event Schedulers
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555367
W. Bain
This paper describes a new algorithm for the synchronization of a class of parallel discrete event simulations on distributed-memory parallel computers. Unlike previous algorithms which synchronize on a per-process basis, this algorithm synchronizes on a per-processor basis. The algorithm allows full generality in the simulation model by allowing dynamic process creation and destruction and full inter-process interconnections, and it is shown to be deadlock- and livelock-free. It has been used to simulate very large parallel computer architectures.
Citations: 1
A Distributed Memory Implementation of SISAL
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556327
D. Grit
SISAL is a general-purpose applicative language intended for use on both conventional and novel multiprocessor systems. In this paper we describe the port of a shared-memory implementation to a distributed-memory environment. A number of issues are specifically addressed: the evaluation strategy, memory management, scheduling, stream handling, and task synchronization.
Citations: 4
Visual Animation of Parallel Algorithms for Matrix Computations
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556337
M. Heath
In this talk we show how graphical animation of the behavior of parallel algorithms can facilitate the design and performance enhancement of algorithms for matrix computations on parallel computer architectures. Using a portable instrumented communication library and a graphical animation package developed at Oak Ridge National Laboratory, we illustrate the effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation. In this talk we focus on distributed-memory parallel architectures in which the processors communicate by passing messages. The linear algebra problems we consider include matrix factorization and the solution of triangular systems.
Citations: 46
Basic Matrix Subprograms for Distributed Memory Systems
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555399
A. Elster
Parallel systems are in general complicated to utilize efficiently. As they evolve in complexity, it hence becomes increasingly more important to provide libraries and language features that can spare the users from the knowledge of low-level system details. Our effort in this direction is to develop a set of basic matrix algorithms for distributed memory systems such as the hypercube. The goal is to be able to provide for distributed memory systems an environment similar to that which the Level-3 Basic Linear Algebra Subprograms (BLAS3) provide for the sequential and shared memory environments. These subprograms facilitate the development of efficient and portable algorithms that are rich in matrix-matrix multiplication, on which major software efforts such as LAPACK have been built. To demonstrate the concept, some of these Level-3 algorithms are being developed on the Intel iPSC/2 hypercube. Central to this effort is the General Matrix-Matrix Multiplication routine PGEMM. The symmetric and triangular multiplications, as well as rank-k updates (symmetric case) and the solution of triangular systems with multiple right-hand sides, are also discussed.
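The block structure that a routine like PGEMM partitions across nodes can be sketched serially; the block size and loop order below are illustrative assumptions, not the paper's distribution scheme:

```python
def block_gemm(A, B, bs=2):
    """C = A * B computed block by block. Each (i0, k0, j0) block product is
    the unit of work a distributed PGEMM would assign to a node; here the
    blocks are simply executed serially for illustration."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, bs):
        for k0 in range(0, m, bs):
            for j0 in range(0, p, bs):
                # multiply block A[i0:i0+bs, k0:k0+bs] by B[k0:k0+bs, j0:j0+bs]
                for i in range(i0, min(i0 + bs, n)):
                    for k in range(k0, min(k0 + bs, m)):
                        aik = A[i][k]
                        for j in range(j0, min(j0 + bs, p)):
                            C[i][j] += aik * B[k][j]
    return C
```

The point of the blocking is that each block product touches only O(bs^2) data per node, which is what makes matrix-matrix multiplication communication-efficient compared with vector operations.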
Citations: 11
Complexity Of Scattering On A Ring Of Processors
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556395
P. Fraigniaud, S. Miguet, Y. Robert
In this paper, we prove that the complexity of scattering in an oriented ring of p processors is (p-1) * (β + L * τ), where L is the length of the messages, β the communication startup, and τ the elemental propagation time. 1. SCATTERING. In a recent paper, Saad and Schultz [SS] study various basic communication kernels in parallel architectures. They point out that interprocessor communication is often one of the main obstacles to increasing performance of parallel algorithms for multiprocessors. They consider the following data exchange operations: (1) One-to-one: moving data from one processor to another.
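The cost formula can be written directly; the symbols β and τ are used here for the startup and per-element propagation time, since the abstract's own symbols are garbled in extraction:

```python
def scatter_time(p, L, beta, tau):
    """Optimal time to scatter p distinct length-L messages on an oriented
    ring of p processors: (p - 1) sends leave the source, each costing a
    startup beta plus L * tau of propagation."""
    return (p - 1) * (beta + L * tau)
```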
Citations: 6