
Proceedings of the Fifth Distributed Memory Computing Conference, 1990: Latest Articles

Massively Parallel Computation of the Euler Equations
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555419
C. Grosch, M. Ghose, S.N. Gupta, T. L. Jackson, M. Zubair
We present a systematic study of the applicability of massively parallel computers, the AMT DAP-510/610 and the TMC CM-2, to the solution of the two-dimensional unsteady Euler equations using a compact high-order scheme. The performance of these machines is compared to that of the Cray-2 and the Cray-YMP/832 using the same algorithm and the same test problem.
Citations: 0
Parallel Distributed-Memory Implementation of the Corrective Switching Problem
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555358
J. Blanc, D. Trystram, J. Ryckbosch
LMC-IMAG, EDF-DER
Abstract. For the past 20 years, there has been increasing interest in the sequential Conjugate Gradient method for solving large linear systems arising from the modeling of physical problems (especially very large systems with sparse matrices). This paper deals with the implementation on parallel supercomputers of a preconditioned conjugate gradient method for solving the corrective switching problem that arises when modeling the behavior of power systems in electrical networks. The problem consists of finding the successive solutions of many close linear systems (not too large) with very ill-conditioned (sometimes even singular) matrices. We present a new method based on the Preconditioned Conjugate Gradient algorithm with an original preconditioning and study its parallelization on both shared and distributed memory computers.
1. Setting of the problem. During the control of electrical networks, the operator must ensure that the system is in a safe state (i.e., that the system can be protected against incidents liable to occur in real time). The demand and the possibilities of the plants are such that nuclear energy between two plants flows from various nodes of the network. The loss of one element could jeopardize the security of the whole system through chain tripping: in such a case, a line becomes overloaded and, without any intervention, the protective devices will act and the line will trip out. Under actual operating conditions, the switching actions that the operator applies to the electrical network ensure that overloads disappear before the delayed protective devices go into action. Such actions are shown in the picture at the end of the paper. The computation of switching actions is a combinatorial problem that is very hard to solve. The connections of the switching elements are described as discrete variables. The corrective switching problem amounts to determining the various possible solutions of the load flow calculation. Each such situation requires solving a linear system, where the matrices differ from each other in only a few elements. Let us consider the $N$ consecutive linear systems below: $(S_i)\colon A_i x_i = b_i$, $1 \le i \le N$, where the matrices $A_i$ (of size $n$ by $n$) are "close" to each other, viz. $A_{i+1} = A_i + \Delta_i$, with $\Delta_i$ of small norm. The solutions $x_i$ will be close to each other in this sense, and we want to take full advantage of this. Note that this problem also occurs in Adaptive Filtering or Finite Element modeling.
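The idea of exploiting the closeness of successive linear systems can be illustrated with a small sketch. This is not the authors' method: it uses plain (unpreconditioned) conjugate gradient, assumes each matrix is symmetric positive definite, and the helper names `matvec`, `dot`, `conjugate_gradient`, and `solve_sequence` are hypothetical. It only shows the warm-start pattern, in which the solution of one system is the initial guess for the next.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive definite A (an assumption of this
    sketch). Returns the approximate solution and the iteration count."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    p = list(r)
    rs = dot(r, r)
    for k in range(max_iter):
        if rs ** 0.5 <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, max_iter


def solve_sequence(systems, tol=1e-10):
    """Solve each (A, b) in order, starting from the previous solution."""
    x = [0.0] * len(systems[0][1])
    results = []
    for A, b in systems:
        x, iters = conjugate_gradient(A, b, x, tol)
        results.append((x, iters))
    return results
```

When consecutive matrices differ by a perturbation of small norm, the previous solution already has a small residual for the next system, so fewer iterations are typically needed than when starting from zero.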
Citations: 0
Parallel Loops on Distributed Machines
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556322
C. Koelbel, P. Mehrotra, J. Saltz, H. Berryman
Any programming environment for distributed memory machines that allows the user to specify parallel do loops over globally defined data structures requires optimizations that go beyond the specification of appropriate data and workload partitionings. In this paper, we consider optimizations that are required for efficient execution of a code segment that consists of parallel loops over distributed data structures. On distributed memory machines it is typically very expensive to fetch individual data elements. Instead, before a parallel loop executes, it is desirable to prefetch all off-processor data required in the loop. We specify a scheme for storing copies of fetched data along with a scheme for accessing copies of off-processor data during the computation of the loop. The performance of such optimizations on the iPSC/2 and the NCUBE is also presented.
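The prefetching idea in this abstract can be sketched in a few lines. The following is an illustrative simulation, not the paper's scheme: it assumes a block distribution, models each processor's memory as a Python list, and the names `owner`, `prefetch`, and `run_loop` are hypothetical. Before the loop body runs, the off-processor indices it will touch are collected and fetched with one batched request per owning processor, then read from a local cache.

```python
from collections import defaultdict


def owner(i, block):
    """Rank owning global index i under a block distribution."""
    return i // block


def prefetch(rank, needed, memory, block):
    """Gather every off-processor element in `needed` before the loop runs.

    `memory[p]` stands in for processor p's local block; a real code would
    exchange messages instead of indexing another rank's list.
    """
    by_owner = defaultdict(list)
    for i in sorted(needed):
        p = owner(i, block)
        if p != rank:
            by_owner[p].append(i)
    cache, messages = {}, 0
    for p, indices in by_owner.items():
        messages += 1  # one batched request per owning processor
        for i in indices:
            cache[i] = memory[p][i - p * block]
    return cache, messages


def run_loop(rank, memory, block, n):
    """Compute y[i] = x[i-1] + x[i+1] (indices clamped) for the local i."""
    lo, hi = rank * block, (rank + 1) * block
    needed = {j for i in range(lo, hi)
              for j in (max(i - 1, 0), min(i + 1, n - 1))}
    cache, messages = prefetch(rank, needed, memory, block)

    def read(i):
        # Local elements come from local memory, the rest from the cache.
        if owner(i, block) == rank:
            return memory[rank][i - lo]
        return cache[i]

    y = [read(max(i - 1, 0)) + read(min(i + 1, n - 1)) for i in range(lo, hi)]
    return y, messages
```

With two processors holding four elements each, every rank needs exactly one remote element for this stencil, so a single batched message replaces per-element fetches inside the loop.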
Citations: 30
Surface Reconstruction and Discontinuity Detection: A Fast Hierarchical Approach on a Two-Dimensional Mesh
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555382
R. Battiti
Recently, multigrid techniques have been proposed for solving low-level vision problems in optimal time (i.e., time proportional to the number of pixels). In the present work this method is extended to incorporate a discontinuity detection process cooperating with the smoothing phase on all scales. Activation of line element detectors that signal the presence of relevant discontinuities is based on information gathered from neighboring points at the same and different scales. Because the required computation is local, parallelism can be profitably used. A mapping of the required data structure onto a two-dimensional mesh of processors is suggested. Domain decomposition is shown to be efficient on MIMD computers capable of containing many individual cells in each processor. Some examples of the proposed multiscale solution techniques are shown for two different applications. In the first case a surface is reconstructed from first derivative information (extracted from the intensity data); in the second case it is reconstructed from noisy depth constraints.
Citations: 5
Communication Parameter Tests and Parallel Back Propagation Algorithms on iPSC/2 Hypercube Multiprocessor
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556397
B. Mak, O. Egecioglu
The communication complexity on Intel's second generation iPSC/2 hypercube and its effect on parallelization of Back Propagation type training algorithms for neural networks are explored. On the iPSC/2, different broadcasting methods are tested and three inter-node communication schemes are evaluated based on their performance on vector addition. These communication schemes are then utilized in parallel versions of the Back Propagation training algorithm. The performance of the resulting parallel variants of Back Propagation is analyzed using two medium-size problems: vowel classification and English text-to-speech conversion (NETtalk data).
Citations: 6
Visual Animation of Parallel Algorithms for Matrix Computations
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556337
M. Heath
In this talk we show how graphical animation of the behavior of parallel algorithms can facilitate the design and performance enhancement of algorithms for matrix computations on parallel computer architectures. Using a portable instrumented communication library and a graphical animation package developed at Oak Ridge National Laboratory, we illustrate the effects of various strategies in parallel algorithm design, including interconnection topologies, global communication patterns, data mapping schemes, load balancing, and pipelining techniques for overlapping communication with computation. In this talk we focus on distributed-memory parallel architectures in which the processors communicate by passing messages. The linear algebra problems we consider include matrix factorization and the solution of triangular systems.
Citations: 46
Local Search Variants for Hypercube Embedding
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556399
Woei-kae Chen, Matthias F. Stallmann
The hypercube embedding problem, a restricted version of the general mapping problem, is the problem of mapping a set of communicating processes to a hypercube multiprocessor. The goal is to find a mapping that minimizes the average length of the paths between communicating processes. Iterative improvement heuristics for hypercube embedding, including a local search, a Kernighan-Lin, and a simulated annealing, are evaluated under different options including neighborhoods (all-swaps versus cube-neighbors), initial solutions (random versus greedy), and enhancements on terminating conditions (flat moves and up-hill moves). By varying these options we obtain a wide range of tradeoffs between execution time and solution quality.
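A minimal sketch of the simplest variant named above, local search over an all-swaps neighborhood, may make the objective concrete. It assumes one process per hypercube node and measures path length as the Hamming distance between node labels; the function names are hypothetical, and the paper's flat-move and up-hill enhancements are not modeled.

```python
from itertools import combinations


def hamming(a, b):
    """Number of hypercube links between nodes labeled a and b."""
    return bin(a ^ b).count("1")


def total_path_length(mapping, edges):
    """Sum of path lengths over all communicating process pairs.

    `mapping[p]` is the hypercube node assigned to process p. Dividing by
    len(edges) gives the average length the abstract refers to.
    """
    return sum(hamming(mapping[u], mapping[v]) for u, v in edges)


def local_search(mapping, edges):
    """Swap the nodes of two processes whenever that strictly lowers cost."""
    mapping = list(mapping)
    best = total_path_length(mapping, edges)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(mapping)), 2):
            mapping[i], mapping[j] = mapping[j], mapping[i]
            cost = total_path_length(mapping, edges)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo a swap that did not help.
                mapping[i], mapping[j] = mapping[j], mapping[i]
    return mapping, best
```

For example, a ring of four processes `[(0, 1), (1, 2), (2, 3), (3, 0)]` started from the mapping `[0, 3, 1, 2]` on a 2-cube has total length 6, and the search reduces it to 4, where every communicating pair sits on adjacent nodes.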
Citations: 6
Dynamic Load Balancing in a Concurrent Plasma PIC Code on the JPL/Caltech Mark III Hypercube
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556302
P. Liewer, E. W. Leaver, V. Decyk, J. Dawson
Dynamic load balancing has been implemented in a concurrent one-dimensional electromagnetic plasma particle-in-cell (PIC) simulation code using a method which adds very little overhead to the parallel code. In PIC codes, the orbits of many interacting plasma electrons and ions are followed as an initial value problem as the particles move in electromagnetic fields calculated self-consistently from the particle motions. The code was implemented using the GCPIC algorithm in which the particles are divided among processors by partitioning the spatial domain of the simulation. The problem is load-balanced by partitioning the spatial domain so that each partition has approximately the same number of particles. During the simulation, the partitions are dynamically recreated as the spatial distribution of the particles changes in order to maintain processor load balance.
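The equal-count partitioning described in this abstract can be sketched for a one-dimensional domain as follows. This is an illustration under stated assumptions, not the GCPIC implementation: particle positions are distinct and lie in `[lo, hi)`, and `balanced_boundaries` and `counts_per_partition` are hypothetical names. Rebalancing then amounts to recomputing the boundaries after the particles have moved.

```python
from bisect import bisect_right


def balanced_boundaries(positions, num_procs, lo, hi):
    """Return num_procs + 1 boundaries over [lo, hi) so that each
    sub-interval holds roughly the same number of particles."""
    xs = sorted(positions)
    n = len(xs)
    # Cut at the particle sitting at each k/num_procs quantile.
    cuts = [xs[(k * n) // num_procs] for k in range(1, num_procs)]
    return [lo] + cuts + [hi]


def counts_per_partition(positions, bounds):
    """Count the particles falling in each interval [bounds[k], bounds[k+1])."""
    counts = [0] * (len(bounds) - 1)
    for x in positions:
        counts[bisect_right(bounds, x) - 1] += 1
    return counts
```

With strongly clustered particles such as `[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.0, 9.0]` on `[0, 10)` and four processors, uniform-width partitions would put six particles on the first processor, while the boundaries computed here give each processor two.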
Citations: 11
Molecular Dynamics Simulations of Short-Range Force Systems on 1024-Node Hypercubes
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.555423
S. Plimpton
Two parallel algorithms for classical molecular dynamics are presented. The first assigns each processor to a subset of particles; the second assigns each to a fixed region of 3d space. The algorithms are implemented on 1024-node hypercubes for problems characterized by short-range forces, diffusion (so that each particle’s neighbors change in time), and problem size ranging from 250 to 10000 particles. Timings for the algorithms on the 1024-node NCUBE/ten and the newer NCUBE 2 hypercubes are given. The latter is found to be competitive with a CRAY-XMP, running an optimized serial algorithm. For smaller problems the NCUBE 2 and CRAY-XMP are roughly the same; for larger ones the NCUBE 2 (with 1024 nodes) is up to twice as fast. Parallel efficiencies of the algorithms and communication parameters for the two hypercubes are also examined.
Citations: 18
Hypercube Dynamic Load Balancing
Pub Date : 1990-04-08 DOI: 10.1109/DMCC.1990.556305
D. King, E. Wegman
This paper reports on the results of a preliminary study in dynamic load balancing on an Intel Hypercube. The purpose of this research is to provide experimental data on how parallel algorithms should be constructed to obtain maximal utilization of a parallel architecture. This study is one aspect of an ongoing research project into the construction of an automated parallelization tool. This tool will take FORTRAN source as input, and construct a parallel algorithm that will produce the same results as the original serial input. The focus of this paper is on the load balancing aspect of that project. The basic idea is to reserve a certain percentage of the computation task, subdivide that percentage into arbitrarily fine tasks, and dole those small tasks out to nodes on request. If the percentage is chosen correctly, then a minority of nodes should be involved in consuming the filler tasks, and the overall throughput of the job should increase as a result of the individual node efficiencies having increased. This paper will outline our approach to performing dynamic load balancing on an Intel iPSC/2. We take the view that the problem of load balancing is really a problem of dividing a "computational task" into smaller components, each of roughly equal complexity, and each an independent event. After this is done, the components of the task can be sent to a node for execution. The key to an optimally balanced load across all computational nodes is the ability to form a statistical profile of the individual components of each computational task. This statistical profile will determine an initial sequence of execution. Our experience indicates that a speedup on the order of 80% is achievable with the judicious use of profiled load balancing. During the process of execution, the initial profile will be altered according to the actual behavior exhibited by the nodes. The difference between the actual and expected performance will be used to determine how much additional time should be devoted to altering the current execution schedule. Currently, our work involves statically setting the load balancing parameters. Our load balancing system determines the execution schedule
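The "filler task" idea, reserving part of the work and handing it out in small pieces to whichever node asks first, can be sketched as a tiny simulation. This is an assumption-laden illustration rather than the authors' system: task durations are known in advance, communication is free, and `makespan_with_fillers` is a hypothetical name.

```python
import heapq


def makespan_with_fillers(static_times, filler_times):
    """Finish time of the whole job when each node first runs its static
    share and the filler tasks then go, one at a time, to the node that
    becomes idle earliest."""
    heap = [(t, node) for node, t in enumerate(static_times)]
    heapq.heapify(heap)
    for task in filler_times:
        # The earliest-idle node requests and receives the next filler.
        t, node = heapq.heappop(heap)
        heapq.heappush(heap, (t + task, node))
    return max(t for t, _ in heap)
```

Suppose two nodes need 10 and 4 time units for their static shares and six unit-length tasks were held back. Handing the fillers out on request finishes at time 10, because the faster node absorbs all of them. Splitting them evenly in advance (static shares of 13 and 7, no fillers) would finish at time 13.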
Citations: 2