
Proceedings Scalable High Performance Computing Conference SHPCC-92: Latest Publications

Balancing interprocessor communication and computation on torus-connected multicomputers running compiler-parallelized code
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232672
M. Annaratone, R. Rühl
The machine model considered in this paper is that of a distributed memory parallel processor (DMPP) with a two-dimensional torus topology. Within this framework, the authors study the relationship between the speedup delivered by compiler-parallelized code and the machine's interprocessor communication speed. It is shown that compiler-parallelized code often exhibits more interprocessor communication than manually parallelized code, and that the performance of the former is therefore more sensitive to the machine's interprocessor communication speed. Because of this, a parallelizing compiler developed for a platform not explicitly designed to sustain the increased interprocessor communication will, in the general case, produce code that delivers disappointing speedups. Finally, the study provides the point of diminishing returns for interprocessor communication speed, beyond which the DMPP designer should focus on improving other architectural parameters, such as local memory-processor bandwidth.
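The sensitivity the abstract describes can be illustrated with a toy analytical model (an illustration, not the paper's model): parallel run time is serial compute time divided across P nodes plus a communication term, so code with a larger communication volume loses speedup faster as interconnect speed drops. All names and numbers below are invented for the sketch.

```python
# Toy model: speedup of parallelized code on a P-node machine as a
# function of interprocessor communication speed.
def speedup(p, t_comp, comm_volume, comm_speed):
    """Estimate speedup = T_serial / T_parallel.

    t_comp      -- total serial compute time
    comm_volume -- data each node must exchange per run (arbitrary units)
    comm_speed  -- interconnect throughput (units per time)
    """
    t_parallel = t_comp / p + comm_volume / comm_speed
    return t_comp / t_parallel

# Compiler-parallelized code (larger comm_volume) loses more speedup at
# a given interconnect speed than hand-parallelized code does.
manual   = speedup(64, t_comp=100.0, comm_volume=1.0, comm_speed=0.5)
compiled = speedup(64, t_comp=100.0, comm_volume=4.0, comm_speed=0.5)
```

Sweeping `comm_speed` in this model reproduces the qualitative point of diminishing returns: past a certain interconnect speed, the compute term dominates and further communication improvements barely move the speedup.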
Cited by: 14
Towards a distributed memory implementation of Sisal
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232668
M. Haines, W. Bohm
Sisal is a functional language for scientific applications implemented efficiently on shared memory, vector, and hierarchical memory multiprocessors. The current compiler assumes a flat, shared addressing space, and the runtime system is implemented using locks and shared queues. This paper describes a first implementation of Sisal on the nCUBE 2 distributed memory architecture. Most of the effort is focused on altering the runtime system for execution in a message passing environment and providing the Sisal compiler with a distributed shared memory. The authors give preliminary performance results and outline future work.
Cited by: 10
Scalability of data transport
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232695
H. Jordan
Peak floating point rate is a very limited way to characterize high performance computer systems. A better method is to use the bandwidth and latency of data transport for the major components of a system. Bandwidth scales well with increasing system size, but latency does not. The demands placed by a program on data transport determine how well an architecture will execute it. The article discusses two program metrics which describe latency characteristics of programs and shows how they can help optimize program structure.
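The bandwidth/latency characterization the abstract argues for is usually written as the classic linear transfer-time model; the sketch below (standard textbook form, not taken from the article) shows why latency, not bandwidth, dominates small transfers.

```python
def transfer_time(n_bytes, latency, bandwidth):
    """Linear model: time to move n_bytes over one link.

    latency   -- fixed per-message startup cost (seconds)
    bandwidth -- sustained throughput (bytes/second)
    """
    return latency + n_bytes / bandwidth

# An 8-byte message is almost pure latency; an 8 MiB message is almost
# pure bandwidth. Doubling system size typically scales bandwidth but
# leaves latency fixed, which is the scalability gap the article notes.
small = transfer_time(8, latency=1e-4, bandwidth=1e8)
large = transfer_time(8 * 1024**2, latency=1e-4, bandwidth=1e8)
```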
Cited by: 5
A test suite approach for Fortran90D compilers on MIMD distributed memory parallel computers
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232667
M.-Y. Wu, G. C. Fox
Describes a test suite approach for a Fortran90D compiler, a source-to-source parallel compiler for distributed memory systems. Unlike Fortran77 parallelizing compilers, a Fortran90D compiler does not parallelize sequential constructs; only parallelism expressed by Fortran90D parallel constructs is exploited. The authors discuss compiler directives and the methodology of parallelizing Fortran programs. An introductory example of Gaussian elimination is used, among other programs in the test suite, to explain the compilation techniques.
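For readers unfamiliar with the test-suite example, this is the kind of kernel meant: Gaussian elimination with forward elimination and back substitution. The sketch below is a plain-Python stand-in for the Fortran90D version (the paper's actual array-syntax code is not reproduced here); it omits pivoting and assumes nonzero diagonals.

```python
def gaussian_eliminate(a, b):
    """Solve A x = b by forward elimination then back substitution.
    No pivoting -- assumes a[k][k] is never zero."""
    n = len(b)
    a = [row[:] for row in a]  # work on copies
    b = b[:]
    for k in range(n):                      # eliminate column k
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):          # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

# 2x + y = 5 and x + 3y = 10  ->  x = 1, y = 3
print(gaussian_eliminate([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))
```

In the Fortran90D setting, the inner elimination update is an array expression over a distributed row block, which is exactly the parallelism the compiler is expected to exploit.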
Cited by: 8
A parallel scalable approach to short-range molecular dynamics on the CM-5
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232636
R. Giles, P. Tamayo
Presents a scalable algorithm for short-range molecular dynamics which minimizes interprocessor communications at the expense of a modest computational redundancy. The method combines Verlet neighbor lists with coarse-grained cells. Each processing node is associated with a cubic volume of space and the particles it owns are those initially contained in the volume. Data structures for 'own' and 'visitor' particle coordinates are maintained in each node. Visitors are particles owned by one of the 26 neighboring cells but lying within an interaction range of a face. The Verlet neighbor list includes pointers to own-own and own-visitor interactions. To communicate, each of the 26 neighbor cells sends a corresponding block of particle coordinates using message-passing cells. The algorithm has the numerical properties of the standard serial Verlet method and is efficient for hundreds to thousands of particles per node, allowing the simulation of large systems with millions of particles. Preliminary results on the new CM-5 supercomputer are described.
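The core of the cell decomposition can be shown serially: bin particles into cells at least one cutoff wide, so pair candidates only ever come from the same cell or one of its 26 neighbors. This is an illustrative serial sketch of the cell-list idea, not the CM-5 code; in the paper each cell is a processing node and the neighbor-cell scan becomes the "visitor" message exchange.

```python
import itertools
import math

def build_pairs(points, cutoff):
    """List all pairs of 3D points closer than cutoff, scanning only
    same-cell and 26-neighbor-cell candidates (cell side == cutoff)."""
    cells = {}
    for idx, (x, y, z) in enumerate(points):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells.setdefault(key, []).append(idx)
    pairs = []
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            other = cells.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in other:
                    # i < j counts each pair exactly once
                    if i < j and math.dist(points[i], points[j]) < cutoff:
                        pairs.append((i, j))
    return pairs

pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0)]
print(build_pairs(pts, 1.0))  # -> [(0, 1)]
```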
Cited by: 14
Communication efficient global load balancing
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232629
D. Nicol
Proposes a scalable parallel algorithm, called direct mapping, for balancing workload in a global, synchronous way. Direct mapping is particularly attractive for SIMD architectures, as it makes use of the scan operation. Unlike previously proposed scalable methods for the problem of interest, direct mapping transfers the minimum volume of workload necessary to achieve perfect load balance. This paper describes the algorithm, and studies its performance via simulation in comparison to previously proposed methods.
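The scan-based idea can be sketched serially (my reading of the approach, not the paper's code): an exclusive prefix scan gives every work unit a global rank, and rank r goes to processor r*P/total, so the final distribution is perfectly even and each unit moves at most once.

```python
import itertools

def direct_mapping(loads):
    """Return the per-processor load after scan-based rebalancing.

    loads -- loads[i] is the number of work units on processor i.
    On a SIMD machine the exclusive scan below is the parallel
    primitive; everything else is local arithmetic.
    """
    p = len(loads)
    total = sum(loads)
    # exclusive prefix scan: global rank of each processor's first unit
    offsets = [0] + list(itertools.accumulate(loads))[:-1]
    result = [0] * p
    for src in range(p):
        for k in range(loads[src]):
            dst = (offsets[src] + k) * p // total  # destination of this unit
            result[dst] += 1
    return result

print(direct_mapping([4, 0, 2, 2]))  # -> [2, 2, 2, 2]
```

Units whose destination equals their source never move, which is where the minimum-transfer-volume property comes from.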
Cited by: 22
Debugging mapped parallel programs
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232646
J. May, F. Berman
As more sophisticated tools for parallel programming become available, programmers will inevitably want to use them together. However, some parallel programming tools can interact with each other in ways that make them less useful. In particular, if a mapping tool is used to adapt a parallel program to run on relatively few processors, the information presented by a debugger may become difficult to interpret. The authors examine the problems that can arise when programmers use debuggers to interpret the patterns of message traffic in mapped parallel programs. They also suggest how to avoid these problems and make debugging tools more useful.
Cited by: 2
A global synchronization algorithm for the Intel iPSC/860
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232641
S. Seidel, M. Davis
Precisely synchronizing the processors of a distributed memory multicomputer provides them with a common baseline from which time can be measured. This amounts to providing the processors with a global clock. This work investigates a global processor synchronization algorithm for the Intel iPSC/860. Previous work has shown that for certain communication problems, such as the one-to-all broadcast and the complete exchange, the most effective use of the iPSC/860 interconnection network is obtained only when communicating pairs of processors are suitably synchronized. For other communication problems, such as the shift operation, global processor synchronization ensures the most effective use of the communication network. This work presents an algorithm that synchronizes processors more closely than Intel's synchronization primitive. This new synchronization algorithm is used as the basis of an efficient implementation of the shift operation.<>
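Message-passing clock synchronization generally rests on a round-trip offset estimate; the sketch below shows the classic Cristian-style form as background (an assumption for illustration; the paper's actual algorithm is not reproduced here).

```python
def estimate_offset(local_send, remote_time, local_recv):
    """Estimate (remote clock - local clock) from one round trip.

    local_send  -- local clock when the request left
    remote_time -- remote clock value carried in the reply
    local_recv  -- local clock when the reply arrived
    Assumes symmetric link delay: the remote clock was sampled roughly
    midway through the round trip.
    """
    rtt = local_recv - local_send
    return remote_time - (local_send + rtt / 2.0)

# Request out at t=0.0, reply in at t=1.0 carrying remote time 10.5:
# the remote clock is estimated to run 10.0 ahead of the local clock.
offset = estimate_offset(0.0, 10.5, 1.0)
```

Repeating the exchange and keeping the estimate from the shortest round trip is the usual way to tighten the bound, which is the kind of refinement needed to beat a vendor-supplied barrier primitive.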
Cited by: 3
PFP: a scalable parallel programming model
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232653
B. Corda, K. Warren
The Parallel Fortran Preprocessor (PFP) is a programming model for multiple instruction multiple data (MIMD) parallel computers. It provides a simple paradigm consisting of data storage modifiers and parallel execution control statements. The model is lightweight and scalable in nature. The control constructs impose no implicit synchronizations, nor do they require off-processor memory references. The model is portable. It is implemented as a source-to-source translator which requires very little support from the back-end compiler. The implementation has an option to produce serial code, which can then be compiled for serial execution.
Cited by: 4
Using atomic data structures for parallel simulation
Pub Date : 1992-04-26 DOI: 10.1109/SHPCC.1992.232691
P. Barth
Synchronizing access to shared data structures is a difficult problem for simulation programs. Frequently, synchronizing operations within and between simulation steps substantially curtails parallelism. The paper presents a general technique for performing this synchronization while sustaining parallelism. The technique combines fine-grained, exclusive locks with futures, a write-once data structure supporting producer-consumer parallelism. The combination allows multiple operations within a simulation step to run in parallel; further, successive simulation steps can overlap without compromising serializability or requiring roll-backs. The cumulative effect of these two sources of parallelism is dramatic: the example presented shows an almost 20-fold increase in parallelism over traditional synchronization mechanisms.
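A future in this sense is a write-once cell: one producer writes it, and consumers block until the value exists, which is what lets a later simulation step start before an earlier one finishes. A minimal threaded sketch (illustrative only; the paper's implementation is not shown here):

```python
import threading

class Future:
    """Write-once cell: one set(), any number of blocking get()s."""

    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def set(self, value):
        if self._event.is_set():
            raise RuntimeError("future already written")
        self._value = value
        self._event.set()   # wake all waiting consumers

    def get(self):
        self._event.wait()  # block until the producer has written
        return self._value

# A consumer from step n+1 can start before the step-n producer is
# done; the future orders just this one value, not whole steps.
f = Future()
producer = threading.Thread(target=lambda: f.set(42))
producer.start()
result = f.get()
producer.join()
```

Because the cell can never be rewritten, reads are always serializable with the single write, so no roll-back machinery is needed.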
Cited by: 3