Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555388
"Load balanced sort on hypercube multiprocessors"
B. Abali, F. Ozguner, A. Bataineh
A parallel algorithm is given for sorting n elements evenly distributed over the 2^d = p nodes of a d-dimensional hypercube. The algorithm ensures that every node receives an equal number of elements (n/p) at the end, regardless of the skew in the data distribution.
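The guarantee stated above — every node finishes with exactly n/p elements, however skewed the key values are across nodes — can be illustrated with a small sequential sketch. This is a hypothetical illustration of the end condition only, not the paper's hypercube algorithm; the function name and simulation of nodes as plain lists are assumptions.

```python
def load_balanced_sort(local_lists):
    """Simulate the end state of a load-balanced sort on p = len(local_lists)
    nodes: globally sort all elements, then hand each node exactly n/p of
    them, regardless of how the key values were skewed across the inputs."""
    p = len(local_lists)
    merged = sorted(x for lst in local_lists for x in lst)
    n = len(merged)
    assert n % p == 0, "n must be divisible by p, as in the paper's setting"
    chunk = n // p
    return [merged[i * chunk:(i + 1) * chunk] for i in range(p)]

# Skewed input: node 0 holds only small keys, node 1 only large ones.
parts = load_balanced_sort([[1, 2, 3, 4], [90, 91, 92, 93],
                            [5, 6, 7, 8], [50, 51, 52, 53]])
```

Each of the four simulated nodes ends with exactly four elements, and concatenating the per-node lists yields the globally sorted sequence.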
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556328
"Experiences with Bilingual Parallel Programming"
I. Foster, R. Overbeek
Parallel programming requires tools that simplify the expression of complex algorithms, provide portability across different classes of machine, and allow reuse of existing sequential code. We have previously proposed bilingual programming as a basis for such tools. In particular, we have proposed the use of a high-level concurrent programming language (such as Strand) to construct parallel programs from (possibly pre-existing) sequential components. We report here on an applications study intended to evaluate the effectiveness of this approach. We describe experiences developing both new codes and parallel versions of existing codes in computational biology, weather modeling, and automated reasoning. We find that the bilingual approach encourages the development of parallel programs that perform well, are portable, and are easy to maintain.
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556335
"Design of a Communication Modeling Tool for Debugging Parallel Programs"
J. Francioni, M. Gach
This paper describes a system tool designed for debugging the interprocess communication of a message-passing parallel program. The tool includes an interactive environment that helps the user generate a graphical display of the program-in-question's expected communication behavior. This graph is considered to be the program's communication model. The debugging tool then runs the real program and compares the aforementioned model to the program's actual communication behavior determined at run time. The results of the comparison are displayed via a graphical animation based on the model graph. The debugging tool provides the user with a mechanism for directing a debugging session based on the user's mental abstractions of a program's communication structure. Additionally, the communication model can be designed for any level of the program, allowing the user to debug the program in a top-down fashion.
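The comparison step at the heart of such a tool — checking an observed run-time trace against the user's model graph — can be sketched as set operations over directed (sender, receiver) pairs. The function and field names below are hypothetical, not the tool's actual API.

```python
def compare_communication(model_edges, observed_edges):
    """Compare a program's expected communication graph (the model) with the
    (sender, receiver) pairs actually observed at run time, returning the
    discrepancies a debugger could then animate."""
    model, observed = set(model_edges), set(observed_edges)
    return {
        "missing": model - observed,     # modeled but never seen
        "unexpected": observed - model,  # seen but not modeled
    }

# Model: a ring 0 -> 1 -> 2 -> 0.  Observed: node 2 wrongly sent to node 1.
report = compare_communication(
    model_edges=[(0, 1), (1, 2), (2, 0)],
    observed_edges=[(0, 1), (1, 2), (2, 1)],
)
```

For this trace the report flags the ring edge (2, 0) as missing and the stray message (2, 1) as unexpected, which is exactly the kind of mismatch a model-based debugging session would surface.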
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556382
"Numerical Simulations of Dynamically Triangulated Random Surfaces on Parallel Computers with 100% Speedup"
C. Baillie, Roy D. Williams
We are currently performing large-scale numerical simulations of dynamically triangulated random surfaces on several parallel computers. Herein we briefly explain the importance of random surface simulations and describe in detail our computer program that simulates such surfaces with extrinsic curvature in arbitrary dimension. As this program is an ideal benchmark of the scalar performance of a computer, we also present performance measurements for it on several sequential and parallel machines.
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556279
"Mapping and Compiled Communication on the Connection Machine System"
E. Dahl
Large processing speeds may be achieved by coordinating the work of many processors in a distributed memory architecture. For most applications, this approach mandates the communication of data amongst the distributed memories, and the cost of this communication can offset the advantage brought by massively parallel processing. We describe an optimization strategy that addresses this problem by dramatically reducing communication costs on the Connection Machine system.
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556401
"Embedding A Pyramid On The Hypercube With Minimal Routing Load"
R. Sen
The problem of embedding a pyramid on a Boolean hypercube has been addressed. A maximal set of edges of the pyramid having an image edge in the hypercube is found. This is based on a breadth-first search that embeds a maximal bipartite subgraph of the pyramid. It has been shown that for a pyramid 70% of its edges may always have image edges in the hypercube. These edges may be statically mapped. This would reduce run-time routing load in the hypercube computer considerably.
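The figure of merit in the abstract — the fraction of guest-graph edges that have image edges in the hypercube — is easy to state in code: an embedded edge is "statically mappable" exactly when its endpoint images differ in one bit. The checker below evaluates any candidate mapping; the 4-cycle example is a toy, not the paper's BFS embedding of a pyramid.

```python
def dilation_one_fraction(edges, mapping):
    """Fraction of guest-graph edges whose images are hypercube edges,
    i.e. whose mapped node addresses differ in exactly one bit."""
    hits = sum(1 for u, v in edges
               if bin(mapping[u] ^ mapping[v]).count("1") == 1)
    return hits / len(edges)

# A 4-cycle embedded into a 2-cube by a Gray-code labeling: every cycle edge
# maps to a hypercube edge, so the fraction is 1.0.
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
gray = {"a": 0b00, "b": 0b01, "c": 0b11, "d": 0b10}
frac = dilation_one_fraction(cycle_edges, gray)
```

For a pyramid, the paper's claim is that a breadth-first embedding keeps this fraction at 70% or better; edges outside that set must be routed over multiple hypercube links at run time.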
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555400
"Linear Algebra for Dense Matrices on a Hypercube"
M. P. Sears
A set of routines has been written for dense matrix operations optimized for the NCUBE/6400 parallel processor. This work was motivated by a Sandia effort to parallelize certain electronic structure calculations [1]. Routines are included for matrix transpose, multiply, Cholesky decomposition, triangular inversion, and Householder tridiagonalization. The library is written in C and is callable from Fortran. Matrices up to order 1600 can be handled on 128 processors. For each operation, the algorithm used is presented along with typical timings and estimates of performance. Performance for order 1600 on 128 processors varies from 42 MFLOPS (Householder tridiagonalization, triangular inverse) up to 126 MFLOPS (matrix multiply). We also present performance results for communications and basic linear algebra operations (saxpy and dot products).
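Of the routines listed, Cholesky decomposition is the easiest to sketch. The following is the standard sequential kernel in plain Python, shown only to make the operation concrete; the paper's distributed NCUBE version (and its C/Fortran interface) is not reproduced here.

```python
import math

def cholesky(a):
    """Return the lower-triangular factor L with a = L * L^T for a symmetric
    positive-definite matrix given as a list of rows.  The hypercube
    distribution of columns used on the NCUBE is out of scope here."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]  # below-diagonal entry
    return L

L = cholesky([[4.0, 2.0], [2.0, 3.0]])
```

For the 2x2 example, L is [[2, 0], [1, sqrt(2)]], and multiplying L by its transpose recovers the input matrix.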
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555404
"Solution of Periodic Tridiagonal Linear Systems on a Hypercube"
T. Taha
(No abstract available.)
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.555418
"A 2D Electrostatic PIC Code for the Mark III Hypercube"
R. Ferraro, P. Liewer, V. Decyk
(No abstract available.)
Pub Date: 1990-04-08 | DOI: 10.1109/DMCC.1990.556381
"Performance Results on the Intel Touchstone Gamma Prototype"
D. Bailey, E. Barszcz, R. Fatoohi, H. Simon, S. Weeratunga

This paper describes the Intel Touchstone Gamma Prototype, a distributed-memory MIMD parallel computer based on the new Intel i860 floating-point processor. With 128 nodes, this system has a theoretical peak performance of over seven GFLOPS. This paper presents some initial performance results on this system, including results for individual node computation, message passing, and complete applications using multiple nodes. The highest rate achieved on a multiprocessor Fortran application program is 844 MFLOPS.

Overview of the Touchstone Gamma System. In spring of 1989, DARPA and Intel Scientific Computers announced the Touchstone project. This project calls for the development of a series of prototype machines by Intel Scientific Computers, based on hardware and software technologies being developed by Intel in collaboration with research teams at Caltech, MIT, UC Berkeley, Princeton, and the University of Illinois. The eventual goal of this project is the Sigma prototype, a 150 GFLOPS peak parallel supercomputer with 2000 processing nodes. One of the milestones toward the Sigma prototype is the Gamma prototype. At the end of December 1989, the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center took delivery of one of the first two Touchstone Gamma systems, and it became available for testing in January 1990.

The Touchstone Gamma system is based on the new 64-bit i860 microprocessor by Intel [4]. The i860 has over 1 million transistors and runs at 40 MHz (the initial Touchstone Gamma systems were delivered with 33 MHz processors, but these have since been upgraded to 40 MHz). The theoretical peak speed is 80 MFLOPS for 32-bit and 60 MFLOPS for 64-bit floating-point operations. The i860 features 32 integer address registers of 32 bits each and 16 floating-point registers of 64 bits each (or 32 floating-point registers of 32 bits each). It also features an 8-kilobyte on-chip data cache and a 4-kilobyte instruction cache. There is a 128-bit data path between cache and registers, and a 64-bit data path between main memory and registers.

The i860 has a number of advanced features to facilitate high execution rates. First, a number of important operations, including floating-point add, multiply, and fetch from main memory, are pipelined: they are segmented into three stages, and in most cases a new operation can be initiated every 25-nanosecond clock period. In addition, multiple instructions can be executed in a single clock period; for example, a memory fetch, a floating add, and a floating multiply can all be initiated in the same clock period.

A single node of the Touchstone Gamma system consists of the i860, 8 megabytes (MB) of dynamic random-access memory, and hardware for communication to other nodes. The Touchstone Gamma system at NASA Ames consists of 128 computational nodes, giving the theoretical peak of over seven GFLOPS noted above.