CMMD I/O: a parallel Unix I/O
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262828
Michael L. Best, A. Greenberg, C. Stanfill, L. W. Tucker
The authors propose a library providing Unix file system support for highly parallel distributed-memory computers. CMMD I/O supports Unix I/O commands on the CM-5 supercomputer. The overall objective of the library is to provide the node-level parallel programmer with routines for opening, reading, and writing files, and so forth. The default behavior mimics standard Unix running on each node: individual nodes can independently perform file system operations. New extensions to the standard Unix file descriptor semantics provide for cooperative parallel I/O, and new functions provide access to very large (multi-gigabyte) files.
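As a conceptual illustration of the two access disciplines described above, here is a minimal Python sketch. All names in it are hypothetical: the real CMMD I/O library is a C API, and none of its actual calls are reproduced here.

```python
# Conceptual sketch only: hypothetical names, not the real CMMD I/O calls.
# In "independent" mode every node keeps its own offset and sees the whole
# file, as under standard Unix; in a cooperative mode the nodes coordinate
# so one logical read is spread across them in node-rank order.
def node_read_range(node_id: int, offset: int, size: int, mode: str):
    """Byte range node `node_id` would read for a read(size) call."""
    if mode == "independent":
        return (offset, offset + size)        # each node reads for itself
    if mode == "cooperative":
        start = offset + node_id * size       # file carved across nodes
        return (start, start + size)
    raise ValueError(f"unknown mode: {mode}")

for node in range(4):                         # 4 nodes, 1 KB per node
    print(node, node_read_range(node, 0, 1024, "cooperative"))
```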
{"title":"CMMD I/O: a parallel Unix I/O","authors":"Michael L. Best, A. Greenberg, C. Stanfill, L. W. Tucker","doi":"10.1109/IPPS.1993.262828","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262828","url":null,"abstract":"The authors propose a library providing Unix file system support for highly parallel distributed-memory computers. CMMD I/O supports Unix I/O commands on the CM-5 supercomputer. The overall objective of the library is to provide the node level parallel programmer with routines for opening, reading, writing a file, and so forth. The default behavior mimics standard Unix running on each node; individual nodes can independently perform file system operations. New extensions to the standard Unix file descriptor semantics provide for co-operative parallel I/O. New functions provide access to very large (multi-gigabyte) files.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115161131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Barrier synchronization in distributed-memory multiprocessors using rendezvous primitives
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262826
S. Gupta, D. Panda
This paper deals with barrier synchronization in wormhole-routed distributed-memory multiprocessors. New rendezvous and multirendezvous synchronization primitives are proposed to implement a barrier between two processors and among multiple processors, respectively. These primitives reduce the number of communication steps required to implement a barrier, significantly reducing the synchronization overhead for networks with high communication start-up cost. Two algorithms for barrier synchronization on k-ary n-cube networks are presented. The rendezvous primitive allows one to synchronize all processors in n log_2(k) steps. The multirendezvous primitive allows one to synchronize an arbitrary subset of processors in an optimal number of communication steps, depending on the ratio of the communication start-up cost (t_s) to the link-propagation cost (t_p).
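The n log_2(k) step count can be illustrated with a standard dissemination pattern: pairwise signalling with doubling distances completes among k participants in ceil(log_2 k) rounds, repeated once per dimension of the k-ary n-cube. A minimal Python sketch of the counting argument follows; it illustrates the step count only, not the paper's rendezvous protocol.

```python
import math

def dissemination_rounds(k: int) -> int:
    """Count rounds until every one of k nodes has heard that all arrived,
    when in round r node i signals node (i + 2**r) mod k."""
    known = [{i} for i in range(k)]       # who each node knows has arrived
    rounds = 0
    while any(len(s) < k for s in known):
        nxt = [set(s) for s in known]
        for i in range(k):
            nxt[(i + 2 ** rounds) % k] |= known[i]   # one signal per node
        known = nxt
        rounds += 1
    return rounds

for k, n in [(2, 10), (4, 5), (8, 4)]:    # 1024, 1024, 4096 processors
    per_dim = dissemination_rounds(k)
    assert per_dim == math.ceil(math.log2(k))
    print(f"k={k}, n={n}: {n * per_dim} barrier steps for {k**n} processors")
```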
{"title":"Barrier synchronization in distributed-memory multiprocessors using rendezvous primitives","authors":"S. Gupta, D. Panda","doi":"10.1109/IPPS.1993.262826","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262826","url":null,"abstract":"This paper deals with barrier synchronization in wormhole routed distributed-memory multiprocessors. New rendezvous and multirendezvous synchronization primitives are proposed to implement a barrier between two and multiple processors, respectively. These primitives reduce the number of communication steps required to implement a barrier; thus, significantly reducing the synchronization overhead for networks with high communication start-up cost. Two algorithms for barrier synchronization on k-ary n-cube networks are presented. The rendezvous primitive allows one to synchronize all processors in nlog/sub 2/(k) steps. The multirendezvous primitive allows one to synchronize an arbitrary subset of processors in optimal number of communication steps depending on the ratio of the communication start-up (t/sub s/) to the link-propagation (t/sub p/) cost.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132120287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Image processing with the MGAP: a cost effective solution
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262835
R. Bajwa, R. Owens, M. J. Irwin
Image processing applications are natural candidates for parallelism and have, at least in part, motivated the design and development of some of the pioneering massively parallel processing systems, including the CLIP family, the DAP, the MPP, and the GAPP. By exploiting design techniques and architectures suited to VLSI technology, one can now build hardware that provides comparable performance at a fraction of the cost of these earlier designs. The authors describe the use of a fine-grained, massively parallel VLSI processor array, the Micro-Grained Array Processor (MGAP), for image processing applications. The array and its support systems, in their current configuration, are designed to be used as a co-processor board in a desktop workstation. The array can also be used for applications other than image processing. The versatility of the array and the single-board design provide a cost-effective solution for a variety of parallelizable tasks.
{"title":"Image processing with the MGAP: a cost effective solution","authors":"R. Bajwa, R. Owens, M. J. Irwin","doi":"10.1109/IPPS.1993.262835","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262835","url":null,"abstract":"Image processing applications are suitable candidates for parallelism and have at least in part motivated the design and development of some of the pioneering massively parallel processing systems including the CLIP family, the DAP, the MPP and the GAPP. By exploiting design techniques and architectures suitable for VLSI technology one can now build hardware which provides comparable performance at a fraction of the cost it took for these earlier designs. The authors describe the use of a fine-grained, massively parallel VLSI processor array, the Micro-Grained Array Processor (MGAP) for image processing applications. The array and its support systems, in their current configuration, are designed to be used as a co-processor board in a desk-top workstation. The array can be used for applications other than image processing as well. The versatility of the array and the single broad design provide a cost effective solution for a variety of parallelizable tasks.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122914100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262814
B. Kumar, Chua-Huang Huang, Rodney W. Johnson, P. Sadayappan
A programming methodology based on tensor products has been used for designing and implementing block recursive algorithms for parallel and vector multiprocessors. A previous tensor product formulation of Strassen's matrix multiplication algorithm requires working arrays of size O(7^n) for multiplying 2^n × 2^n matrices. The authors present a modified tensor product formulation of Strassen's algorithm in which the size of the working arrays can be reduced to O(4^n). The modified formulation exhibits sufficient parallel and vector operations for efficient implementation. Performance results on the Cray Y-MP are presented.
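A quick worked reading of the memory claim: n recursion levels of Strassen generate 7^n subproblems, so a formulation that materializes every level needs O(7^n) working storage, while O(4^n) = O((2^n)^2) is proportional to the input matrix itself. A small Python illustration (not the authors' code):

```python
# Worked illustration of the memory claim above (not the authors' code):
# 7**n counts the subproblems spawned by n full Strassen levels, while
# 4**n == (2**n)**2 is just the element count of one input matrix.
for n in (8, 10, 12):
    elems = (2 ** n) ** 2
    print(f"n={n}: one matrix has {elems:,} elements; "
          f"7^n = {7**n:,} vs 4^n = {4**n:,} working elements")
```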
{"title":"A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction","authors":"B. Kumar, Chua-Huang Huang, Rodney W. Johnson, P. Sadayappan","doi":"10.1109/IPPS.1993.262814","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262814","url":null,"abstract":"A programming methodology based on tensor products has been used for designing and implementing block recursive algorithms for parallel and vector multiprocessors. A previous tensor product formulation of Strassen's matrix multiplication algorithm requires working arrays of size O(7/sup n/) for multiplying 2/sup n/*2/sup n/ matrices. The authors present a modified tensor product formulation of Strassen's algorithm in which the size of working arrays can be reduced to O(4/sup n/). The modified formulation exhibits sufficient parallel and vector operations for efficient implementation. Performance results on the Cray Y-MP are presented.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132027539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Complexity of intensive communications on balanced generalized hypercubes
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262914
J. Antonio, L. Lin, R. C. Metzger
Lower-bound complexities are derived for three intensive communication patterns assuming a balanced generalized hypercube (BGHC) topology. The BGHC is a generalized hypercube that has exactly w nodes along each of its d dimensions, for a total of w^d nodes. A BGHC is said to be dense if the w nodes along each dimension form a complete directed graph, and sparse if they form a unidirectional ring. It is shown that a dense N-node BGHC with a node degree equal to K log_2 N, where K ≥ 2, can process certain intensive communication patterns K(K-1) times faster than an N-node binary hypercube (which has a node degree equal to log_2 N). Furthermore, a sparse N-node BGHC with a node degree equal to (1/L) log_2 N, where L ≥ 2, is 2^L times slower at processing certain intensive communication patterns than an N-node binary hypercube.
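A numeric reading of the stated trade-off, with the speed factors K(K-1) and 2^L taken from the abstract and the node degrees from the stated formulas (illustration only, not the paper's derivation):

```python
import math

# Degree vs. speed for a fixed machine size; factors quoted from the abstract.
N = 2 ** 12                               # 4096 nodes for every network
hypercube_degree = int(math.log2(N))      # binary hypercube: degree 12

for K in (2, 3):                          # dense BGHC, degree K*log2(N)
    print(f"dense BGHC:  degree {K * hypercube_degree}, "
          f"{K * (K - 1)}x faster than the degree-{hypercube_degree} hypercube")
for L in (2, 3):                          # sparse BGHC, degree log2(N)/L
    print(f"sparse BGHC: degree {hypercube_degree // L}, "
          f"{2 ** L}x slower than the degree-{hypercube_degree} hypercube")
```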
{"title":"Complexity of intensive communications on balanced generalized hypercubes","authors":"J. Antonio, L. Lin, R. C. Metzger","doi":"10.1109/IPPS.1993.262914","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262914","url":null,"abstract":"Lower bound complexities are derived for three intensive communication patterns assuming a balanced generalized hypercube (BGHC) topology. The BGHC is a generalized hypercube that has exactly w nodes along each of the d dimensions for a total of w/sup d/ nodes. A BGHC is said to be dense if the w nodes along each dimension form a complete directed graph. A BGHC is said to be sparse if the w nodes along each dimension form a unidirectional ring. It is shown that a dense N node BGHC with a node degree equal to Klog/sub 2/N, where K>or=2, can process certain intensive communication patterns K(K-1) times faster than an N node binary hypercube (which has a node degree equal to log/sub 2/N). Furthermore, a sparse N node BGHC with a node degree equal to /sup 1///sub L/log/sub 2/N, where L>or=2, is 2/sup L/ times slower at processing certain intensive communication patterns than an N node binary hypercube.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129083736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Automatic parallelization of LINPACK routines on distributed memory parallel processors
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262774
M. Neeracher, R. Rühl
Distributed-memory parallel processors (DMPPs) have no hardware support for a global address space. However, conventional programs written in a sequential imperative language such as Fortran typically manipulate a few large arrays. The Oxygen compiler, developed as part of the K2 project, accepts conventional Fortran code augmented with code- and data-distribution directives. These directives support a global name space through a run-time mechanism called data consistency analysis. Many sequential Fortran programs can be efficiently parallelized with Oxygen directives introduced manually by the user into the sequential code. This work presents an analysis pass, added to the compiler, that suggests directives to be inserted into the code. Automatic parallelization of LINPACK routines was attempted, and results are given.
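The core job of such distribution directives is to fix which node owns which array element. Below is a generic Python sketch of the two standard mappings that distribution directives typically choose between; Oxygen's actual directive syntax is not reproduced here.

```python
# Generic illustration (not Oxygen's syntax): a data-distribution directive
# must pin down which node owns which array element. Block and cyclic
# mappings are the standard choices.
def block_owner(i: int, n_elems: int, n_nodes: int) -> int:
    """Contiguous blocks: node p owns elements [p*b, (p+1)*b)."""
    b = -(-n_elems // n_nodes)            # ceil(n_elems / n_nodes)
    return i // b

def cyclic_owner(i: int, n_nodes: int) -> int:
    """Round-robin: element i lives on node i mod n_nodes."""
    return i % n_nodes

print([block_owner(i, 16, 4) for i in range(16)])   # [0,0,0,0,1,1,1,1,...]
print([cyclic_owner(i, 4) for i in range(16)])      # [0,1,2,3,0,1,2,3,...]
```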
{"title":"Automatic parallelization of LINPACK routines on distributed memory parallel processors","authors":"M. Neeracher, R. Rühl","doi":"10.1109/IPPS.1993.262774","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262774","url":null,"abstract":"Distributed memory parallel processors (DMPPs) have no hardware support for a global address space. However, conventional programs written in a sequential imperative language such as Fortran typically manipulate few, large arrays. The Oxygen compiler, developed as part of the K2 project, accepts conventional Fortran code, augmented with code and data distribution directives. These directives support a global name space through a run-time mechanism called data consistency analysis. Many sequential Fortran programs can be efficiently parallelized, with Oxygen directives introduced manually by the user into the sequential code. This work presents an analysis pass added to the compiler that makes suggestions for the directives to be inserted into the code. Automatic parallelization of LINPACK routines was attempted and results are given.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133195735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Delay analysis in synchronous circuit-switched delta networks
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262801
A. Bhattacharya, R. R. Rao, Ting-Ting Y. Lin
Multistage interconnection networks (MINs) provide a cost-effective alternative to a full crossbar connection for processor-processor or processor-memory communication in a tightly coupled multiprocessor system. Delta networks, a class of blocking MIN with the unique-path property, have been studied extensively for their self-routing capability. A probabilistic analysis of blocking and its effect on delay is presented here for such a network operated in a synchronous circuit-switched mode. Under the assumption of uniformly distributed access requests generated independently at each unblocked source, an upper bound on the expected latency is established and compared with simulation results.
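For context on how blocking compounds stage by stage in an unbuffered delta network, the classical throughput recurrence due to Patel can be sketched in a few lines; this illustrates the setting only and is not the latency bound derived in the paper.

```python
def delta_acceptance(p: float, a: int, stages: int) -> float:
    """Request rate surviving `stages` stages of a delta network built from
    a x a crossbars, starting from input request rate p per cycle."""
    for _ in range(stages):
        p = 1.0 - (1.0 - p / a) ** a   # rate on each output of one stage
    return p

# 64-input network of 2x2 switches (log2(64) = 6 stages), saturated inputs:
print(f"acceptance probability ~ {delta_acceptance(1.0, 2, 6):.3f}")
```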
{"title":"Delay analysis in synchronous circuit-switched delta networks","authors":"A. Bhattacharya, R. R. Rao, Ting-Ting Y. Lin","doi":"10.1109/IPPS.1993.262801","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262801","url":null,"abstract":"Multistage interconnection networks (MINs) provide a cost-effective alternative to a full crossbar connection for processor-processor or processor-memory communication in a tightly coupled multiprocessor system. Delta networks, a class of blocking type MIN with unique path property, have been studied extensively for their self-routing capability. A probabilistic analysis of the blocking and its effect on the delay is presented here, for such a network operated in a synchronous circuit-switched mode. Under the assumption of uniformly distributed access requests independently generated at each unblocked source, an upper bound on the expected latency has been established. The bound has been compared with simulation results.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"419 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133517066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Supporting insertions and deletions in striped parallel filesystems
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262921
T. Johnson
The dramatic improvements in the processing rates of parallel computers are turning many compute-bound jobs into I/O-bound jobs. Parallel file systems have been proposed to better match I/O throughput to processing power. Many parallel file systems stripe files across numerous disks, each with its own controller. A striped file can be appended (or prepended) to while maintaining its structure; however, a block cannot be inserted into or deleted from the middle of the file, since this would destroy the round-robin striping structure. The author presents a distributed file structure that maintains files as indexed striped extents on a message-passing multiprocessor. This approach allows highly parallel random and sequential reads while also allowing insertion and deletion in the middle of the file.
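A minimal sketch of the idea, with a hypothetical structure that is not the author's exact design: keeping the file as an ordered list of extents, each striped round-robin across the disks internally, means a mid-file insertion only splits one extent rather than re-striping everything after it.

```python
# Hypothetical sketch, not the author's exact design: a file as an ordered
# list of extents; striping is round-robin *within* each extent, so reads
# stay parallel and a mid-file insert just splits one extent.
from dataclasses import dataclass

@dataclass
class Extent:
    blocks: list                         # logical blocks held by this extent

    def disk_of(self, i: int, n_disks: int) -> int:
        return i % n_disks               # round-robin within the extent only

class StripedFile:
    def __init__(self):
        self.extents: list = []

    def append(self, blocks):
        self.extents.append(Extent(list(blocks)))

    def insert(self, pos: int, blocks):
        """Insert blocks at logical position pos by splitting one extent."""
        off = pos
        for k, ext in enumerate(self.extents):
            if off <= len(ext.blocks):
                left, right = ext.blocks[:off], ext.blocks[off:]
                self.extents[k:k + 1] = [Extent(left), Extent(list(blocks)),
                                         Extent(right)]
                return
            off -= len(ext.blocks)
        self.append(blocks)              # pos past EOF: behave like append

f = StripedFile()
f.append(range(8))
f.insert(4, ["new"])
print([e.blocks for e in f.extents])     # [[0..3], ['new'], [4..7]]
```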
{"title":"Supporting insertions and deletions in striped parallel filesystems","authors":"T. Johnson","doi":"10.1109/IPPS.1993.262921","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262921","url":null,"abstract":"The dramatic improvements in the processing rates of parallel computers are turning many compute-bound jobs into IO-bound jobs. Parallel file systems have been proposed to better match IO throughput to processing power. Many parallel file systems stripe files across numerous disks; each disk has its own controller. A striped file can be appended (or prepended) to and maintain its structure. However, a block can't be inserted into or deleted from the middle of the file, since this would destroy the round robin striping structure of the file. The author presents a distributed file structure that maintains files in indexed striped extents on a message passing multiprocessor. This approach allows highly parallel random and sequential reads, and also allows insertion and deletion into the middle of the file.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114065302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient off-line routing of permutations on restricted access expanded delta networks
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262894
I. Scherson, R. Subramanian
This paper presents an off-line algorithm for routing permutations on expanded delta networks (EDNs) with restricted access. Restricted access means that the number of elements to be permuted may exceed the number of inputs to the EDN. For every N-element permutation on an M-input EDN, the algorithm computes a routing that takes exactly 3N/M passes (assuming M divides N). On a certain class of EDNs, the number of passes can be reduced to 2N/M. For example, for every 16K-element permutation on the 1K-input global network of the MasPar MP-1 and MP-2, the algorithm computes a routing that takes exactly 32 passes. The time complexity of the algorithm is Theta(N log N) sequentially and Theta(log^2 N) on an N-processor PRAM.
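A quick arithmetic check of the MasPar example: N = 16K elements through an M = 1K-input network gives 3N/M = 48 passes in general, and the MP-1/MP-2 global router falls in the class admitting 2N/M = 32 passes, matching the figure quoted.

```python
# The two pass counts quoted above, computed from N and M.
N, M = 16 * 1024, 1024
print("general EDN  :", 3 * N // M, "passes")   # 48
print("reduced class:", 2 * N // M, "passes")   # 32
```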
{"title":"Efficient off-line routing of permutations on restricted access expanded delta networks","authors":"I. Scherson, R. Subramanian","doi":"10.1109/IPPS.1993.262894","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262894","url":null,"abstract":"This paper presents an off-line algorithm for routing permutations on expanded delta networks (EDNs) with restricted access. Restricted access means that the number of elements to be permuted may exceed the number of inputs to the EDN. For every N-element permutation on an M-input EDN, the algorithm computes a routing that takes exactly 3N/M passes (assuming M divides N). On a certain class of EDNs, the number of passes can be reduced to 2N/M. For example, for every 16 K-element permutation on the 1 K-input global network of the MasPar MP-1 and MP-2, the algorithm computes a routing that takes exactly 32 passes. The time complexity of the algorithm is Theta (NlogN) sequentially, and Theta (log/sup 2/N) on an N-processor PRAM.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116075581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The connection cubes: symmetric, low diameter interconnection networks with low node degree
Pub Date: 1993-04-13 · DOI: 10.1109/IPPS.1993.262892
Nitin K. Singhvi
The enhanced connection cube (ECC) and the minimal connection cube (MCC), proposed in this paper, are regular, symmetric, static interconnection networks for large-scale, loosely coupled systems. The ECC connects 2^(2n+1) processing nodes with only n+2 links per node, almost half the number used in a comparable hypercube, yet its diameter is only n+2, almost half that of the hypercube. The MCC connects 2^(2n+1) nodes using only n+1 links per node, has about the same diameter as a hypercube, and is scalable like the hypercube. The MCC can be converted into the ECC by adding one more link per node. Both networks can emulate all the connections present in a hypercube of the same size with no increase in routing complexity, so typical parallel applications run on both types of connection cubes with the same time complexity as on a hypercube.
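A numeric illustration of the quoted figures beside a binary hypercube on the same node count (a hypercube on 2^(2n+1) nodes has degree and diameter 2n+1):

```python
# Degree/diameter figures quoted above, next to a binary hypercube on the
# same 2**(2n+1) nodes; the MCC diameter is only "about" the hypercube's,
# so no exact value is printed for it.
for n in (4, 6, 8):
    nodes = 2 ** (2 * n + 1)
    h = 2 * n + 1
    print(f"{nodes:>7} nodes | hypercube deg/diam {h:>2} | "
          f"ECC deg/diam {n + 2:>2} | MCC deg {n + 1}")
```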
{"title":"The connection cubes: symmetric, low diameter interconnection networks with low node degree","authors":"Nitin K. Singhvi","doi":"10.1109/IPPS.1993.262892","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262892","url":null,"abstract":"The enhanced connection cube or ECC and the minimal connection cube or MCC, proposed in this paper, are regular and symmetric static interconnection networks for large-scale, loosely coupled systems. The ECC connects 2/sup 2n+1/ processing nodes with only n+2 links per node, almost half the number used in a comparable hypercube. Yet its diameter is only n+2, almost half that of the hypercube. The MCC connects 2/sup 2n+1/ nodes using only n+1 links per node, has about the same diameter as a hypercube and is scalable like the hypercube. The MCC can be converted into the ECC by adding one more link per node. Both networks can emulate all the connections present in a hypercube of the same size, with no increase in routing complexity, so that typical parallel applications run on both types of CCs with the same time complexity as on a hypercube.<<ETX>>","PeriodicalId":248927,"journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121379734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}