The Sixth Distributed Memory Computing Conference, 1991. Proceedings: Latest Papers

Benefits of Weak Coherence for Distributed Shared Memory Systems

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633356

L. Borrmann, P. Istavrinos

This paper describes a new scheme for weakly coherent distributed shared memory systems. It shows that for most applications the semantics of weak coherence are sufficient. After sketching the basic implementation schemes for weak coherence protocols, it presents their benefits, mainly an improved exploitation of parallelism. Not only is latency masking exploited for write operations, but techniques such as accumulating update and invalidation messages are also introduced. First results of a prototype implementation are given.

Citations: 1
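
The abstract credits part of the gain to accumulating update and invalidation messages. Below is a minimal Python sketch of that batching idea. It is not the authors' protocol. A hypothetical `WeakNode` completes writes locally and coalesces repeated writes to the same word. At a synchronization point (`release`) it emits one update message per dirty page. The page size and the message tuple layout are illustrative assumptions.

```python
from collections import defaultdict

PAGE_SIZE = 64  # hypothetical page size, in words


class WeakNode:
    """One node of a weakly coherent DSM sketch: writes are buffered until release()."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}                  # local copy of shared words: address -> value
        self.pending = defaultdict(dict)  # page -> {address: latest value written}

    def write(self, addr, value):
        # The write completes locally; no message is sent yet (latency masking).
        self.memory[addr] = value
        # A later write to the same address overwrites the earlier one in the buffer.
        self.pending[addr // PAGE_SIZE][addr] = value

    def release(self):
        # Synchronization point: emit one accumulated update message per dirty page.
        messages = [(self.node_id, page, dict(words))
                    for page, words in sorted(self.pending.items())]
        self.pending.clear()
        return messages

    def apply(self, message):
        # A receiving node installs the whole batch of words at once.
        _sender, _page, words = message
        self.memory.update(words)


a, b = WeakNode(0), WeakNode(1)
for i in range(100):
    a.write(i % 10, i)       # 100 writes to ten words on page 0
for msg in a.release():      # ...travel as a single message
    b.apply(msg)
print(b.memory[9])           # prints 99
```

The example performs 100 writes to the same ten words, and `release` sends one message instead of 100. Weak coherence permits this deferral because other nodes are only guaranteed to see the writes after the next synchronization.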

Highly Parallel Realization of Sparse Distributed Memory System

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633355

M. Linden, J. Saarinen, K. Kaski, P. Kanerva

A highly parallel realization of Kanerva's Sparse Distributed Memory has been developed using advanced structures. The system consists of a host computer, an address unit, and a memory unit. The address and memory units have been implemented with commercially available digital components on two functioning boards, and they perform the Hamming distance comparison and memory storage functions. In order to achieve an effective hardware realization, the units are designed for highly parallel processing. The host computer is used to edit, compile, and download the programs to be run in the units. The software environment has been implemented under the UNIX operating system, and a set of specific commands has been designed to support simulations. The system is intended for real-time applications. Performance estimates are also presented.

Citations: 4
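
For readers unfamiliar with Kanerva's model, the boards' Hamming-distance comparison and memory storage functions can be sketched in software. This sketch follows the textbook Sparse Distributed Memory scheme. The parameters (256-bit words, 1,000 hard locations, activation radius 111) were chosen for the example and are not taken from the hardware design described in the paper.

```python
import random


class SparseDistributedMemory:
    """Kanerva-style SDM: random hard locations, Hamming-radius activation, bit counters."""

    def __init__(self, n_bits=256, n_locations=1000, radius=111, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Each hard location has a fixed random n-bit address...
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        # ...and one signed counter per data bit.
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        # Hamming-distance comparison: activate every location within `radius` bits.
        return [i for i, a in enumerate(self.addresses)
                if bin(a ^ address).count("1") <= self.radius]

    def write(self, address, data):
        # Storage: in every active location, +1 for each 1 bit and -1 for each 0 bit.
        for i in self._active(address):
            row = self.counters[i]
            for b in range(self.n_bits):
                row[b] += 1 if (data >> b) & 1 else -1

    def read(self, address):
        # Retrieval: sum the counters of the active locations and threshold at zero.
        sums = [0] * self.n_bits
        for i in self._active(address):
            for b, c in enumerate(self.counters[i]):
                sums[b] += c
        return sum(1 << b for b, s in enumerate(sums) if s > 0)
```

In this sketch, the address unit's job corresponds to `_active`. Each hard location evaluates its own distance independently, which is why the model maps well onto highly parallel hardware.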

A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633212

T. Krucken, P. Liewer, R. Ferraro, V. Decyk

The two-dimensional electrostatic plasma particle-in-cell (PIC) code described in [1] has been upgraded to a 2D electromagnetic PIC code running on the Caltech/JPL Mark IIIfp and the Intel iPSC/860 parallel MIMD computers. The code solves the complete time-dependent Maxwell’s equations, where the plasma responses, i.e., the charge and current densities in the plasma, are evaluated by advancing in time the trajectories of ~10^6 particles in their self-consistent electromagnetic field. The field equations are solved in Fourier space. Parallelisation is achieved through domain decomposition in real and Fourier space. Results are shown from a simulation of a two-dimensional Alfvén wave filamentation instability; these are the first simulations of this 2D Alfvén wave decay process.

Citations: 7
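
The abstract states that the field equations are solved in Fourier space. The sketch below illustrates that technique on the electrostatic part only: Poisson's equation on a periodic 2D grid in normalized units (ε0 = 1), solved with NumPy's FFT. The full time-dependent Maxwell solver and the domain decomposition across processors are not shown. The function name is hypothetical.

```python
import numpy as np


def solve_poisson_periodic(rho, lx, ly):
    """Solve -laplacian(phi) = rho on a periodic lx-by-ly grid (normalized units, eps0 = 1).

    rho has shape (ny, nx) with x along axis 1. In Fourier space the Laplacian is
    multiplication by -(kx^2 + ky^2), so phi_k = rho_k / k^2; the k = 0 (mean) mode is dropped.
    Returns phi and the field components E = -grad(phi), also computed spectrally.
    """
    ny, nx = rho.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=lx / nx)  # angular wavenumbers along x
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=ly / ny)  # angular wavenumbers along y
    KX, KY = np.meshgrid(kx, ky)                    # both of shape (ny, nx), like rho
    k2 = KX**2 + KY**2

    rho_k = np.fft.fft2(rho)
    phi_k = np.zeros_like(rho_k)
    nonzero = k2 > 0
    phi_k[nonzero] = rho_k[nonzero] / k2[nonzero]

    phi = np.real(np.fft.ifft2(phi_k))
    ex = np.real(np.fft.ifft2(-1j * KX * phi_k))  # -d(phi)/dx
    ey = np.real(np.fft.ifft2(-1j * KY * phi_k))  # -d(phi)/dy
    return phi, ex, ey
```

Each wavenumber decouples, so the solve itself is a pointwise division between two FFTs. In a distributed code, those FFTs are where the Fourier-space decomposition mentioned in the abstract comes into play.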

External Sorting on a Distributed Memory Machine

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633302

D. Ecklund

Sorting is a classic problem [5] that naturally lends itself to parallel processing. Many researchers have investigated memory-based parallel sorting [3], but only a few have investigated the problem of parallel external sorting [2, 4]. Existing algorithms employ local sorting of runs followed by pipelined merging of runs. The writing of the final merged result is a serial process performed by a single processor. This sequential bottleneck has a significant negative impact on the total sort time, and it does not make effective use of the concurrent I/O capabilities provided on a number of parallel machines. I have proposed and prototyped a two-phase parallel external sorting algorithm that removes the “final merge bottleneck” by partitioning sorted runs and using multiple processors to build a merged run.

Citations: 0
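
The core of the proposed fix is that the merged output need not come from one processor. If the key space is cut at splitter values, each processor can merge its own key range from every sorted run and write that range independently. The in-memory Python sketch below illustrates this idea under stated assumptions. Lists stand in for sorted run files, threads stand in for processors, and splitters are chosen by sampling each run. It is not Ecklund's algorithm, whose details the abstract does not give.

```python
import bisect
import heapq
from concurrent.futures import ThreadPoolExecutor


def choose_splitters(runs, n_workers):
    """Pick n_workers - 1 key boundaries by sampling every sorted run (a simple heuristic)."""
    sample = sorted(x for run in runs for x in run[:: max(1, len(run) // n_workers)])
    if not sample:
        return []
    step = len(sample) / n_workers
    return [sample[int(step * i)] for i in range(1, n_workers)]


def merge_partition(runs, lo, hi):
    """Merge, from every run, the slice of keys k with lo <= k < hi (None = unbounded)."""
    pieces = []
    for run in runs:
        start = 0 if lo is None else bisect.bisect_left(run, lo)
        end = len(run) if hi is None else bisect.bisect_left(run, hi)
        pieces.append(run[start:end])
    return list(heapq.merge(*pieces))


def parallel_external_merge(runs, n_workers=4):
    splitters = choose_splitters(runs, n_workers)
    bounds = list(zip([None] + splitters, splitters + [None]))
    # Each worker merges a disjoint key range, so no single final-merge step is needed.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: merge_partition(runs, *b), bounds))
    # Ranges are disjoint and ordered, so concatenation is globally sorted.
    return [x for part in parts for x in part]
```

Partition k holds only keys in [s_k, s_(k+1)). In an external setting, each partition's output offset is known once the partition sizes are counted, so all processors could write their pieces concurrently.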

A Visualization Model for Massively Parallel Algorithms

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633346

R. Khanna, B. McMillin

A visualization model has been developed to analyse the performance of a massively parallel algorithm. Most visualization tools developed so far for performance analysis are generally based on individual processor information and communication patterns. These tools, however, are inadequate for massively parallel computations, because the visual information for many processors is difficult to comprehend. The model, SMILI (Scientific visualization in Multicomputing for Interpretation of Large amounts of Information), addresses this problem by using abstract representations to attain a composite picture that gives better insight into the behavior of the algorithm. Chernoff faces have been selected to represent the multidimensional data because of their ability to portray such data in a very perceptible manner. SMILI has been used on an asynchronous massively parallel PDE (partial differential equation) solver based on the multigrid paradigm. The visualization tool helps in tuning the control parameters of the multigrid algorithm to get optimal results.

Citations: 1

The ProSolver-SES™ Library, a Skyline Solver for the iPSC/860

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633168

E. Kushner, E. Castro-Leon, M. L. Barton

A direct equation solver that addresses very large (out-of-core) linear systems has been developed for the iPSC/860. Routines included in the ProSolver-SES Library can factor and solve any matrix for which pivoting is unnecessary. The software is designed to solve sparse matrices whose nonzero pattern can be described by a skyline or profile. Separate routines support applications that generate symmetric or non-symmetric coefficient matrices. High performance has been achieved through the use of a dot-product routine coded in i860 assembly language. In addition, disk I/O has been optimized to ensure performance on very large applications. For problems small enough to fit in memory, the ProSolver-SES Library achieves approximately 15 MFLOPS per processor. On large problems with significant I/O (10,000 × 10,000), current performance varies from 8 to 15 MFLOPS per processor.

Citations: 5
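
The Python sketch below shows what a skyline (profile) solver does in the symmetric positive-definite case. Each column is stored from its first nonzero row down to the diagonal. A column-oriented Cholesky factorization without pivoting then produces fill only inside that profile, and its inner loop is a dot product, the kernel the abstract says was hand-coded in i860 assembly. The class and method names are hypothetical. The non-symmetric routines and the out-of-core disk I/O are not modeled.

```python
import math


class SkylineMatrix:
    """Symmetric matrix stored column by column, from the first nonzero row to the diagonal."""

    def __init__(self, dense):
        self.first = []  # first[j]: row index of the top of column j's skyline
        self.cols = []   # cols[j][i - first[j]] holds entry (i, j) for first[j] <= i <= j
        for j in range(len(dense)):
            top = next(i for i in range(j + 1) if dense[i][j] != 0 or i == j)
            self.first.append(top)
            self.cols.append([float(dense[i][j]) for i in range(top, j + 1)])

    def _dot(self, i, j, upto):
        # Dot product of columns i and j over rows both profiles share, below `upto`.
        fi, fj = self.first[i], self.first[j]
        return sum(self.cols[i][k - fi] * self.cols[j][k - fj]
                   for k in range(max(fi, fj), upto))

    def cholesky(self):
        """Factor A = U^T U in place, without pivoting; fill-in stays inside the skyline."""
        for j in range(len(self.cols)):
            fj = self.first[j]
            for i in range(fj, j):
                diag_i = self.cols[i][i - self.first[i]]
                self.cols[j][i - fj] = (self.cols[j][i - fj] - self._dot(i, j, i)) / diag_i
            d = self.cols[j][j - fj] - self._dot(j, j, j)
            if d <= 0:
                raise ValueError("matrix is not positive definite")
            self.cols[j][j - fj] = math.sqrt(d)

    def solve(self, b):
        """Solve A x = b using the factor: forward U^T y = b, then backward U x = y."""
        n = len(self.cols)
        y = [float(v) for v in b]
        for j in range(n):  # forward substitution, one column dot product per row
            fj = self.first[j]
            s = sum(self.cols[j][k - fj] * y[k] for k in range(fj, j))
            y[j] = (y[j] - s) / self.cols[j][j - fj]
        for j in range(n - 1, -1, -1):  # back substitution, column-oriented updates
            fj = self.first[j]
            y[j] /= self.cols[j][j - fj]
            for k in range(fj, j):
                y[k] -= self.cols[j][k - fj] * y[j]
        return y
```

Column j touches only rows from `first[j]` downward. As a result, both storage and work scale with the profile rather than with the full n × n matrix.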

High Performance Parallel File Objects

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633362

Andrew Grimshaw, Jeff Rem

High performance parallel computers are expected to solve problems involving very large data sets, often far larger than can fit in primary memory. If I/O is not performed intelligently, then the wait for I/O can become a serious bottleneck, limiting the gains from improved processor technology. This paper introduces ELFS (ExtensibLe File Systems). ELFS is a parallel, asynchronous I/O system designed to attack the I/O bottleneck. It combines recent technological advances in three areas: object-oriented systems design, latency-obscuring compiler technology, and parallel disk arrays attached to parallel architectures. We present the ELFS class pfo (parallel file object), a parallel 2D-matrix class. Pfos allow the user to: 1) specify the access pattern, e.g., row-wise, column-wise, or by sub-blocks; 2) partition the pfo into sub-pfos defined by subsets of the original file structure, and specify where each new sub-pfo should be located; and 3) access the file in an asynchronous and pipelined manner. Preliminary performance results are presented.

Citations: 13
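
The abstract lists row-wise, column-wise, and sub-block access patterns for a pfo. The index bookkeeping behind such a partition can be sketched as follows. `split`, `partition_matrix`, and their arguments are hypothetical illustrations, not the ELFS class interface. No disks or asynchronous I/O are modeled.

```python
from typing import List, Tuple

Region = Tuple[range, range]  # (row indices, column indices) covered by one sub-object


def split(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous pieces whose sizes differ by at most one."""
    q, r = divmod(n, parts)
    bounds = [i * q + min(i, r) for i in range(parts + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(parts)]


def partition_matrix(rows: int, cols: int, pattern: str, grid: Tuple[int, int]) -> List[Region]:
    """Partition a rows x cols matrix into sub-regions for one access pattern.

    grid = (pr, pc): "row" uses pr row bands, "column" uses pc column bands,
    and "block" uses a pr x pc grid of sub-blocks.
    """
    pr, pc = grid
    if pattern == "row":
        return [(rs, range(cols)) for rs in split(rows, pr)]
    if pattern == "column":
        return [(range(rows), cs) for cs in split(cols, pc)]
    if pattern == "block":
        return [(rs, cs) for rs in split(rows, pr) for cs in split(cols, pc)]
    raise ValueError(f"unknown access pattern: {pattern}")


for rs, cs in partition_matrix(10, 7, "block", (3, 2)):
    print(rs, cs)
```

Each region could then be bound to a disk or a processor. The partitions cover every element exactly once, which is the property a sub-pfo decomposition needs.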

nCUBE's Parallel I/O with Unix Compatibility

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633142

E. Debenedictis, P. Madams

This paper presents a parallel I/O facility based on an extension of Unix. This facility, both scalable and transparently integrated, is part of the upcoming release 3 of nCUBE's system software. With scalability added for I/O as well as for computing, distributed memory machines become balanced between the two functions, suiting them for a wider range of applications than their traditional domain of computation-intensive tasks. The basis of the I/O facility is a system-level data structure called a mapping function. A mapping function describes how data from the parts of a parallel program or parallel I/O device are combined to form a single I/O stream. Combining the mapping functions of senders and receivers allows the system to use an optimal communications strategy. Finally, these facilities are added as extensions to Unix. For programs with a single processor, an exact Unix environment is provided. For parallel programs, the Unix environment is extended in a natural way to accommodate parallel I/O.

Citations: 10
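
The abstract describes a mapping function only in general terms. As a hypothetical sketch, one way to picture it is a function from a global stream offset to a (part, local offset) pair. Combining the sender's and receiver's mappings then yields a list of direct transfers. The block and cyclic layouts and all function names are illustrative assumptions, not nCUBE's system interface.

```python
from typing import Callable, List, Tuple

# A mapping function takes a global stream offset and returns (part index, local offset).
Mapping = Callable[[int], Tuple[int, int]]


def block_mapping(total: int, parts: int) -> Mapping:
    size = -(-total // parts)  # ceiling division: elements held by each part
    return lambda g: (g // size, g % size)


def cyclic_mapping(parts: int, block: int = 1) -> Mapping:
    def f(g: int) -> Tuple[int, int]:
        b = g // block  # index of the block containing offset g
        return b % parts, (b // parts) * block + g % block
    return f


def plan_transfers(total: int, send: Mapping, recv: Mapping) -> List[Tuple[int, int, int, int, int]]:
    """Combine sender and receiver mappings into direct point-to-point transfers.

    Each tuple is (sender, send_offset, receiver, recv_offset, length) and covers a
    maximal stretch of the stream that is contiguous on both sides.
    """
    plan: List[Tuple[int, int, int, int, int]] = []
    for g in range(total):
        s, so = send(g)
        r, ro = recv(g)
        if plan:
            ps, pso, pr, pro, n = plan[-1]
            if (ps, pr) == (s, r) and pso + n == so and pro + n == ro:
                plan[-1] = (ps, pso, pr, pro, n + 1)  # extend the current transfer
                continue
        plan.append((s, so, r, ro, 1))
    return plan


# Two senders holding blocks of an 8-element stream, two receivers taking it in 2-element chunks.
for transfer in plan_transfers(8, block_mapping(8, 2), cyclic_mapping(2, block=2)):
    print(transfer)
```

The element-by-element loop keeps the sketch short. A practical implementation would intersect the contiguous ranges of the two mappings directly rather than visiting every offset.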

Massively Parallel Heuristic Search for Approximate Optimization Problems

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633159

A. Mahanti, C. J. Daniels, S. Ghosh, M. Evett, A. Pal

Most admissible search algorithms fail to solve real-life problems because of their exponential time and storage requirements. Therefore, to quickly obtain near-optimal solutions, approximate algorithms and inadmissible heuristics are of practical interest. The use of parallel and distributed algorithms [1, 6, 8, 11] further reduces search complexity. In this paper we present empirical results for a massively parallel search algorithm on a Connection Machine CM-2. Our algorithm, PBDA*, is based on the idea of staged search [9, 10]. Its execution time is directly proportional to the depth of search, and solution quality scales with the number of processors. We tested it on the 15-puzzle problem using both admissible and inadmissible heuristics. The best results gave an average relative error of 1.66% and 66% optimal solutions.

Citations: 0
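
The abstract does not give the algorithm's internals. The code below is therefore only a sequential sketch of the staged-search idea it builds on. Best-first search runs until the OPEN list grows past a limit, then keeps only the best nodes and discards the rest. The sketch runs on the n × n sliding-tile puzzle (the paper used the 15-puzzle) with the Manhattan-distance heuristic, and a weight above 1 stands in for an inadmissible heuristic. The limits `max_open` and `keep` are hypothetical parameters. Nothing here models the CM-2 parallelism.

```python
import heapq


def manhattan(state, n):
    """Sum of tile distances from their goal cells (goal: 1..n*n-1, blank last)."""
    dist = 0
    for idx, tile in enumerate(state):
        if tile:
            goal = tile - 1
            dist += abs(idx // n - goal // n) + abs(idx % n - goal % n)
    return dist


def neighbors(state, n):
    """Yield the states reachable by sliding one tile into the blank (0)."""
    z = state.index(0)
    r, c = divmod(z, n)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            j = rr * n + cc
            s = list(state)
            s[z], s[j] = s[j], s[z]
            yield tuple(s)


def staged_search(start, n, weight=1.0, max_open=2000, keep=500):
    """Best-first search on f = g + weight * h that prunes OPEN to the `keep` best
    entries whenever it exceeds `max_open`. With pruning or weight > 1 the returned
    path is not guaranteed optimal, and None is returned if the goal is never reached."""
    goal = tuple(range(1, n * n)) + (0,)
    open_list = [(weight * manhattan(start, n), 0, start, [start])]
    best_g = {start: 0}
    while open_list:
        _f, g, state, path = heapq.heappop(open_list)
        if state == goal:
            return path
        if g > best_g.get(state, float("inf")):
            continue  # stale entry superseded by a cheaper path
        for nxt in neighbors(state, n):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                f = ng + weight * manhattan(nxt, n)
                heapq.heappush(open_list, (f, ng, nxt, path + [nxt]))
        if len(open_list) > max_open:
            # Stage boundary: keep only the most promising nodes.
            open_list = heapq.nsmallest(keep, open_list)
            heapq.heapify(open_list)
    return None


print(len(staged_search((1, 2, 3, 4, 5, 6, 0, 7, 8), 3)) - 1)  # prints 2 (moves)
```

Pruning bounds memory at roughly `max_open` entries, which is the trade-off that staged search makes against optimality.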

The Assign Parallel Program Generator

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633117

D. O'Hallaron

ASSIGN is a tool for building large-scale applications, in particular signal processing applications, on distributed-memory multicomputers. The first target machine is iWarp, a multicomputer system developed jointly by Intel Corporation and Carnegie Mellon University. This paper gives a high-level introduction to ASSIGN.

Citations: 43
