Scientific visualization theatre
T. Sterling
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234874
Summary form only given. Discusses the latest results of massively parallel processing (MPP) applications, presented through high-resolution graphics and animation. Three themes are represented, demonstrating the relationship between massively parallel computing and scientific visualization. For many of the cases, results of applications computed on MPPs and visualized on graphics workstations are shown. Examples of result data whose image rendering is performed using parallel algorithms on MPPs are also shown, and some performance measurements are given. Finally, graphical presentations of data representing the behavioral dynamics of MPPs are shown, opening the way for scientific visualization to assist in the optimization of MPP computation.
{"title":"Scientific visualization theatre","authors":"T. Sterling","doi":"10.1109/FMPC.1992.234874","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234874","url":null,"abstract":"Summary form only given. Discusses the latest in massively parallel processing (MPP) applications' results through high-resolution graphics and animation. Three themes are represented, demonstrating the relationship between massively parallel computing and scientific visualization. Results of applications computed on MPPs and visualized on graphics workstations are shown for many of the cases. Examples of result data whose image rendering are performed using parallel algorithms on MPPs are shown, and some performance measurements are given. Finally, graphics presentation of data representing the behavioral dynamics of MPPs are shown, opening the way for scientific visualization to assist in the optimization of MPP computation.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133031441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

The new frontiers: A workshop on future directions in massively parallel processing
I.D. Scherson
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234882
The task of identifying some of the basic research issues facing modern massively parallel processing is addressed. Processing element architecture, interconnection networks, languages and compilers, and software development tools are considered.
{"title":"The new frontiers: A workshop on future directions in massively parallel processing","authors":"I.D. Scherson","doi":"10.1109/FMPC.1992.234882","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234882","url":null,"abstract":"The task of identifying some of the basic research issues facing modern massively parallel processing is addressed. Processing element architecture, interconnection networks, languages and compilers, and software development tools are considered.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131331901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Massively parallel sparse LU factorization
S. Kratzer
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234896
The multifrontal algorithm for sparse LU factorization has been expressed as a data-parallel program suitable for massively parallel computers. A new way of mapping data and computations to processors is used, and good processor utilization is obtained even for unstructured sparse matrices. The sparse problem is decomposed into many smaller, dense subproblems, with low overhead for communication and memory access. Performance results are provided for factorization of regular and irregular finite-element grid matrices on the MasPar MP-1.
{"title":"Massively parallel sparse LU factorization","authors":"S. Kratzer","doi":"10.1109/FMPC.1992.234896","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234896","url":null,"abstract":"The multifrontal algorithm for sparse LU factorization has been expressed as a data parallel program that is suitable for massively parallel computers. A new way of mapping data and computations to processors is used, and good processor utilization is obtained even for unstructured sparse matrices. The sparse problem is decomposed into many smaller, dense subproblems, with low overhead for communications and memory access. Performance results are provided for factorization of regular and irregular finite-element grid matrices on the MasPar MP-1.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131496654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Communication overhead on the CM5: an experimental performance evaluation
R. Ponnusamy, A. Choudhary, G. Fox
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234899
The authors present experimental results for communication overhead on the scalable parallel machine CM-5. The communication latency of the data network is observed to be 88 μs. It is also observed that the communication cost for messages whose size is a multiple of 16 bytes is much smaller than for messages that are not; for better performance, a user should therefore pad messages to a multiple of 16 bytes. The authors also study the communication overhead of three complete-exchange algorithms. For small message sizes, the recursive exchange algorithm performs best, especially on large multiprocessors; for large message sizes, the pairwise exchange algorithm is preferable. Finally, the authors study two algorithms for one-to-all broadcast: linear broadcast and recursive broadcast. Linear broadcast performs poorly, whereas recursive broadcast performs well.
{"title":"Communication overhead on the CM5: an experimental performance evaluation","authors":"R. Ponnusamy, A. Choudhary, G. Fox","doi":"10.1109/FMPC.1992.234899","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234899","url":null,"abstract":"The authors present experimental results for communication overhead on the scalable parallel machine CM-5. It is observed that the communication latency of the data network is 88 mu s. It was also observed that the communication cost for messages that are a multiple of 16 bytes is much smaller than for messages that are not, and therefore, for better performance, a user should pad messages to make them a multiple of 16 bytes. The authors also studied the communication overhead of three complete exchange algorithms. For small message sizes, the recursive exchange algorithm performs the best, especially for large multiprocessors. However, for large message sizes, the pairwise exchange algorithm is preferable. Finally, the authors studied two algorithms for one-to-all broadcast: the linear broadcast algorithm and the recursive broadcast algorithm. Linear broadcast does not perform well; the recursive broadcast algorithm performs well.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132838116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Parallel holographic image calculation and compression
D. M. Newman, D. Goeckel, R. D. Crawford, S. Abraham
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234923
The authors describe the parallel implementation of an algorithm suitable for hologram creation on a 16,384-processor SIMD (single-instruction, multiple-data) MasPar machine. When computing an image of typical complexity, the parallel implementation sacrifices up to 11% efficiency in data compression to gain performance up to 250 times greater than that achieved on a uniprocessor workstation. The MasPar achieves pattern generation more than 750 times faster than fully optimized Sparc C code.
{"title":"Parallel holographic image calculation and compression","authors":"D. M. Newman, D. Goeckel, R. D. Crawford, S. Abraham","doi":"10.1109/FMPC.1992.234923","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234923","url":null,"abstract":"The authors describe the parallel implementation of an algorithm suitable for hologram creation on a 16384 processor SIMD (single-instruction multiple-data) MasPar machine. When computing an image of typical complexity, the parallel implementation sacrifices up to 11% efficiency in data compression to gain a performance up to 250 times greater than that achieved on a uniprocessor workstation. The MasPar can achieve pattern generation more than 750 times faster than the fully optimized Sparc C code.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114330091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Throughput analysis of pipelined multiprocessor modules
S.-Y. Lee
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234926
A feasible form of parallel architecture is one consisting of several pipeline stages, each of which is a multiprocessor module with a large number of processing elements (PEs). In many applications, such as real-time image processing and dynamic control, the optimal computing structure takes this form. In the present study, the performance of a parallel processing model with this organization is analyzed. In particular, the effect of interstage communication on the throughput of the model is investigated, to suggest an efficient way of transferring data between stages. The numerical results obtained in this study can serve as a useful guideline for designing a parallel computer system consisting of pipeline stages, each containing a large number of PEs.
{"title":"Throughput analysis of pipelined multiprocessor modules","authors":"S.-Y. Lee","doi":"10.1109/FMPC.1992.234926","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234926","url":null,"abstract":"A feasible form of parallel architecture would be one which consists of several pipeline stages, each of which is a multiprocessor module of a large number of processing elements (PEs). In many applications, such as real-time image processing and dynamic control, the optimized computing structure would be in this form. In the present study, the performance of a parallel processing model of such an organization has been analyzed. In particular, the effect of interstage communication on throughput of the model has been investigated to suggest an efficient way of transferring data between stages. The numerical results obtained in this study could be a useful guideline for designing a parallel computer system consisting of pipeline stages each of which contains a large number of PEs.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"601 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123192724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Traffic analysis of hypercubes and banyan-hypercubes
A. Bellaachia, A. Youssef
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234950
The routing performance of banyan-hypercubes (BHs) is studied and compared with that of hypercubes. To evaluate the routing capabilities of BHs and hypercubes, a communication model is assumed. Based on this model, the traffic intensity of both networks is computed and the saturation probability of each network is determined. To compute the average time delay, the average queue length, the throughput, and the maximum queue size, extensive simulations were conducted for both networks at different sizes and different packet generation rates. The saturation probability obtained through simulation is very close to that computed theoretically. The simulation results show that all of the aforementioned measures decrease as the network size grows. BHs with more than two levels are shown to congest faster than a hypercube of the same size and to deliver less throughput; however, a two-level BH performs better than a hypercube of the same size. Thus, although the BH has a better diameter and average distance, it does not necessarily have better communication capabilities than the hypercube.
{"title":"Traffic analysis of hypercubes and banyan-hypercubes","authors":"A. Bellaachia, A. Youssef","doi":"10.1109/FMPC.1992.234950","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234950","url":null,"abstract":"The routing performance of banyan-hypercubes (BHs) is studied and compared with that of hypercubes. To evaluate the routing capabilities of BHs and hypercubes, a communication model is assumed. Based on this model, the traffic intensity of both networks is computed and the saturation probability of each network is determined. To compute the average time delay, the average queue length, the throughput, and the maximum queue size, extensive simulations were conducted for both networks for different sizes and different package generation packet rates. The saturation probability obtained through the simulation results is very close to that computed theoretically. The simulation results showed that all of the aforementioned measures are decreased when the network size gets larger. BHs with more than two levels are shown to congest faster than a hypercube of the same size, and deliver less throughput. However, a two-level BH has better performance than a hypercube of the same size. Although the BH has a better diameter and average distance, it does not necessarily have better communication capabilities than hypercubes.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115852729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Program transformation in massively parallel systems
T. Al-Marzooq, F. Bastani
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234873
The authors present two problems in mapping highly maintainable, expressive parallel code that manipulates multidimensional arrays onto massively parallel computers: bottlenecks due to simultaneous accesses in the EREW model, and interprocessor communication. They present a source-code transformation approach that resolves the tension between expressibility and high performance for multidimensional arrays designed with a four-level hierarchy of data types (aggregate, abstract, logical, and physical levels). A systematic method is developed to transform high-level, low-performance parallel code into efficient low-level code. The method is illustrated with matrix multiplication. It is also used to generate high-performance logical-level code for the backpropagation algorithm for neural networks, which makes extensive use of matrices. The transformed code has much higher performance than code with a naive mapping.
{"title":"Program transformation in massively parallel systems","authors":"T. Al-Marzooq, F. Bastani","doi":"10.1109/FMPC.1992.234873","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234873","url":null,"abstract":"The authors present two problems in mapping highly maintainable expressive parallel code manipulating multidimensional arrays in massively parallel computers: bottlenecks due to simultaneous accesses in the EREW model, and interprocessor communication. They present a source code transformation approach to solve the expressibility-high-performance problem for the multidimensional arrays designed with a four-level hierarchical design of the data types (aggregate, abstract, logical, and physical levels). A systematic method is developed to transform parallel high-level low-performance code into parallel low-level efficient ones. The method is illustrated with matrix multiplication. The method is also used to generate high-performance logical-level code for the backpropagation algorithm of neural networks that makes extensive use of matrices. The transformed code has a much higher performance than the code with a naive mapping.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129613076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Quantitative studies of processing element granularity
T. C. Marek, E. Davis
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234925
Quantitative results of experiments on PE (processing element) granularities are presented. An architecture simulation workbench has been developed for experiments on PE granularities of 1, 4, 8, and 16 bits; it also supports analysis of the impact of various I/O (input/output) and communication path widths. Overall performance, communication balance, PE utilization, and operand lengths can be monitored to evaluate the merits of various granularities and feature sets. The workbench has been used to run a set of benchmark algorithms covering a range of computation and communication requirements, data sizes, and problem array sizes. The authors report results for two of the algorithms studied by T.C. Marek (1992): image rotation and image resampling. The results are counterintuitive: they indicate that bit-serial machines have performance advantages due to inherent bit-oriented activity, even when using multiple-bit operands, and due to inter-PE communication when paths are narrower than the processor granularity.
{"title":"Quantitative studies of processing element granularity","authors":"T. C. Marek, E. Davis","doi":"10.1109/FMPC.1992.234925","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234925","url":null,"abstract":"Quantitative results of experiments on PE (processing element) granularities are presented. An architecture simulation workbench has been developed for experiments on PE granularities of 1, 4, 8, and 16-b. An analysis of the impact of various I/O (input/output) and communication path widths is also possible. Overall performance, communication balance, PE utilization, and operand lengths can be monitored to evaluate the merits of various granularities and feature sets. This workbench has been used to run a set of benchmark algorithms that cover a range of computation and communication requirements, a range of data sizes, and a range of problem array sizes. The authors report results for two of the algorithms studied by T.C. Marek (1992): image rotation and image resampling. The results obtained are counterintuitive. They indicate that bit-serial machines have performance advantages due to inherent bit-oriented activity, even when using multiple bit operands, and to inter-PE communication when paths are narrower than the processor granularity.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"78 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129763369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Automatic data distribution for nearest neighbor networks
M. Philippsen
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234890
An algorithm is presented for mapping an arbitrary multidimensional array onto an arbitrarily shaped multidimensional nearest-neighbor network of a distributed-memory machine. The individual dimensions of the array are labeled with high-level usage descriptors that can either be provided by the programmer or be derived by sophisticated static compiler analysis. The algorithm achieves an appropriate exploitation of nearest-neighbor communication and allows for efficient address calculation. The author describes the integration of this technique into an optimizing compiler for Modula-2 and derives extensions that make efficient translation of nested parallelism possible and provide support for thread scheduling.
{"title":"Automatic data distribution for nearest neighbor networks","authors":"M. Philippsen","doi":"10.1109/FMPC.1992.234890","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234890","url":null,"abstract":"An algorithm for mapping an arbitrary, multidimensional array onto an arbitrarily shaped multidimensional nearest-neighbor network of a distributed memory machine is presented. The individual dimensions of the array are labeled with high-level usage descriptors that either can be provided by the programmer or can be derived by sophisticated static compiler analysis. The presented algorithm achieves an appropriate exploitation of nearest-neighbor communication and allows for efficient address calculations. The author describes the integration of this technique into an optimizing compiler for Modula-2 and derives extensions that render efficient translation of nested parallelism possible and that provide support for thread scheduling.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116134364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}