
[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation: Latest Publications

An asymptotically optimal parallel bin-packing algorithm
N. S. Coleman, Pearl Y. Wang
The authors introduce a bin-packing heuristic that is well suited for implementation on massively parallel SIMD (single-instruction multiple-data) or MIMD (multiple-instruction multiple-data) computing systems. The average-case behavior (and the variance) of the packing technique can be predicted when the input data have a symmetric distribution. The method is asymptotically optimal, yields perfect packings, and achieves the best possible average-case behavior with high probability. The analytical result improves upon all online algorithms previously reported in the literature and matches the best results reported so far for offline algorithms.
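The abstract does not spell out the heuristic itself. Purely as a sketch of the kind of sort-and-pair packing that parallelizes naturally and whose average case is easy to analyze when item sizes are symmetric about 1/2 (an illustrative assumption, not necessarily the authors' method):

```python
def pair_pack(sizes):
    """Pack items with sizes in (0, 1] into unit-capacity bins by pairing the
    smallest remaining item with the largest remaining one.

    Illustrative only -- not the paper's heuristic.  The sort-and-pair shape
    maps naturally onto SIMD/MIMD machines (a parallel sort followed by
    independent per-pair work), and for sizes distributed symmetrically
    about 1/2 most pairs sum to roughly 1, giving near-perfect packings.
    """
    s = sorted(sizes)
    bins = []
    lo, hi = 0, len(s) - 1
    while lo < hi:
        if s[lo] + s[hi] <= 1.0:          # smallest + largest fit together
            bins.append([s[lo], s[hi]])
            lo += 1
            hi -= 1
        else:                             # large item cannot be paired
            bins.append([s[hi]])
            hi -= 1
    if lo == hi:                          # odd item left over
        bins.append([s[lo]])
    return bins


if __name__ == "__main__":
    import random
    items = [random.random() for _ in range(10_000)]   # symmetric about 1/2
    packed = pair_pack(items)
    print(len(packed), "bins; lower bound", int(sum(items)) + 1)
```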
{"title":"An asymptotically optimal parallel bin-packing algorithm","authors":"N. S. Coleman, Pearl Y. Wang","doi":"10.1109/FMPC.1992.234866","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234866","url":null,"abstract":"The authors introduce a bin-packing heuristic that is well-suited for implementation on massively parallel SIMD (single-instruction multiple-data) or MIMD (multiple-instruction multiple-data) computing systems. The average-case behavior (and the variance) of the packing technique can be predicted when the input data have a symmetric distribution. The method is asymptotically optimal, yields perfect packings, and achieves the best possible average case behavior with high probability. The analytical result improves upon any online algorithms previously reported in the literature and is identical to the best results reported so far for offline algorithms.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126596587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 3
Benchmarking performance of massively parallel AI architectures
R. Demara, H. Kitano
The authors address the architectural evaluation of massively parallel machines suitable for artificial intelligence (AI). The approach is to identify the impact of specific algorithm features by measuring execution time on a SNAP-1 and a Connection Machine-2 using different knowledge base and machine configurations. Since a wide variety of parallel AI languages and processing architectures are in use, the authors developed a portable benchmark set for Parallel AI Computational Efficiency (PACE). PACE provides a representative set of processing workloads, knowledge base topologies, and performance indices. The authors also analyze speedup and scalability of fundamental AI operations in terms of the massively parallel paradigm.
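PACE itself is described only at the level of workloads, topologies, and indices. A toy harness of the kind one might use to collect the execution-time and speedup figures the authors discuss could look like the following (all names hypothetical, not part of PACE):

```python
import time

def measure(workload, config):
    """Wall-clock time of one workload run under a given machine/KB config."""
    start = time.perf_counter()
    workload(config)
    return time.perf_counter() - start

def speedups(workload, configs, baseline):
    """Speedup of every configuration relative to the named baseline config."""
    times = {name: measure(workload, cfg) for name, cfg in configs.items()}
    return {name: times[baseline] / t for name, t in times.items()}

if __name__ == "__main__":
    # Dummy workload: the "config" here is just a problem size.
    work = lambda n: sum(i * i for i in range(n))
    print(speedups(work, {"small": 10_000, "large": 1_000_000}, "large"))
```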
{"title":"Benchmarking performance of massively parallel AI architectures","authors":"R. Demara, H. Kitano","doi":"10.1109/FMPC.1992.234865","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234865","url":null,"abstract":"The authors address the architectural evaluation of massively parallel machines suitable for artificial intelligence (AI). The approach is to identify the impact of specific algorithm features by measuring execution time on a SNAP-1 and a Connection Machine-2 using different knowledge base and machine configurations. Since a wide variety of parallel AI languages and processing architectures are in use, the authors developed a portable benchmark set for Parallel AI Computational Efficiency (PACE). PACE provides a representative set of processing workloads, knowledge base topologies, and performance indices. The authors also analyze speedup and scalability of fundamental AI operations in terms of the massively parallel paradigm.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 1
Connection Machine model CM-5 system overview
J. Palmer, G. Steele
The Connection Machine model CM-5 provides high performance and ease of use for large data-intensive applications. The CM-5 architecture is designed to scale to teraflops performance on terabyte-sized problems. SPARC-based processing nodes, each with four vector pipes, are connected by two communications networks, the Data Network and the Control Network. The system combines the best features of SIMD (single-instruction multiple-data) and MIMD (multiple-instruction multiple-data) designs, integrating them into a single 'universal' parallel architecture. The processor nodes may be divided into independent computational partitions; each partition may be independently timeshared or devoted to batch processing. Programming languages include Fortran (with Fortran 90 array constructs) and C*, a parallel dialect of C. The PRISM programming environment supports source-level debugging, tracing, and profiling through a graphical interface based on X Windows.
{"title":"Connection Machine model CM-5 system overview","authors":"J. Palmer, G. Steele","doi":"10.1109/FMPC.1992.234877","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234877","url":null,"abstract":"The Connection Machine model CM-5 provides high performance and ease of use for large data-intensive applications. The CM-5 architecture is designed to scale to teraflops performance on terabyte-sized problems. SPARC-based processing nodes, each with four vector pipes, are connected by two communications networks, the Data Network and the Control Network. The system combines the best features of SIMD (single-instruction multiple-data) and MIMD (multiple-instruction multiple-data) designs, integrating them into a single 'universal' parallel architecture. The processor nodes may be divided into independent computational partitions; each partition may be independently timeshared or devoted to batch processing. Programming languages include Fortran (with Fortran 90 array constructs) and C*, a parallel dialect of C. The PRISM programming environment supports source-level debugging, tracing, and profiling through a graphical interface based on X Windows.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114461628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 29
Simulation and performance estimation for the Rewrite Rule Machine
Hitoshi Aida, J. Goguen, Sany M. Leinwand, P. Lincoln, J. Meseguer, B. Taheri, T. Winkler
The authors give an overview of the Rewrite Rule Machine's (RRM's) architecture and discuss performance estimates based on very detailed register-level simulations at the chip level, together with more abstract simulations and modeling for higher levels. For a 10,000-ensemble RRM, the present estimates are as follows. (1) The raw peak performance is 576 trillion operations per second. (2) For general symbolic applications, ensemble Sun-relative speedup is roughly 6.7, and RRM performance with a wormhole network at 88% efficiency gives an idealized Sun-relative speedup of 59,000. (3) For highly regular symbolic applications (the sorting problem is taken as a typical example), ensemble performance is a Sun-relative speedup of 127, and RRM performance is estimated at over 80% efficiency (relative to the cluster performance), yielding a Sun-relative speedup of over 91. (4) For systolic applications (a 2-D fluid flow problem is taken as a typical example), ensemble performance is a Sun-relative speedup of 400-670, and cluster-level performance, which should be attainable in practice, is at 82% efficiency.
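One plausible reading of how the 59,000 figure composes (an assumption; the abstract does not show the arithmetic) is that the per-ensemble Sun-relative speedup scales across all 10,000 ensembles at the stated network efficiency: 6.7 × 10,000 × 0.88 ≈ 59,000.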
{"title":"Simulation and performance estimation for the Rewrite Rule Machine","authors":"Hitoshi Aida, J. Goguen, Sany M. Leinwand, P. Lincoln, J. Meseguer, B. Taheri, T. Winkler","doi":"10.1109/FMPC.1992.234941","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234941","url":null,"abstract":"The authors give an overview of the Rewrite Rule Machine's (RRM's) architecture and discuss performance estimates based on very detailed register-level simulations at the chip level, together with more abstract simulations and modeling for higher levels. For a 10000 ensemble RRM, the present estimates are as follows. (1) The raw peak performance is 576 trillion operations per second. (2) For general symbolic applications, ensemble Sun-relative speedup is roughly 6.7, and RRM performance with a wormhole network at 88% efficiency gives an idealized Sun-relative speedup of 59000. (3) For highly regular symbolic applications (the sorting problem is taken as a typical example), ensemble performance is a Sun-relative speedup of 127, and RRM performance is estimated at over 80% efficiency (relative to the cluster performance), yielding a Sun-relative speedup of over 91. (4) For systolic applications (a 2-D fluid flow problem is taken as a typical example), ensemble performance is a Sun-relative speedup of 400-670, and cluster-level performance, which should be attainable in practice, is at 82% efficiency.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117252667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 11
Data Parallel Fortran
P. Elustondo, L. A. Vazquez, O.J. Nestares, J. S. Avalos, G. A. Alvarez, C.-T. Ho, J. Sanz
The authors present Data Parallel Fortran (DPF), a set of extensions to Fortran aimed at programming scientific applications on a variety of parallel machines. DPF portrays a global name space to programmers and allows programs to be written in a clear, data-parallel style. DPF's model is based on the idea of having a single control thread that spans parallel virtual threads with arbitrary nesting, resuming at their completion into a single global state. It also provides explicit control of which subset of the global name space is strictly accessed by each virtual processor at different points in a program. This powerful mechanism makes it possible to write programs in which communication points are handled explicitly, but without making use of message passing code. Also, DPF offers some primitives that involve communication often encountered in parallel numerical and scientific applications. DPF semantics does not depend on any particular feature of the architecture, thus providing a reasonably high-level programming methodology.
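DPF expresses this with Fortran extensions, which the abstract does not show. Purely to illustrate the control model it describes (a single control thread forking parallel virtual threads and resuming into a single global state when they finish), here is a minimal fork-join sketch in Python:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_region(work, chunks):
    """Fork one virtual thread of `work` per chunk, then join back into the
    single control thread once every chunk has completed.  Illustrative of
    the fork-join control model only; DPF itself is a set of Fortran
    extensions with a global name space, not a Python library."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(work, chunks))    # fork: parallel virtual threads
    return results                                # join: single global state again

if __name__ == "__main__":
    chunks = [list(range(i, i + 4)) for i in range(0, 16, 4)]
    print(parallel_region(sum, chunks))           # [6, 22, 38, 54]
```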
{"title":"Data Parallel Fortran","authors":"P. Elustondo, L. A. Vazquez, O.J. Nestares, J. S. Avalos, G. A. Alvarez, C.-T. Ho, J. Sanz","doi":"10.1109/FMPC.1992.234909","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234909","url":null,"abstract":"The authors present Data Parallel Fortran (DPF), a set of extensions to Fortran aimed at programming scientific applications on a variety of parallel machines. DPF portrays a global name space to programmers and allows programs to be written in a clear, data-parallel style. DPF's model is based on the idea of having a single control thread that spans parallel virtual threads with arbitrary nesting, resuming at their completion into a single global state. It also provides explicit control of which subset of the global name space is strictly accessed by each virtual processor at different points in a program. This powerful mechanism makes it possible to write programs in which communication points are handled explicitly, but without making use of message passing code. Also, DPF offers some primitives that involve communication often encountered in parallel numerical and scientific applications. DPF semantics does not depend on any particular feature of the architecture, thus providing a reasonably high-level programming methodology.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124853974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 6
Distance between images
J. A. Gualtieri, J. Le Moigne, C. V. Packer
The authors compare two methods that compute an approximation to the Hausdorff distance between pairs of binary images. They also implement a parallel version of one of the methods, which can provide a fast image-distance algorithm for calibrating algorithms that perform tasks such as image recognition, image compression, or image browsing. For this purpose, they show a simple application: selecting the best iteration of a region-growing algorithm that yields edge images, by comparing those images to the output of a Canny edge detector.
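The abstract does not identify the two approximation methods being compared. For reference, the exact (symmetric) Hausdorff distance between the foreground point sets of two binary images, computed naively, looks like this:

```python
import numpy as np

def directed_hausdorff(A, B):
    """Max over points a in A of the distance from a to its nearest point in B.
    A and B are (n, 2) / (m, 2) arrays of foreground-pixel coordinates."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)   # (n, m) pairwise distances
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the larger of the two directed distances.
    Naive O(n*m) computation; the paper compares cheaper approximations
    suited to parallel hardware."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

if __name__ == "__main__":
    A = np.argwhere(np.eye(8, dtype=bool)).astype(float)   # diagonal of one binary image
    B = A + np.array([0.0, 2.0])                            # same shape shifted 2 pixels
    print(hausdorff(A, B))                                  # 2.0
```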
{"title":"Distance between images","authors":"J. A. Gualtieri, J. Le Moigne, C. V. Packer","doi":"10.1109/FMPC.1992.234956","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234956","url":null,"abstract":"The authors compare two methods which compute an approximation to the Hausdorff distance between pairs of binary images. They also implement a parallel vision of one of the methods, which can provide a fast image distance algorithm to calibrate algorithms performing such tasks as image recognition, image compression, or image browsing. For this purpose, they have shown a simple application of selecting the best iteration of a region growing algorithm which yields edge images by comparing them to a Canny edge detector.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130026698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 12
Issues on the algorithm-software continuum
L. Jamieson, M. Atallah, J. Cuny, D. Gannon, J. JáJá, V. Lo, R. Miller
To date, the highest performance on parallel systems has required expertise spanning high-level algorithm design through architecture-dependent fine-tuning of the implementation. Application users who are uninformed about architecture details are unable to take advantage of (or compensate for) idiosyncrasies of the target machine; parallel processing experts are often unable to explore radically different ways of solving a physical problem in order to adopt the approach best suited to a particular architecture. Moreover, software tools have not yet succeeded in automating the realization of high-performance parallel applications. The authors therefore address how much of an algorithm designer a user of parallel systems can or should be expected to be, and how much software support it is realistic to expect.
{"title":"Issues on the algorithm-software continuum","authors":"L. Jamieson, M. Atallah, J. Cuny, D. Gannon, J. JáJá, V. Lo, R. Miller","doi":"10.1109/FMPC.1992.234957","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234957","url":null,"abstract":"To date, highest performance on parallel systems has required expertise spanning high-level algorithm design through architecture-dependent fine tuning of the implementation. Application users who are uninformed about architecture details are not able to take advantage of (or compensate for) idiosyncrasies of the target machine; parallel processing experts are often not able to explore radically different ways of solving a physical problem in order to adopt the approach best suited to a particular architecture. Moreover, software tools have not yet succeeded in automating the realization of high-performance parallel applications. The authors therefore deal with questions about how much of an algorithm designer a user of parallel systems can/should be expected to be, and how much software support is realistic to expect.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126209307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Input/output for fine grain multiprocessor systems
S.-Y. Lee
While how to utilize the multiple processing elements (PEs) of a parallel system efficiently has been investigated extensively, the I/O (input/output) into and out of the system has been ignored in most cases. However, the time for downloading input data or uploading results is not negligible, especially when a large number of PEs (as in a massively parallel system) and/or a large volume of data are involved. Results are reported from a preliminary study of how I/O can be realized efficiently in a fine-grain multiprocessor system without any hardware changes.
{"title":"Input/output for fine grain multiprocessor systems","authors":"S.-Y. Lee","doi":"10.1109/FMPC.1992.234927","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234927","url":null,"abstract":"While extensive investigations on how multiple processing elements (PEs) in a parallel system can be utilized efficiently have been carried out, the I/O (input/output) into and from the system has been ignored in most cases. However, the time for downloading input data or uploading results would not be negligible, especially when a large number of PEs such as those in a massively parallel system and/or a large volume of data are involved. Results from a preliminary study on how I/O can be efficiently realized in a fine-grain multiprocessor system without any hardware change are reported.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126462814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Pi: a parallel architecture interface
D. Wills, W. Dally
The authors define Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model builder and the machine builder. Pi addresses a set of common parallel-model requirements, including low-latency communication, fast task switching, low-cost synchronization, efficient storage management, the ability to exploit locality, and efficient support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models, including hybrids of existing models. Pi also forms a basis of comparison for architectural components. The authors present an overview of Pi and a description of several example models that have been constructed and evaluated on the interface.
{"title":"Pi: a parallel architecture interface","authors":"D. Wills, W. Dally","doi":"10.1109/FMPC.1992.234940","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234940","url":null,"abstract":"The authors define Pi, a parallel architecture interface that separates model and machine issues, allowing them to be addressed independently. This provides greater flexibility for both the model and machine builder. Pi addresses a set of common parallel model requirements, including low-latency communication, fast task switching, low-cost synchronization, efficient storage management, the ability to exploit locality, and efficiency support for sequential code. Since Pi provides generic parallel operations, it can efficiently support many parallel programming models, including hybrids of existing models. Pi also forms a basis of comparison for architectural components. The authors present an overview of Pi, and a description of several model examples which have been constructed and evaluated on the interface.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125328665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 8
On the parallel processing capabilities of LCA networks
I.D. Scherson, P.Y. Wang
Lowest Common Ancestor networks (LCANs) are hierarchical interconnection networks for communication in SIMD and MIMD machines. The connectivity and permutational properties of specific families of LCANs have been previously studied. LCANs are built with switches in a tree-like manner. A level in the hierarchy is akin to a stage in a multistage interconnect, and their topology is similar to that of hypertrees and fat trees. Their hierarchical structure lends itself to implementation in the fabrication hierarchy, namely chips, boards, and backplanes. In this paper, a preliminary investigation of the algorithmic capabilities of LCANs (in terms of their parameters) is reported.
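The abstract does not describe routing, but in tree-structured hierarchies of this kind a message between two leaves typically climbs to the switch that is their lowest common ancestor and then descends. A small, purely illustrative sketch of computing that LCA level for the leaves of a regular k-ary switch tree (real LCAN families are parameterized more generally):

```python
def lca_level(src, dst, k):
    """Levels a message must climb in a regular k-ary switch tree before
    leaves `src` and `dst` sit under a common switch (their lowest common
    ancestor).  Illustrative only; not the paper's LCAN model."""
    level = 0
    while src != dst:      # climb one switch level at a time
        src //= k
        dst //= k
        level += 1
    return level

if __name__ == "__main__":
    print(lca_level(5, 6, 2))   # leaves 5 and 6 of a binary switch tree meet 2 levels up
```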
{"title":"On the parallel processing capabilities of LCA networks","authors":"I.D. Scherson, P.Y. Wang","doi":"10.1109/FMPC.1992.234918","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234918","url":null,"abstract":"Lowest Common Ancestor networks (LCANs) are hierarchical interconnection networks for communication in SIMD and MIMD machines. The connectivity and permutational properties of specific families of LCANs have been previously studied. LCANs are built with switches in a tree-like manner. A level in the hierarchy is akin to a stage in a multistage interconnect and their topology is similar to that of hypertrees and fat trees. Their hierarchical structure lends itself to implementation in the fabrication hierarchy, namely chips, boards and backplanes. In this paper, a preliminary investigation of the algorithmic capabilities of LCANs (in terms of their parameters) is reported.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131448322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 1