Low crosstalk address encodings for optical message switching systems
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262784
Y. Ben-Asher, A. Schuster
An optical message switching system delivers messages from N sources to N destinations using beams of light. The redirection of the beams involves a vector-matrix multiplication and a threshold operation. The authors consider the design of addresses which are both short (so that the number of threshold devices is reduced) and have low crosstalk (so that the sensitivity gap may grow). They show that addresses of O(log N) bits exist for which the crosstalk is a constant fraction of the number of set bits in each address, hence allowing for a Θ(log N)-sized sensitivity gap. More generally, they give the precise coefficient, which depends on the desired gap. It is established that when using O(log N)-bit addresses, the crosstalk cannot be further reduced. An exact construction of O(log² N)-bit addresses is given, where the involved constant depends on the desired crosstalk. Finally, they briefly describe the basic optical elements that can be used to construct a message switching system which uses these address schemes.
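As a rough illustration of the mechanism described above, the sketch below models destination selection as a binary vector-matrix multiplication followed by thresholding. The random codebook, toy sizes, and the `route` helper are invented for the example; the paper's point is precisely that better-than-random address designs exist.

```python
import numpy as np

# Each destination d stores a binary address row A[d]; a message carries the
# address of its target. The optical system in effect computes the overlap of
# the carried address with every stored address in one vector-matrix multiply;
# a threshold device fires where the overlap reaches the number of set bits,
# ideally only at the intended destination. Crosstalk is the largest overlap
# with a wrong address; the sensitivity gap is the margin between the two.

rng = np.random.default_rng(0)
N, bits = 8, 16                                   # toy sizes, not from the paper
A = (rng.random((N, bits)) < 0.5).astype(int)     # random codebook, illustrative

def route(target):
    overlaps = A @ A[target]                      # overlap with every address
    w = A[target].sum()                           # set bits in the target address
    return (overlaps >= w).astype(int)            # threshold operation

for t in range(N):
    fired = route(t)
    crosstalk = max(A[d] @ A[t] for d in range(N) if d != t)
    print(f"target {t}: fires at {np.flatnonzero(fired)}, worst crosstalk {crosstalk}")
```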
Explicit parallel structuring for rule-based programming
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262829
Shiow-yang Wu, J. Browne
This paper presents semantically based explicit parallel structuring for rule-based programming systems. Explicit parallel structuring appears to be necessary, since compile-time dependency analysis of sequential programs has not yielded large-scale parallelism, and run-time analysis for parallelism is restricted by the execution cost of the analysis. Simple language extensions specifying the semantics of rules are used to define parallel execution behavior at the rule level. Type definitions for working-memory elements are extended to include relationships within and among objects, which define the parallelism allowed on instances of object types. The first result presented is that the algorithms implemented by commonly used benchmark rule-based programs contain scalable parallelism. The second result is that much of that parallelism can be captured by simple and modest extensions of rule-based languages, analogous to the models and constructs used to specify parallel structure in imperative programming languages. A sketch is given of a comprehensive language system which exploits the specification of parallelism-defining semantics in both the object-definition and executable segments of rule-based programs.
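A toy sketch of the rule-level parallelism being targeted; the rule, its condition/action split, and the thread pool are illustrative assumptions, since the paper's actual mechanism is language extensions rather than a Python runtime.

```python
from concurrent.futures import ThreadPoolExecutor

# A rule is a (condition, action) pair over working-memory elements. Instantiations
# declared independent by the rule's semantics can fire concurrently, instead of
# passing one at a time through the usual sequential match-select-act cycle.

working_memory = [{"type": "sensor", "id": i, "value": v}
                  for i, v in enumerate([3, 9, 12, 7])]

def high_reading(wme):            # condition part of the (hypothetical) rule
    return wme["value"] > 8

def flag(wme):                    # action part: annotate the matched element
    wme["flagged"] = True
    return wme["id"]

matches = [w for w in working_memory if high_reading(w)]
with ThreadPoolExecutor() as pool:      # fire independent instantiations in parallel
    fired = list(pool.map(flag, matches))
print("rule fired on elements:", fired)   # -> [1, 2]
```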
A portable parallel algorithm for VLSI circuit extraction
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262922
B. Ramkumar, P. Banerjee
The authors describe a new portable algorithm for parallel circuit extraction. The algorithm is built as part of the ongoing ProperCAD project: a portable object-oriented parallel environment for CAD applications built on top of the CHARM system. Unlike prior approaches such as PACE, the algorithm is asynchronous and based on a coarse-grained dataflow execution model. Performance of circuit extraction is presented on four parallel machines: an Encore Multimax, a Sequent Symmetry, an nCUBE 2 hypercube, and a network of Sun Sparc workstations. The extractor runs unchanged on all of these machines.
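A minimal sketch of the coarse-grained dataflow style referred to above, assuming a hypothetical region-per-task decomposition with a boundary merge step; none of the names below come from ProperCAD or CHARM.

```python
from concurrent.futures import ThreadPoolExecutor

# Dataflow shape: the layout is split into regions, each extracted independently
# as soon as its data is available; a merge task runs once both neighbouring
# extractions complete, unifying nets that cross the region boundary.

regions = {"R0": ["netA", "netB"], "R1": ["netB", "netC"]}   # hypothetical slices

def extract(name):
    # stand-in for local circuit extraction within one region
    return {net: f"{name}.{net}" for net in regions[name]}

with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(extract, name) for name in regions}
    local = {name: f.result() for name, f in futures.items()}

# merge task: fires only after both region extractions have produced results
shared = set(local["R0"]) & set(local["R1"])
print("boundary nets merged:", shared)                        # -> {'netB'}
```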
Why BSP computers? (bulk-synchronous parallel computers)
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262847
L. Valiant
The author gives a summary of some of the arguments favoring the adoption of the bulk-synchronous parallel (BSP) model as a standard for parallel computing. First, he argues that for parallel computing to become a major industry, agreement has to be reached on a standard model at a level intermediate between the language and architecture levels. He goes on to list the factors that make the BSP model attractive as a standard at this intermediate, or bridging, level. Finally, he provides some reasons for favoring it over the shared-memory (PRAM) model, which is an alternative candidate for this role.
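For context, the bridging argument rests on BSP's simple cost model: a superstep costs its maximum local work w, plus g times its maximum communication volume h, plus the barrier cost l. The helper and numbers below are illustrative only.

```python
def bsp_cost(supersteps, g, l):
    """Total BSP cost: per superstep, max local work w plus g*h for
    communication plus the barrier latency l. supersteps is a list of
    (w, h) pairs; g and l are the machine's two parameters, so cost can
    be predicted independently of network topology."""
    return sum(w + g * h + l for w, h in supersteps)

# Hypothetical machine parameters and program profile, purely illustrative:
print(bsp_cost([(1000, 50), (800, 20)], g=4.0, l=100.0))   # -> 2280.0
```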
Mapping onto three classes of parallel machines: a case study using the cyclic reduction algorithm
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262888
G. Saghi, H. Siegel, J. L. Gray
Mapping cyclic reduction, a well-known approach for the parallel solution of tridiagonal systems of equations, onto the MasPar MP-1, nCUBE 2, and PASM parallel machines is discussed. Each of these machines represents a different mode of parallelism. Issues addressed include SIMD/MIMD trade-offs, the effect on execution time of increasing the number of processors used, the impact of the inter-processor communication network on performance, the importance of predicting algorithm performance as a function of the mapping used, and the advantages of a partitionable system. Analytical results are validated by experimentation on all three machines.
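Since the case study centers on cyclic reduction, here is a compact sequential sketch of the algorithm for n = 2^k - 1 unknowns (a standard formulation, assumed rather than taken from the paper). The property that matters for parallel mapping is that every elimination within a level is independent.

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system: sub-diagonal a, diagonal b, super-diagonal c,
    right-hand side d, for n = 2**k - 1 unknowns. a[0] and c[-1] are boundary
    entries and are forced to zero. All updates within one level are mutually
    independent, which is what maps onto SIMD or MIMD processors."""
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = len(b)
    a[0] = c[-1] = 0.0
    s = 1
    while s < n:                                  # forward elimination levels
        for i in range(2 * s - 1, n, 2 * s):      # independent across i
            al = a[i] / b[i - s]
            a[i] = -al * a[i - s]
            b[i] -= al * c[i - s]
            d[i] -= al * d[i - s]
            if i + s < n:
                be = c[i] / b[i + s]
                c[i] = -be * c[i + s]
                b[i] -= be * a[i + s]
                d[i] -= be * d[i + s]
        s *= 2
    x = np.zeros(n)
    s = (n + 1) // 2
    while s >= 1:                                 # back-substitution levels
        for i in range(s - 1, n, 2 * s):          # independent across i
            lo = a[i] * x[i - s] if i - s >= 0 else 0.0
            hi = c[i] * x[i + s] if i + s < n else 0.0
            x[i] = (d[i] - lo - hi) / b[i]
        s //= 2
    return x

# Check against a dense solve on a random diagonally dominant system:
n = 15                                            # 2**4 - 1
rng = np.random.default_rng(1)
b = 4.0 + rng.random(n); a = rng.random(n); c = rng.random(n); d = rng.random(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(cyclic_reduction(a, b, c, d), np.linalg.solve(T, d))
```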
Load balancing of DOALL loops in the Perfect Club
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262868
G. Elsesser, Viet N. Ngo, S. Bhattacharya, W. Tsai
The speedup achieved by concurrent execution of loop iterations is determined by load balance and several other factors, so no single strategy provides maximum speedup for all classes of programs and all target architectures. Hence, the selection of a load-balancing strategy must be guided by the characteristics of both the application domain and the target machine architecture. The authors study loop load balance in the context of the well-known Perfect Club benchmarks. Several static and dynamic characteristics of DOALL loops are observed and interpreted. Late arrival of processors is identified as a significant source of load imbalance. A scheme for processor preallocation is proposed, and its advantages and applicability are demonstrated by analytical estimates as well as experimental evaluation on a Cray YMP-8.
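To make the load-balance issue concrete, the simulation below contrasts static blocking with dynamic self-scheduling on a skewed iteration-cost profile. The data and the greedy model are invented for illustration and are not the paper's preallocation scheme.

```python
import heapq

# Hypothetical per-iteration costs: a DOALL loop whose last quarter is expensive.
work = [1] * 48 + [10] * 16

def static_blocks(p):
    """Fixed contiguous blocks: loop time is set by the unluckiest processor."""
    size = len(work) // p
    return max(sum(work[r * size:(r + 1) * size]) for r in range(p))

def self_schedule(p):
    """Greedy dynamic scheduling: the least-loaded processor takes each next
    iteration (an idealized model of grabbing work from a shared counter)."""
    heap = [(0, r) for r in range(p)]
    for cost in work:
        t, r = heapq.heappop(heap)
        heapq.heappush(heap, (t + cost, r))
    return max(t for t, _ in heap)

print("static :", static_blocks(4))   # -> 160: one block holds all heavy iterations
print("dynamic:", self_schedule(4))   # -> 52: near the ideal 208 / 4
```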
A multi-level hierarchical cache coherence protocol for multiprocessors
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262871
Craig Anderson, J. Baer
In order to meet the computational needs of the next decade, shared-memory multiprocessors must be scalable. Though single shared-bus architectures have been successful in the past, lack of bus bandwidth restricts the number of processors that can be effectively put on a single-bus machine. One architecture that has been proposed to solve the limited-bandwidth problem consists of processors connected via a tree hierarchy of buses. The authors present a tool for studying a hierarchical bus-based shared-memory system. They highlight the main features of a hierarchical cache coherence protocol and give some preliminary performance results obtained via an instruction-level simulator.
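The toy model below shows the general shape of coherence on a tree of buses: sharers tracked per bus, with invalidations filtered down only into subtrees that hold copies. It is an assumed simplification for a single memory block, not the authors' protocol.

```python
# Each bus records which of its children cache the (single) toy block. A write
# disturbs only subtrees holding copies, keeping traffic off unrelated buses.

class Bus:
    def __init__(self, children):
        self.children = children          # sub-buses or processor leaf names
        self.sharers = set()              # children currently caching the block

    def read(self, path):                 # path: child at this level, then below
        self.sharers.add(path[0])
        if isinstance(path[0], Bus):
            path[0].read(path[1:])

    def write(self, path):
        for child in self.sharers - {path[0]}:
            invalidate(child)             # only subtrees with copies are visited
        self.sharers = {path[0]}
        if isinstance(path[0], Bus):
            path[0].write(path[1:])

def invalidate(node):
    if isinstance(node, Bus):
        for child in node.sharers:
            invalidate(child)
        node.sharers.clear()
    else:
        print(f"invalidate copy at {node}")

left, right = Bus(["P0", "P1"]), Bus(["P2", "P3"])
root = Bus([left, right])
root.read([left, "P0"]); root.read([right, "P3"])
root.write([left, "P0"])                  # -> invalidate copy at P3
```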
The data-parallel Ada run-time system, simulation and empirical results
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262808
H. G. Mayer, Stefan Jähnichen
The Parallel Ada Run-Time System (PARTS), developed at TUB, is the target of an experimental translator that maps sequential Ada onto a shared-memory multiprocessor. Other modules of the parallel compiler are not explained here. The paper summarizes the multiprocessor run-time system; it explains the instructions that activate multiple processors, leading to SPMD execution, and discusses the scheduling policy. Default architectural attributes of PARTS can be custom-tailored for each run without recompilation. The experiments exposed different machine personalities by measuring execution-time profiles of the vector product run on different architectures. The goal is to find experimentally how well a shared-memory architecture scales with increasing problem size, and how well the problem size scales for a fixed multiprocessor configuration. The measurements expose the ability of shared-memory multiprocessor architectures to exploit one dimension of parallelism. However, scalability is limited by the number of memory ports. Therefore another architectural dimension of parallelism, distributed memory, must be combined with shared memory to achieve teraflop performance.
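A sketch of the measured kernel, the vector product, in SPMD style, assuming the usual block decomposition; this Python stand-in illustrates only the execution shape, not PARTS itself.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def spmd_dot(x, y, p):
    """Each of p 'processors' reduces its own block of the vector product;
    the partial sums are then combined, which is where contention for
    shared-memory ports limits scalability."""
    blocks = np.array_split(np.arange(len(x)), p)
    def local(rank):
        idx = blocks[rank]
        return float(x[idx] @ y[idx])     # local work on one block
    with ThreadPoolExecutor(p) as pool:
        return sum(pool.map(local, range(p)))   # combining step

x = np.arange(1024.0); y = np.ones(1024)
assert spmd_dot(x, y, 8) == float(x @ y)
print(spmd_dot(x, y, 8))                  # -> 523776.0
```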
Dynamic embeddings of trees and quasi-grids into hyper-de Bruijn networks
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262823
Sabine R. Öhring, Sajal K. Das
This paper deals with optimal embeddings of various topologies into the hyper-de Bruijn network, which is a combination of the well-known hypercube and the de Bruijn graph. In particular, the authors develop modular embeddings of complete binary trees and other tree-related graphs, and dynamic task-allocation embeddings of dynamically evolving arbitrary binary trees. Additionally, an optimal embedding of butterflies and a subgraph embedding of cube-connected cycles are presented. They also consider how to dynamically embed evolving grid structures (so-called quasi-grids) into hyper-de Bruijn networks. The results are important for mapping data and algorithm structures onto multiprocessor networks.
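For context, the hyper-de Bruijn network HD(m, n) is commonly defined in the related literature as the product of an m-dimensional hypercube and a binary de Bruijn graph of order n; that definition is assumed here, since the abstract does not spell it out. A node is then a pair (x, y) with neighbors computed as follows.

```python
def neighbors(x, y, m, n):
    """Neighbors of node (x, y) in HD(m, n): x is an m-bit hypercube label,
    y an n-bit de Bruijn label. Edges either flip one bit of x (hypercube
    dimension) or shift y left or right by one bit (de Bruijn dimension)."""
    hyper = [(x ^ (1 << i), y) for i in range(m)]                # cube edges
    mask = (1 << n) - 1
    succ = [(x, ((y << 1) & mask) | b) for b in (0, 1)]          # left shifts
    pred = [(x, (y >> 1) | (b << (n - 1))) for b in (0, 1)]      # right shifts
    return hyper + succ + pred

# Node (x=10, y=011) in HD(2, 3): two cube neighbors, four shift neighbors.
print(neighbors(0b10, 0b011, m=2, n=3))
```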
OCCAM prototyping of massively parallel applications from colored Petri-nets
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262772
F. Breant, Jean-François Peyre
The authors present a technique to build a massively parallel application from a formal description. They use the colored Petri-net formalism to model applications, which allows them to describe parallel applications concisely. Theoretical results on this formalism help to prove the correctness of the description before implementation. Furthermore, they use linear invariants to decompose the model into interacting state machines which are easy to implement. An important feature introduced is the use of color to map the state machines, and to distribute data and communication, onto a formal architecture description.
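A minimal sketch of the underlying formalism: the generic place/transition firing rule plus the role of a linear invariant, stated generically and not taken from the authors' OCCAM generator. A set of places whose total token count is invariant and equal to one behaves as a single state machine, which is what makes the decomposition implementable as one process per invariant.

```python
# A transition is enabled when every input place holds enough tokens; firing
# consumes and produces tokens. The three places below carry exactly one token
# between them at all times (a place invariant), so they form one state machine.

marking = {"idle": 1, "busy": 0, "done": 0}
transitions = {
    "start":  ({"idle": 1}, {"busy": 1}),     # (consumed, produced)
    "finish": ({"busy": 1}, {"done": 1}),
}

def fire(name):
    consumed, produced = transitions[name]
    if all(marking[p] >= k for p, k in consumed.items()):
        for p, k in consumed.items():
            marking[p] -= k
        for p, k in produced.items():
            marking[p] += k
        return True
    return False

fire("start"); fire("finish")
assert sum(marking.values()) == 1             # the invariant holds throughout
print(marking)                                # -> {'idle': 0, 'busy': 0, 'done': 1}
```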