Platform-independent runtime optimizations using OpenThreads
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580941
M. Haines, K. Langendoen
Although platform-independent runtime systems for parallel programming languages are desirable, the need for low-level optimizations usually precludes their existence. This is because most optimizations involve some combination of low-level communication and low-level threading, the product of which is almost always platform-dependent. We propose a solution to the threading half of this dilemma by using a thread package that allows fine-grain control over the behaviour of the threads while still providing performance comparable to hand-tuned, machine-dependent thread packages. This makes it possible to construct platform-independent thread modules for parallel runtime systems and, more importantly, to optimize them.
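A minimal sketch, in Python for brevity, of what a platform-independent thread module with this shape might look like. The abstract does not give the OpenThreads API, so every name below is hypothetical; the point is only the interface a runtime system could target while backends map it onto machine-tuned packages.

```python
# Hypothetical portable thread-module interface (not the paper's actual
# OpenThreads API): a runtime system programs against spawn()/barrier(),
# while a backend is free to map them onto a hand-tuned native package.
import threading
import queue

class ThreadModule:
    def __init__(self, num_workers=4):
        self.tasks = queue.Queue()
        self.workers = [threading.Thread(target=self._run, daemon=True)
                        for _ in range(num_workers)]
        for w in self.workers:
            w.start()

    def _run(self):
        # worker loop: pull queued tasks and execute them
        while True:
            fn, args = self.tasks.get()
            fn(*args)
            self.tasks.task_done()

    def spawn(self, fn, *args):
        # fine-grain control point: a real implementation could expose
        # priorities, stack sizes, or scheduling policy here
        self.tasks.put((fn, args))

    def barrier(self):
        # wait until all spawned tasks have completed
        self.tasks.join()

if __name__ == "__main__":
    tm = ThreadModule()
    for i in range(8):
        tm.spawn(print, "task", i)
    tm.barrier()
```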
{"title":"Platform-independent runtime optimizations using OpenThreads","authors":"M. Haines, K. Langendoen","doi":"10.1109/IPPS.1997.580941","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580941","url":null,"abstract":"Although platform-independent runtime systems for parallel programming languages are desirable, the need for low-level optimizations usually precludes their existence. This is because most optimizations involve some combination of low-level communication and low-level threading the product of which is almost always platform-dependent. We propose a solution to the threading half of this dilemma by using a thread package, that allows fine-grain control over the behaviour of the threads while still providing performance comparable to hand-tuned, machine-dependent thread packages. This makes it possible to construct platform-independent thread modules for parallel runtime systems and, more importantly, to optimize them.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132601294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S-Check: a tool for tuning parallel programs
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580861
R. Snelick
We present a novel tool, called S-Check, for identifying performance bottlenecks in parallel and networked programs. S-Check is a highly automated sensitivity-analysis tool that extends benchmarking and conventional profiling. It predicts how refinements in parts of a program will affect performance by making focal changes in code efficiencies and correlating these against overall program performance. This analysis is a sophisticated comparison that catches interactions arising from shared resources or communication links. S-Check's performance assessment ranks code segments as "bottlenecks" according to their sensitivity to these code-efficiency changes. The rank-ordered list serves as a guide for tuning applications. In practice, S-Check analysis yields faster parallel programs. A case study compares and contrasts sensitivity analyses of the same program on different architectures and offers solutions for performance improvement. An initial implementation of S-Check runs on Silicon Graphics multiprocessors and IBM SP machines. Particulars of the underlying methodology are only sketched; the main emphasis is on the details of the tool and its use.
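The sensitivity-analysis idea is easy to illustrate. A toy sketch (our construction, not S-Check's interface or methodology): slow one code segment at a time by a focal factor and rank segments by the effect on total runtime.

```python
# Illustration of sensitivity analysis via focal efficiency changes:
# perturb each segment individually and compare against the baseline.
import time

def program(slowdown):
    # two stand-in code segments; slowdown maps a segment name to a
    # multiplicative "efficiency change" applied to that segment only
    start = time.perf_counter()
    for _ in range(10):
        time.sleep(0.05 * slowdown.get("a", 1.0))   # segment a
        time.sleep(0.01 * slowdown.get("b", 1.0))   # segment b
    return time.perf_counter() - start

base = program({})
for seg in ("a", "b"):
    t = program({seg: 1.5})        # 50% focal slowdown of one segment
    # larger relative impact on total runtime = higher bottleneck rank
    print(f"segment {seg}: sensitivity {(t - base) / base:+.2f}")
```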
{"title":"S-Check: a tool for tuning parallel programs","authors":"R. Snelick","doi":"10.1109/IPPS.1997.580861","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580861","url":null,"abstract":"We present a novel tool, called S-Check, for identifying performance bottlenecks in parallel and networked programs. S-Check is a highly-automated sensitivity analysis tool for programs that extends benchmarking and conventional profiling. It predicts how refinements in parts of a program are going to affect performance by making focal changes in code efficiencies and correlating these against overall program performance. This analysis is a sophisticated comparison that catches interactions arising from shared resources or communication links. S-Check's performance assessment ranks code segments \"bottleneck\" according to their sensitivity to the code efficiency changes. This rank-ordered list serves as a guide for tuning applications. In practice, S-Check code analysis yields faster parallel programs. A case study compares and contrasts sensitivity analyses of the same program on different architectures and offers solutions for performance improvement. An initial implementation of S-Check runs on Silicon Graphics multiprocessors and IBM SP machines. Particulars of the underlying methodology are only sketched with main emphasis given to details of the tool S-Check and its use.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132157255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comparison of parallel approaches for algebraic factorization in logic synthesis
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580973
Subhasish Subhasish, P. Banerjee
Algebraic factorization is an extremely important part of any logic synthesis system, but it is computationally expensive. Hence, it is important to look to parallel processing to speed up the procedure. This paper presents three different parallel algorithms for algebraic factorization. The first algorithm replicates the circuit and uses a divide-and-conquer strategy. The second performs totally independent factorization on different circuit partitions, with no interaction among the partitions. The third represents a compromise between the two: it uses a novel L-shaped partitioning strategy that provides some interaction among the rectangles obtained in the various partitions. For a large circuit like ex1010, the last algorithm runs 11.5 times faster than the sequential kernel-extraction algorithms of the SIS sequential circuit synthesis system on six processors, with less than 0.2% degradation in the quality of the results.
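The second algorithm's pattern, factoring partitions with no interaction, is straightforward to sketch. The factorization body below is a stub (real kernel extraction as in SIS is far more involved), and the partition contents are invented for illustration.

```python
# Independent-partitions pattern: each circuit partition is factored in
# parallel with no communication between workers.
from multiprocessing import Pool

def factor_partition(cubes):
    # stub for algebraic factorization of one partition: we merely find
    # the literals shared by all cubes, standing in for kernel extraction
    shared = set.intersection(*(set(c) for c in cubes))
    return sorted(shared)

if __name__ == "__main__":
    partitions = [["ab", "ac"], ["bd", "cd"], ["ab", "bd"]]
    with Pool(processes=3) as pool:
        factors = pool.map(factor_partition, partitions)  # no interaction
    print(factors)  # common literals per partition: ['a'], ['d'], ['b']
```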
{"title":"A comparison of parallel approaches for algebraic factorization in logic synthesis","authors":"Subhasish Subhasish, P. Banerjee","doi":"10.1109/IPPS.1997.580973","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580973","url":null,"abstract":"Algebraic factorization is an extremely important part of any logic synthesis system, but it is computationally expensive. Hence, it is important to look at parallel processing to speed up the procedure. This paper presents three different parallel algorithms for algebraic factorization. The first algorithm uses circuit replication and uses a divide-and-conquer strategy. A second algorithm uses totally independent factorization on different circuit partitions with no interactions among the partitions. A third algorithm represents a compromise between the two approaches. It uses a novel L-shaped partitioning strategy which provides some interaction among the rectangles obtained in various partitions. For a large circuit like ex1010, the last algorithm runs 11.5 times faster over the sequential kernel extraction algorithms of the SIS sequential circuit synthesis system on six processors with less than 0.2% degradation in quality of the results.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132184696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of several scheduling algorithms under the nano-threads programming model
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580909
X. Martorell, Jesús Labarta, N. Navarro, E. Ayguadé
The authors present an analysis, in a dynamic processor-allocation environment, of four scheduling algorithms running on top of the nano-threads programming model. Three of them are well known: uniform-sized chunking, guided self-scheduling and trapezoid self-scheduling. The fourth is their own proposal: adaptable-size chunking. In this environment, applications are automatically decomposed into tasks by a parallelizing compiler, which uses the hierarchical task graph to represent the source application. The parallel code is an executable representation of this graph, supported by a user-level library (the nano-threads library). The execution environment includes a user-level process (the CPU manager) which controls the allocation of processors to applications. The analysis of the scheduling algorithms shows that it is possible to provide enough information to the library to allow fast adaptation to dynamic changes in the processors allocated to an application.
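The three well-known rules can be stated precisely as chunk-size sequences. A sketch using the standard published formulas for these schemes (the paper's own adaptable-size chunking is not specified in the abstract and is omitted here):

```python
# Chunk-size sequences for the three classical self-scheduling rules,
# for n loop iterations on p processors.
import math

def fixed_chunks(n, chunk):
    # uniform-sized chunking: constant chunk size
    while n > 0:
        c = min(chunk, n); n -= c; yield c

def guided(n, p):
    # guided self-scheduling: each grab takes ceil(remaining / p)
    while n > 0:
        c = max(1, math.ceil(n / p)); n -= c; yield c

def trapezoid(n, p):
    # trapezoid self-scheduling: chunks shrink linearly from f = n/(2p)
    # down to l = 1
    f, l = max(1, n // (2 * p)), 1
    steps = max(1, math.ceil(2 * n / (f + l)))   # number of chunks
    d = (f - l) / max(1, steps - 1)              # linear decrement
    c = f
    while n > 0:
        take = min(n, max(1, round(c))); n -= take; yield take
        c -= d

for name, gen in [("fixed", fixed_chunks(100, 10)),
                  ("guided", guided(100, 4)),
                  ("trapezoid", trapezoid(100, 4))]:
    print(name, list(gen))
```

Smaller final chunks are what let these schemes balance load when the processors available to the application change at run time.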
{"title":"Analysis of several scheduling algorithms under the nano-threads programming model","authors":"X. Martorell, Jesús Labarta, N. Navarro, E. Ayguadé","doi":"10.1109/IPPS.1997.580909","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580909","url":null,"abstract":"The authors present the analysis, in a dynamic processor allocation environment, of four scheduling algorithms running on top of the nano-threads programming model. Three of them are well-known: uniform-sized chunking, guided self-scheduling and trapezoid self-scheduling. The fourth is their proposal: adaptable size chunking. In that environment, applications are automatically decomposed into tasks by a parallelizing compiler which uses the hierarchical task graph to represent the source application. The parallel code is an executable representation of this graph with the support of a user-level library (the nano-threads library). The execution environment includes a user-level process (CPU manager) which controls the allocation of processors to applications. The analysis of the scheduling algorithms shows it is possible to provide enough information to the library to allow a fast adaptation to dynamic changes in the processors allocated to the application.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132194915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal wormhole routing in the (n,d)-torus
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580921
Stefan Bock, F. Heide, C. Scheideler
The authors consider wormhole routing in a d-dimensional torus of side length n. In particular, they present an optimal randomized algorithm for routing worms of length up to O(n/(d log n)^2), one per node, to random destinations. Previous algorithms only work optimally for two dimensions, or are a factor of log n away from the optimal running time. As a by-product, they develop an algorithm for the 2-dimensional torus that guarantees an optimal runtime for worms of length up to O(n/(log n)^2) with much higher probability than all previous algorithms.
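For readability, the two length bounds set out in LaTeX, under our reading of the extraction residue "/sup 2/" as a square:

```latex
% Maximum worm length (one worm per node, random destinations) for which
% the randomized algorithms achieve optimal routing time:
\[
  L_{d\text{-dim}} = O\!\left(\frac{n}{(d \log n)^{2}}\right),
  \qquad
  L_{2\text{-dim}} = O\!\left(\frac{n}{(\log n)^{2}}\right).
\]
```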
{"title":"Optimal wormhole routing in the (n,d)-torus","authors":"Stefan Bock, F. Heide, C. Scheideler","doi":"10.1109/IPPS.1997.580921","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580921","url":null,"abstract":"The authors consider wormhole routing in a d-dimensional torus of side length n. In particular they present an optimal randomized algorithm for routing worms of length up to O(n/(d log n)/sup 2/), one per node, to random destinations. Previous algorithms only work optimally for two dimensions, or are a factor of log n away from the optimal running time. As a by-product they develop an algorithm for the 2-dimensional torus that guarantees an optimal runtime for worms of length up to O(n/(log n)/sup 2/) with much higher probability than all previous algorithms.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127052040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extensible message passing application development and debugging with Python
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580971
D. Beazley, P. Lomdahl
The authors describe how they have parallelized Python, an interpreted object-oriented scripting language, and used it to build an extensible message-passing molecular dynamics application for the CM-5, the Cray T3D, and Sun multiprocessor servers running MPI. This allows one to interact with large-scale message-passing applications, rapidly prototype new features, and perform application-specific debugging. It is even possible to write message-passing programs in Python itself. They describe some of the tools they have developed to extend Python, and the results of this approach.
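The authors' 1997 Python/MPI extension itself is not shown in the abstract. As a modern analogue of "message-passing programs in Python itself", here is the same idea with today's mpi4py package (run with `mpiexec -n 2 python script.py`):

```python
# Message passing written directly in Python: rank 0 steers, rank 1
# computes, in the interactive style the paper describes.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # interactively steering a simulation could start from a loop like this
    comm.send({"cmd": "step", "dt": 0.5}, dest=1, tag=0)
    result = comm.recv(source=1, tag=1)
    print("worker replied:", result)
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    comm.send({"status": "ok", "echo": msg["cmd"]}, dest=0, tag=1)
```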
{"title":"Extensible message passing application development and debugging with Python","authors":"D. Beazley, P. Lomdahl","doi":"10.1109/IPPS.1997.580971","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580971","url":null,"abstract":"The authors describe how they have parallelized Python, an interpreted object oriented scripting language, and used it to build an extensible message-passing molecular dynamics application for the CM-5, Cray T3D, and Sun multiprocessor servers running MPI. This allows one to interact with large-scale message-passing applications, rapidly prototype new features, and perform application specific debugging. It is even possible to write message passing programs in Python itself. They describe some of the tools they have developed to extend Python and results of this approach.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115649930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling compiled communication costs in multiplexed optical networks
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580850
C. Salisbury, R. Melhem
Improvements in optical technology will enable the construction of high-bandwidth, low-latency switching networks. These networks have many applications in massively parallel processing. However, current circuit-switching and packet-switching techniques are not well suited to controlling such networks. Time-division multiplexing (TDM) schemes can improve the performance of circuit-switched optical interconnection networks by taking advantage of the locality of reference present in the communication patterns. In this paper we construct a model for the cost of compiled communications in circuit-switched networks. We show how the cost is affected by the characteristics of the network and by the application's communication locality, and how a compiler can use this information to choose the most appropriate multiplexing degree.
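The trade-off the compiler faces can be made concrete with a toy cost function. All terms and constants below are illustrative, not the paper's model: a higher multiplexing degree k amortizes circuit establishment over more connection patterns, but each time-multiplexed circuit then sees only 1/k of the link bandwidth.

```python
# Toy compiled-communication cost model: pick the multiplexing degree k
# that balances circuit-setup cost against reduced per-circuit bandwidth.
import math

def phase_cost(patterns, k, setup=100.0, msg_bytes=16.0, bw=1.0):
    reconfigs = math.ceil(patterns / k)          # circuit establishments
    transfer = patterns * msg_bytes / (bw / k)   # each pattern sees bw/k
    return reconfigs * setup + transfer

patterns = 8  # distinct connection patterns in one communication phase
best = min(range(1, patterns + 1), key=lambda k: phase_cost(patterns, k))
print("best multiplexing degree:", best)
```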
{"title":"Modeling compiled communication costs in multiplexed optical networks","authors":"C. Salisbury, R. Melhem","doi":"10.1109/IPPS.1997.580850","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580850","url":null,"abstract":"Improvements in optical technology will enable the construction of high bandwidth, low latency switching networks. These networks have many applications in massively parallel processing. However current circuit switching and packet switching techniques are not quite suitable for controlling such networks. Time division multiplexing (TDM) schemes can improve the performance of circuit switched optical interconnection networks by taking advantage of the locality of references present in the communication patterns. In this paper we construct a model for the cost of compiled communications in circuit switched networks. We show how the cost is affected by the characteristics of the network and by the application's communication locality of references. We show how a compiler can use this information to choose the most appropriate multiplexing degree.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114425522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient sorting and routing on reconfigurable meshes using restricted bus length
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580985
M. Kunde, Kay Guertzig
Sorting and balanced routing problems for synchronous mesh-like processor networks with reconfigurable buses are considered. Motivated by the argument that broadcasting along buses of arbitrary length within unit time is rather unrealistic, we consider basic problems on reconfigurable meshes that can be solved efficiently even with restricted bus length. It is shown that on r-dimensional reconfigurable meshes of side length n with bus length bounded by a constant l, the h-h sorting and routing problem can be solved within hn+o(hrn) steps in any case, and in hn/2+o(hrn) steps with high probability, provided that hl ≥ 4r. This result rests on a data-concentration method explained in the paper, and it holds even for certain very light loadings, i.e. with significantly fewer than one element per processor on average. Extensions to two-dimensional reconfigurable meshes with diagonal links are considered.
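The bounds restated cleanly in LaTeX, reading the extraction residue "/spl ges/" as the relation ≥:

```latex
% Step counts for h-h sorting/routing on an r-dimensional reconfigurable
% mesh of side length n with bus length bounded by the constant l:
\[
  T_{\text{worst}} = hn + o(hrn), \qquad
  T_{\text{w.h.p.}} = \tfrac{1}{2}hn + o(hrn),
  \qquad \text{provided } hl \ge 4r .
\]
```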
{"title":"Efficient sorting and routing on reconfigurable meshes using restricted bus length","authors":"M. Kunde, Kay Guertzig","doi":"10.1109/IPPS.1997.580985","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580985","url":null,"abstract":"Sorting and balanced routing problems for synchronous mesh-like processor networks with reconfigurable buses are considered. Induced by the argument that broadcasting along buses of arbitrary length within unit time seems rather non-realistic, we consider basic problems on reconfigurable meshes that can be solved efficiently even with restricted bus length. It is shown that on r-dimensional reconfigurable meshes of side length n with bus length bounded to a constant l the h-h sorting and routing problem can be solved within hn+o(hrn) steps in any case and in hn/2+o(hrn) steps with high probability, provided that hl/spl ges/4r. This result is due to a data concentration method that is explained in the paper and it will hold even for certain very light loadings, i.e. with significantly less than one elements per processor on average. Extensions to two-dimensional reconfigurable meshes with diagonal links are considered.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"12368 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123547457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SuperWeb: towards a global Web-based parallel computing infrastructure
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580858
K. Schauser, C. Scheiman, G. Park, B. Shirazi, J. Marquis
The Internet, best known to most users as the World Wide Web, continues to expand at an amazing pace. We propose a new infrastructure to harness its combined resources, such as CPU cycles or disk storage, and make them available to everyone interested. This infrastructure has the potential to support parallel supercomputing applications involving thousands of cooperating components. Our approach is based on recent advances in Internet connectivity and on the implementation of safe distributed computing embodied in languages such as Java. We have developed a prototype of a global computing infrastructure, called SuperWeb, that consists of hosts, brokers and clients. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers. Client computations are then mapped by the broker onto the registered resources. We examine an economic model for trading computing resources, and discuss several technical challenges associated with such a global computing environment.
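A hypothetical sketch of the host/broker/client division of labor described above. SuperWeb's real interfaces are not given in the abstract, so all names and the first-fit policy below are ours.

```python
# Broker that accepts resource offers from hosts and maps client
# computations onto them (first-fit, for illustration only).
class Broker:
    def __init__(self):
        self.hosts = []  # registered resource offers

    def register(self, host_id, cpu_ms, mem_mb):
        # a host donates a fraction of its resources
        self.hosts.append({"id": host_id, "cpu_ms": cpu_ms, "mem_mb": mem_mb})

    def map_job(self, need_cpu_ms, need_mem_mb):
        # map a client computation onto the first host that can hold it
        for h in self.hosts:
            if h["cpu_ms"] >= need_cpu_ms and h["mem_mb"] >= need_mem_mb:
                h["cpu_ms"] -= need_cpu_ms
                return h["id"]
        return None  # no capacity registered

broker = Broker()
broker.register("host-a", cpu_ms=5000, mem_mb=64)
broker.register("host-b", cpu_ms=2000, mem_mb=32)
print(broker.map_job(need_cpu_ms=1500, need_mem_mb=16))  # -> host-a
```

An economic model like the one the paper examines would replace the first-fit rule with pricing of the offered resources.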
{"title":"SuperWeb: towards a global Web-based parallel computing infrastructure","authors":"K. Schauser, C. Scheiman, G. Park, B. Shirazi, J. Marquis","doi":"10.1109/IPPS.1997.580858","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580858","url":null,"abstract":"The Internet, best known by most users as the World-Wide-Web, continues to expand at an amazing pace. We propose a new infrastructure to harness the combined resources, such as CPU cycles or disk storage, and make them available to everyone interested. This infrastructure has the potential for solving parallel supercomputing applications involving thousands of cooperating components. Our approach is based on recent advances in Internet connectivity and the implementation of safe distributed computing embodied in languages such as Java. We developed a prototype of a global computing infrastructure, called SuperWeb, that consists of hosts, brokers and clients. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers. Client computations are then mapped by the broker onto the registered resources. We examine an economic model for trading computing resources, and discuss several technical challenges associated with such a global computing environment.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128479331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal scheduling for UET-UCT generalized n-dimensional grid task graphs
Pub Date: 1997-04-01 · DOI: 10.1109/IPPS.1997.580872
T. Andronikos, N. Koziris, G. Papakonstantinou, P. Tsanakas
The n-dimensional grid is one of the most representative patterns of data flow in parallel computation. The most frequently used scheduling model for grids is unit execution time-unit communication time (UET-UCT). We enhance the n-dimensional grid model by adding extra diagonal edges. First, we calculate the optimal makespan for the generalized UET-UCT grid topology, and then we establish the minimum number of processors required to achieve that optimal makespan. Furthermore, we solve the scheduling problem for generalized n-dimensional grids by proposing an optimal time-and-space scheduling strategy. We thus prove that UET-UCT scheduling of generalized n-dimensional grids is tractable with low complexity.
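The UET-UCT timing rule itself is compact: every task runs for one unit, a result crossing processors costs one extra unit, and co-locating a task with its latest-finishing parent avoids that parent's communication delay. A sketch of this rule on a small diagonal-enhanced 2D grid (a standard UET-UCT device, not the paper's optimal strategy, which is not reproduced in the abstract):

```python
# Earliest-finish times under UET-UCT on a 2D grid with diagonal edges:
# task (i,j) depends on (i-1,j), (i,j-1) and the diagonal (i-1,j-1).
from itertools import product

def grid_dag(rows, cols):
    nodes = list(product(range(rows), range(cols)))
    preds = {n: [p for p in ((n[0]-1, n[1]), (n[0], n[1]-1),
                             (n[0]-1, n[1]-1))
                 if p[0] >= 0 and p[1] >= 0] for n in nodes}
    return nodes, preds

def uet_uct_makespan(nodes, preds):
    finish = {}
    for n in nodes:  # lexicographic order is topological for the grid
        ps = preds[n]
        if not ps:
            finish[n] = 1  # unit execution time, no inputs
        else:
            latest = max(ps, key=lambda p: finish[p])
            # co-locate with the latest parent (no delay on that edge);
            # all other incoming results pay one unit of communication
            finish[n] = 1 + max(finish[p] + (0 if p == latest else 1)
                                for p in ps)
    return max(finish.values())

nodes, preds = grid_dag(4, 4)
print("makespan estimate:", uet_uct_makespan(nodes, preds))
```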
{"title":"Optimal scheduling for UET-UCT generalized n-dimensional grid task graphs","authors":"T. Andronikos, N. Koziris, G. Papakonstantinou, P. Tsanakas","doi":"10.1109/IPPS.1997.580872","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580872","url":null,"abstract":"The n-dimensional grid is one of the most representative patterns of data flow in parallel computation. The most frequently used scheduling models for grids is the unit execution-unit communication time (UET-UCT). We enhance the model of n-dimensional grid by adding extra diagonal edges. First, we calculate the optimal makespan for the generalized UET-UCT grid topology and then we establish the minimum number of processors required, to achieve the optimal makespan. Furthermore, we solve the scheduling problem for generalized n-dimensional grids by proposing an optimal time and space scheduling strategy. We thus prove that UET-UCT scheduling of generalized n-dimensional grids is low complexity tractable.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128755370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}