Proceedings 11th International Parallel Processing Symposium: Latest Articles

A BSP approach to the scheduling of tightly-nested loops

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580954

R. Calinescu

This paper addresses the scheduling of uniform-dependence loop nests within the framework of the bulk-synchronous parallel (BSP) model. Two broad classes of tightly-nested loops are identified in the paper and scheduled according to the BSP discipline, and the resulting schedules are analysed in terms of the BSP cost model.
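As a rough illustration of what analysing a schedule "in terms of the BSP cost model" can mean, the sketch below uses the standard BSP accounting, in which a superstep costs w + h·g + l. The class and function names, and the sample numbers, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Superstep:
    work: int  # w: largest local computation on any processor
    h: int     # h: largest number of words sent or received by any processor


def bsp_cost(supersteps, g, l):
    """Sum the standard BSP cost w + h*g + l over all supersteps.

    g is the per-word communication cost and l the barrier
    synchronisation cost, both in units of local operations.
    """
    return sum(s.work + s.h * g + l for s in supersteps)


# Hypothetical two-superstep schedule, for illustration only.
steps = [Superstep(work=100, h=10), Superstep(work=50, h=0)]
print(bsp_cost(steps, g=4, l=20))
```

A schedule with fewer supersteps pays the synchronisation term l less often, which is why the choice of tile or block size matters when loops are scheduled under this model.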

Citations: 8

The sparse cyclic distribution against its dense counterparts

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580969

G. Bandera, M. Ujaldón, M. A. Trenas, E. Zapata

Several methods have been proposed in the literature for the distribution of data on distributed memory machines, oriented either to dense or to sparse structures. Many real applications, however, deal with both kinds of data jointly. The paper presents techniques for integrating dense and sparse array accesses in a way that optimizes locality and further allows an efficient loop partitioning within a data-parallel compiler. The approach is evaluated through an experimental survey with several compilers and parallel platforms. The results prove the benefits of the BRS sparse distribution when combined with CYCLIC in mixed algorithms and the poor efficiency achieved by well-known distribution schemes when sparse elements arise in the source code.

Citations: 6

A randomized sorting algorithm on the BSP model

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580912

A. Gerbessiotis, Constantinos J. Siniolakis

The authors present a new randomized sorting algorithm on the bulk-synchronous parallel (BSP) model. The algorithm improves upon the parallel slack of previous algorithms to achieve optimality. Tighter probabilistic bounds are also established. It uses sample sorting and utilizes recently introduced search algorithms for a class of data structures on the BSP model. Moreover, the methods are within a 1+o(1) multiplicative factor of the respective sequential methods in terms of speedup for a wide range of the BSP parameters.
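The general idea of sample sorting mentioned in the abstract can be sketched sequentially as follows. This is only a single-process simulation of generic sample sort, not the authors' BSP algorithm; the function name, the `oversample` parameter and the splitter selection rule are assumptions made for illustration.

```python
import random
from bisect import bisect_right


def sample_sort(keys, p, oversample=8, seed=0):
    """Sequentially simulate sample sort with p buckets ("processors")."""
    rng = random.Random(seed)
    # Fall back to a plain sort when there is too little data to sample.
    if p <= 1 or len(keys) <= p * oversample:
        return sorted(keys)
    # 1. Draw a random sample and sort it.
    sample = sorted(rng.sample(keys, p * oversample))
    # 2. Pick p - 1 evenly spaced splitters from the sorted sample.
    splitters = [sample[i * oversample] for i in range(1, p)]
    # 3. Route every key to the bucket whose key range contains it.
    buckets = [[] for _ in range(p)]
    for k in keys:
        buckets[bisect_right(splitters, k)].append(k)
    # 4. Sort each bucket locally and concatenate in bucket order.
    out = []
    for bucket in buckets:
        out.extend(sorted(bucket))
    return out


data = [random.Random(1).randrange(1000) for _ in range(500)]
assert sample_sort(data, p=4) == sorted(data)
```

In a real BSP setting, steps 1 to 3 correspond to communication supersteps and step 4 to local computation; oversampling is what keeps the bucket sizes balanced with high probability.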

Citations: 17

A parallel priority data structure with applications

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580979

G. Brodal, J. Träff, C. Zaroliagis

We present a parallel priority data structure that improves the running time of certain algorithms for problems that lack a fast and work-efficient parallel solution. As a main application, we give a parallel implementation of Dijkstra's (1959) algorithm which runs in O(n) time while performing O(m log n) work on a CREW PRAM. This is a logarithmic factor improvement for the running time compared with previous approaches. The main feature of our data structure is that the operations needed in each iteration of Dijkstra's algorithm can be supported in O(1) time.
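For reference, the sequential baseline that the abstract builds on can be sketched as below. This is ordinary heap-based Dijkstra, not the parallel CREW PRAM data structure of the paper; the adjacency-list format and the sample graph are assumptions for illustration. The comments mark the per-iteration priority operations that the paper's structure is said to support in O(1) parallel time.

```python
import heapq


def dijkstra(adj, source):
    """Sequential Dijkstra on a graph with non-negative edge weights.

    adj maps each vertex to a list of (neighbour, weight) pairs.
    """
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        # Per-iteration operation 1: extract the closest unsettled vertex.
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry, skipped (lazy deletion)
        done.add(u)
        # Per-iteration operation 2: relax the edges leaving u.
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": []}
print(dijkstra(graph, "a"))
```

The loop settles one vertex per iteration, so there are n iterations in total; making each iteration constant-time in parallel is what yields the O(n) running time quoted above.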

Citations: 25

O(log log n) time algorithms for Hamiltonian-suffix and min-max-pair heap operations on hypercube multicomputers

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580947

Sajal K. Das, M. C. Pinotti

We present an efficient mapping of a min-max-pair heap of size N on a hypercube multicomputer of p processors in such a way that the load on each processor's local memory is balanced and no additional communication overhead is incurred for implementation of the single insertion, deletemin and deletemax operations. Our novel approach is based on an optimal mapping of the paths of a binary heap into a hypercube such that in O(log N/p+log p) time we can compute the Hamiltonian-suffix, which is defined as a pipelined suffix-minima computation on an O(log N)-length heap path embedded into the Hamiltonian path of the hypercube according to the binary reflected Gray codes. However, the binary tree underlying the heap data structure is not altered by the mapping process.
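Two ingredients named in the abstract, the Hamiltonian path given by binary reflected Gray codes and a suffix-minima computation, can be illustrated with a small sequential sketch. It does not reproduce the paper's pipelined hypercube algorithm or its heap-path mapping; the function names are assumptions for this example.

```python
def gray(i):
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def hamiltonian_path(d):
    """Visit all 2**d hypercube nodes so that consecutive nodes are neighbours."""
    return [gray(i) for i in range(1 << d)]


def suffix_minima(values):
    """out[i] is the minimum of values[i:], computed right to left."""
    out = list(values)
    for i in range(len(out) - 2, -1, -1):
        out[i] = min(out[i], out[i + 1])
    return out


path = hamiltonian_path(3)
# Consecutive labels differ in exactly one bit, i.e. they share a hypercube edge.
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(path, path[1:]))
print(path)
print(suffix_minima([5, 3, 8, 1, 9]))
```

Placing the elements of a heap path on consecutive nodes of such a Gray-code path means that each step of a suffix computation only needs communication between directly connected processors.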

Citations: 0

Empirical evaluation of distributed mutual exclusion algorithms

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580904

S. Fu, N. Tzeng, Zhiyuan Li

We evaluate various distributed mutual exclusion algorithms on the IBM SP2 machine and the Intel iPSC/860 system. The empirical results are compared in terms of such criteria as the number of message exchanges and the response time. Our results indicate that the Star algorithm (M.L. Neilsen and M. Mizuno, 1991) achieves the shortest response time in most cases among all the algorithms on a small to medium sized system, when processors request the critical section many times before involving any barrier synchronization. On the other hand, if every processor enters the critical section only once before encountering a barrier, the improved Ring algorithm (S.S. Fu and N.-F. Tzeng, 1995) is found to outperform the others under a heavy load, but the Star algorithm and the CSL algorithm (Y.I. Chang et al., 1990) prevail when the request rate becomes light. The best solution to mutual exclusion in distributed memory systems is determined by how participating sites generate their mutual exclusion requests.

Citations: 16

Evaluating the performance of software distributed shared memory as a target for parallelizing compilers

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580943

A. Cox, S. Dwarkadas, Honghui Lu, W. Zwaenepoel

In this paper we evaluate the use of software distributed shared memory (DSM) on a message passing machine as the target for a parallelizing compiler. We compare this approach to compiler-generated message passing, hand-coded software DSM and hand-coded message passing. For this comparison, we use six applications: four that are regular and two that are irregular. Our results are gathered on an 8-node IBM SP/2 using the TreadMarks software DSM system. We use the APR shared-memory (SPF) compiler to generate the shared-memory programs and the APR XHPF compiler to generate message passing programs. The hand-coded message passing programs run with the IBM PVMe optimized message passing library. On the regular programs, both the compiler-generated and the hand-coded message passing outperform the SPF/TreadMarks combination: the compiler-generated message passing by 5.5% to 40%, and the hand-coded message passing by 7.5% to 49%. On the irregular programs, the SPF/TreadMarks combination outperforms the compiler-generated message passing by 38% and 89%, and only slightly underperforms the hand-coded message passing, differing by 4.4% and 16%. We also identify the factors that account for the performance differences, estimate their relative importance, and describe methods to improve the performance.

Citations: 44

Geometric data structures on a reconfigurable mesh, with applications

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580983

A. Datta

We present several geometric data structures and algorithms for problems for a planar set of rectangles and bipartitioning problems for a point set in two dimensions on a reconfigurable mesh of size $n \times n$. The problems for rectangles include computing the measure, contour perimeter and maximum clique for the union of a set of rectangles. The bipartitioning problems for a two-dimensional point set are solved in the $L_\infty$ and $L_1$ metrics. We solve all these problems in O(log n) time.

Citations: 2

Adaptive fault-tolerant wormhole routing algorithms for hypercube and mesh interconnection networks

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580923

Jau-Der Shih

The author presents adaptive fault-tolerant deadlock-free routing algorithms for hypercubes and meshes by using only 3 virtual channels and 2 virtual channels respectively. Based on the concept of unsafe nodes, the author designs a routing algorithm for hypercubes that can tolerate at least n-1 node faults and can route a message via a path of length no more than the Hamming distance between the source and destination plus four. The author also develops a routing algorithm for meshes that can tolerate any block faults, as long as the distance between any two nodes in different faulty blocks is at least 2 in each dimension.
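The path-length guarantee quoted for hypercubes is stated relative to the Hamming distance between source and destination. A minimal sketch of that quantity follows; it does not implement the routing algorithm itself, and the function names and node labels are assumptions for illustration.

```python
def hamming_distance(src, dst):
    """Number of bit positions in which two hypercube node labels differ."""
    return bin(src ^ dst).count("1")


def max_path_length(src, dst):
    """Upper bound on route length quoted in the abstract: Hamming distance plus four."""
    return hamming_distance(src, dst) + 4


# Hypothetical node labels in a 4-dimensional hypercube.
src, dst = 0b0101, 0b1100
print(hamming_distance(src, dst), max_path_length(src, dst))
```

In a fault-free hypercube the Hamming distance is the length of a shortest path, so the bound says that routing around faults costs at most four extra hops.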

Citations: 16

Wide-sense nonblocking Clos networks under packing strategy

Proceedings 11th International Parallel Processing Symposium

Pub Date : 1997-04-01 DOI: 10.1109/IPPS.1997.580844

Yuanyuan Yang, Jianchao Wang

In this paper, we study wide-sense nonblocking conditions under packing strategy for the three-stage Clos network, or v(m, n, r) network. Wide-sense nonblocking networks are generally believed to have lower network cost than strictly nonblocking networks. However, the analysis for the wide-sense nonblocking conditions is usually more difficult. Moore proved that a v(m, n, 2) network is nonblocking under packing strategy if the number of middle stage switches $m \geq [\frac{3}{2}n]$. This result has been widely cited in the literature, and is even considered as the wide-sense nonblocking condition under packing strategy for the general v(m, n, r) networks in some papers. In fact, it is still not known whether the condition $m \geq [\frac{3}{2}n]$ holds for v(m, n, r) networks when $r \geq 3$. In this paper, we introduce a systematic approach to the analysis of wide-sense nonblocking conditions under packing strategy for general v(m, n, r) networks with any r values. We first translate the problem of finding the necessary and sufficient nonblocking conditions for v(m, n, r) networks to a set of linear programming problems. We then solve this special type of linear programming problems and obtain an elegant closed form optimum solution. We prove that the necessary and sufficient condition for a v(m, n, r) network to be nonblocking under packing strategy is $m \geq [(2 - 1/F_{2r-1})n]$, where $F_{2r-1}$ is the Fibonacci number. We believe that the systematic approach developed in this paper can be used for analyzing other wide-sense nonblocking control strategies as well.
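The quantity (2 − 1/F_{2r−1})·n in the stated condition can be evaluated exactly for small cases. The sketch below assumes the Fibonacci indexing F_1 = F_2 = 1, which is not spelled out in the abstract but makes the r = 2 case reduce to the 3n/2 of the result attributed to Moore. The square bracket in the abstract is left uninterpreted, so no rounding is applied; the function names are illustrative.

```python
from fractions import Fraction


def fib(k):
    """k-th Fibonacci number with F_1 = F_2 = 1 (assumed indexing)."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a


def middle_switch_bound(n, r):
    """Exact value of (2 - 1/F_{2r-1}) * n, before any rounding."""
    return (2 - Fraction(1, fib(2 * r - 1))) * n


for r in (2, 3, 4):
    print(r, fib(2 * r - 1), middle_switch_bound(n=10, r=r))
```

Because F_{2r−1} grows with r, the coefficient 2 − 1/F_{2r−1} increases toward 2 as r grows.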

Citations: 45
