
[1993] Proceedings Seventh International Parallel Processing Symposium: latest papers

Writing correct parallel programs
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262807
K. Chandy
This paper explores the questions: Is writing correct parallel programs harder than writing correct sequential programs? If so, why? What can be done to help in developing reliable parallel programs?
Citations: 1
Maintaining bipartite matchings in the presence of failures
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262856
E. Sha, K. Steiglitz
The authors present an on-line distributed reconfiguration algorithm for finding a new maximum matching incrementally after some nodes have failed. Their algorithm is deadlock free, and with k failures maintains at least M-k matching pairs during the reconfiguration process, where M is the size of the original maximum matching. The algorithm tolerates failures that occur during reconfiguration. The worst-case reconfiguration time is O(k min(|A|, |B|)) after k failures, where A and B are the node sets, but simulations show that the average-case reconfiguration time is much better. The algorithm is also simple enough to be implemented in hardware.
Citations: 2
On the hierarchical hypercube interconnection network
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262822
Q. Malluhi, M. Bayoumi, T. Rao
The paper explores the hierarchical hypercube (HHC) interconnection network, suitable for building massively parallel systems with thousands of processors. HHC is self-embedded, that is, an HHC can embed HHCs of lower dimensions. In addition, HHC is a communication-efficient architecture. Two algorithms for data communication in the HHC are presented. The first algorithm is for one-to-one transfer and the second is for one-to-all broadcasting. Both algorithms take O(log k) time, where k is the total number of processors in the system. Moreover, the paper shows that the HHC VLSI layout has a relatively small area, which is O((log log k) k^2 / log k).
Citations: 11
Fast algorithms for image labeling on a reconfigurable network of processors
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262816
H. Alnuweiri
This paper presents constant-time algorithms for labeling the connected components of images on a network of processors with a wide reconfigurable bus. The algorithms are based on a processor indexing scheme which employs constant-weight codes. The use of such codes enables identifying a single representative processor for each component in a constant number of steps. The proposed algorithms can label an N×N image or an N-vertex graph in O(1) time using Θ(N^2) processors, which is optimal. Furthermore, the proposed techniques lead to O(log N/log log N)-time labeling algorithms on a network of N^2 processors with a reconfigurable bus of width O(log N) bits.
Citations: 8
Strategies for mapping Lee's maze routing algorithm onto parallel architectures
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262800
I. Yen, R. Dubash, F. Bastani
Lee's (1961) maze-routing algorithm has been a popular method for routing wires in VLSI circuits. It can also be applied to a variety of other problems, such as robot path planning. Although the algorithm is simple and easy to implement, its computation time can be quite high. Therefore, it is a very attractive candidate for implementation on parallel systems. The major issue in parallelizing this algorithm is mapping the grid space of the problem to the processor space. The communication cost and processor utilization can be greatly affected by the mapping strategy used. Won and Sahni (1987) have studied a class of mapping strategies for Lee's algorithm and analyzed their performance. The authors propose two new mapping strategies. First, they modify Won and Sahni's mapping algorithm by using the concept of mirror images to allow higher processor utilization while reducing the number of boundary cells. The new algorithm is shown to be better than the original one in an obstacle-free grid space. Then, they propose a dynamic mapping algorithm. This new mapping algorithm is shown to give an optimal mapping in an obstacle-free grid space. They also performed simulations to study the relative performance of these mapping algorithms for grid spaces with obstacles. The results show that the new algorithms are substantially faster than the earlier ones.
Citations: 14
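For readers unfamiliar with Lee's maze-routing algorithm named in the entry above, the following is a minimal sequential sketch: a breadth-first wave expansion from the source followed by a backtrace from the target. It is not the parallel mapping studied in the paper. The function name `lee_route` and the grid encoding (0 for a free cell, 1 for an obstacle) are assumptions made for illustration.

```python
from collections import deque


def lee_route(grid, source, target):
    """Return a shortest 4-connected path from source to target, or None.

    grid is a list of lists; 0 marks a free cell and 1 an obstacle
    (a hypothetical encoding chosen for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    frontier = deque([source])
    # Wave expansion: label each reachable free cell with its distance
    # from the source, one wavefront at a time.
    while frontier:
        r, c = frontier.popleft()
        if (r, c) == target:
            break
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    if target not in dist:
        return None
    # Backtrace: walk from the target back to the source, always stepping
    # to a neighbour whose label is exactly one smaller.
    path = [target]
    while path[-1] != source:
        r, c = path[-1]
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get(nb) == dist[(r, c)] - 1:
                path.append(nb)
                break
    return path[::-1]
```

The wave expansion visits every cell of the grid in the worst case, which is the cost that motivates distributing grid cells across processors.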
VLSI architectures for depth estimation using intensity gradient analysis
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262795
R. Sastry, N. Ranganathan, R. Jain
Depth recovery from grey-scale images is an important topic in the field of computer and robot vision. Intensity gradient analysis (IGA) is a robust technique for inferring depth information from a sequence of images acquired by a sensor undergoing translational motion. IGA obviates the need for explicitly solving the correspondence problem and hence is an efficient technique for range estimation. The design of special purpose hardware could significantly speed up the computations in IGA, which is a computationally intensive task. The authors propose two VLSI architectures for high-speed range estimation based on IGA. The architectures fully utilize the principles of pipelining and parallelism in order to obtain high speed and throughput. The designs are conceptually simple and suitable for implementation in VLSI.
Citations: 1
Mapping a class of run-time dependencies onto regular arrays
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262863
G. Megson
The production of regular computations using algorithmic engineering techniques is beginning to play an important role in the synthesis of massively parallel and VLSI processor arrays. The author widens the class of algorithms that can be formally synthesized by introducing a mapping theorem for a class of algorithms with run-time dependencies. The technique is illustrated by deriving uniform recurrences for the so-called knapsack problem; the resulting systolic array is known to be optimal.
Citations: 16
Approximate parallel prefix computation and its applications
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262899
M. Goodrich, Yossi Matias, U. Vishkin
The authors address two fundamental problems in parallel algorithm design, parallel prefix sums and integer sorting, and show that both of them can be approximately solved very quickly on a randomized CRCW PRAM. In the case of prefix sums the approximation is in terms of the accuracy of the sums, and in the case of integer sorting it is in terms of allowing some gaps between consecutive elements in the ordered list. By introducing approximation in these ways the authors are able to solve these problems in o(lg lg n) time, and thus avoid the near-logarithmic lower bounds by Beame and Hastad that hold for the exact versions of these problems. Nevertheless, they demonstrate that these approximations are strong enough to be used as subroutines in fast randomized algorithms for some well-known problems in parallel computational geometry. Perhaps the most succinct way to describe the power of the new tools which are presented is by observing that prior to this work it was known how to solve the interval allocation problem fast. The authors show how to solve the ordered version of the problem.
Citations: 8
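As background for the prefix-sums problem named in the entry above, here is a small sketch of the exact problem, computed with the classic doubling scheme and simulated sequentially. It is not the authors' approximate randomized CRCW PRAM algorithm; it only shows what is being computed and why the exact version takes a logarithmic number of synchronous rounds under this scheme. The function name `prefix_sums_by_doubling` is an assumption made for illustration.

```python
def prefix_sums_by_doubling(values):
    """Return (inclusive prefix sums, number of doubling rounds)."""
    sums = list(values)
    n = len(sums)
    rounds = 0
    step = 1
    while step < n:
        # On a PRAM every index would update simultaneously; building a
        # fresh list from the old one simulates that synchronous step.
        sums = [
            sums[i] + sums[i - step] if i >= step else sums[i]
            for i in range(n)
        ]
        step *= 2
        rounds += 1
    return sums, rounds
```

For an input of length n the loop runs ceil(log2 n) times, which is the round count that the approximate versions described in the abstract are designed to beat.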
Efficient parallel mappings of a dynamic programming algorithm: a summary of results
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262817
G. Karypis, Vipin Kumar
The authors are concerned with dynamic programming (DP) algorithms whose solution is given by a recurrence relation similar to that for the matrix parenthesization problem. Guibas, Kung and Thompson (1979) presented a systolic array algorithm for this problem that uses O(n^2) processing cells and solves the problem in O(n) time. The authors present three different mappings of this systolic algorithm on a mesh-connected parallel computer. The first two mappings use commonly known techniques for mapping systolic arrays to mesh computers. Both of them are able to obtain only a fraction of the maximum possible performance. The primary reason for the poor performance of these formulations is that different nodes at different levels in the multistage graph in the DP formulation require different amounts of computation. Any adaptation has to take this into consideration and evenly distribute the work among the processors. The third mapping balances the work load among processors and thus is capable of providing efficiency approximately equal to 1 (i.e., speedup approximately equal to the number of processors) for any number of processors and a sufficiently large problem. They experimentally evaluate these mappings on a mesh embedded onto a 256-processor nCUBE/2.
Citations: 14
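The load imbalance described in the entry above can be seen in a plain sequential version of the matrix parenthesization recurrence. The sketch below is the textbook DP, not any of the three mesh mappings from the paper; the function name `matrix_chain_cost` and the `dims` convention (matrix i has shape dims[i] by dims[i+1]) are assumptions made for illustration.

```python
def matrix_chain_cost(dims):
    """Return (minimum multiplication cost, split points tried per cell).

    Matrix i has shape dims[i] x dims[i + 1]. The second result maps each
    diagonal (j - i) to the number of split points one cell on it examines.
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    work_per_cell = {}
    for span in range(1, n):  # span = j - i, one table diagonal at a time
        for i in range(n - span):
            j = i + span
            # Try every split between matrices k and k + 1.
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
        work_per_cell[span] = span
    return cost[0][n - 1], work_per_cell
```

Cells on later diagonals examine more split points than cells on earlier ones, so giving every processor the same number of cells does not give every processor the same amount of work.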
Linked list cache coherence for scalable shared memory multiprocessors
Pub Date : 1993-04-13 DOI: 10.1109/IPPS.1993.262852
M. Thapar, B. Delagi, M. Flynn
This paper presents a singly-linked distributed directory (SDD) cache coherence protocol and compares the performance of the SDD protocol with the fully mapped centralized directory protocol and the IEEE SCI Standard protocol. To maintain coherence, the SDD protocol uses a linked list of cache lines that contain shared copies of the data. The protocol has scalable cost. Coherency-related messages are not required to be delivered in order, thus allowing adaptive routing and making the performance more robust in the presence of congested networks. The authors' analysis shows that the SDD protocol has generally better performance in the presence of memory and interconnect contention. They discuss the various factors, such as memory reference behavior and interconnect traffic, that affect the performance of these protocols.
Citations: 20