Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223064
Xiaoxiong Zhong, S. Rajopadhye, V. Lo
Studies the parallel implementation of divide-and-conquer algorithms on a binary de Bruijn network using a temporal binomial tree (rather than the usual binary tree) as the computation structure. Two cases of message volume are considered: (i) uniform weights, and (ii) logarithmically decreasing (or increasing) weights. A single mapping is proposed for both cases. It has an average extra dilation of 1 and is free of communication-link contention. A lower bound on the total extra dilation of any mapping from a uniform-weighted binomial tree to an arbitrary degree-4 network is also developed, showing that the mapping is asymptotically optimal with respect to the average extra dilation. The implementation is well suited to a binary de Bruijn network with a wormhole or circuit-switching communication scheme.
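The two structures named in the abstract can be sketched as follows. This is a minimal illustration under the usual textbook definitions, not the authors' mapping: the function names `de_bruijn_neighbors` and `binomial_children` and the bit labeling of the binomial tree are assumptions made for this example.

```python
def de_bruijn_neighbors(x, m):
    """Undirected neighbors of node x in the binary de Bruijn graph on 2**m nodes."""
    mask = (1 << m) - 1
    # Successors: shift the m-bit label left and append 0 or 1.
    succ = {(x << 1) & mask, ((x << 1) | 1) & mask}
    # Predecessors: shift the label right and prepend 0 or 1.
    pred = {x >> 1, (x >> 1) | (1 << (m - 1))}
    # Drop self-loops (they occur at the all-zeros and all-ones nodes).
    return (succ | pred) - {x}


def binomial_children(x, k):
    """Children of node x in the binomial tree B_k (2**k nodes, root 0).

    Assumed labeling: a node's children are obtained by setting one bit
    below its lowest set bit; the root may set any of the k bits.
    """
    low = k if x == 0 else (x & -x).bit_length() - 1
    return [x | (1 << j) for j in range(low)]
```

Every node has at most four neighbors, which is why the abstract states its lower bound for an arbitrary degree-4 network, while the root of B_k has k children, so the tree cannot be embedded edge-for-edge and some dilation is unavoidable.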
{"title":"Parallel implementation of divide-and-conquer algorithms on binary de Bruijn networks","authors":"Xiaoxiong Zhong, S. Rajopadhye, V. Lo","doi":"10.1109/IPPS.1992.223064","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223064","url":null,"abstract":"Studies the problem of parallel implementation of divide-and-conquer algorithms on binary de Bruijn network using a temporal binomial tree (rather than the usual binary tree) computation structure. Two cases of message volumes are considered: (i) uniform, and (ii) logarithmically decreasing (increasing) weights. A single mapping is proposed for both cases. It has average extra dilation 1 and is communication link contention-free. A lower bound for the total extra dilation of any mapping from uniform-weighted binomial tree to an arbitrary degree-4 network is also developed to show that the mapping is asymptotically optimal with respective to the average extra dilation. The implementation is well suited to a binary de Bruijn network with a wormhole or circuit switching communication scheme.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115864167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222994
P. Berman, A. Bharali
The distributed consensus problem assumes that all processors in the system have some initial values; the goal is to make all non-faulty processors agree on one of these values. This paper investigates the time needed to reach consensus in a partially synchronous model with omission failures. In this model, the processors have no direct knowledge of time, but the time between consecutive steps of each processor always lies between two known constants $c_1$ and $c_2$; the ratio $C = c_2/c_1$ measures the timing uncertainty in the system. Moreover, messages are delivered within time $d$. This paper provides an improved protocol for this problem. When the majority of the processors are fault-free, the protocol achieves consensus in time $3(\varphi+1)d + Cd$, where $\varphi$ is the actual number of faults in a specific execution of the protocol. This is an efficiency gain of up to 25% over the existing protocol, which requires time $4(\varphi+1)d + Cd$.
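The "up to 25%" figure can be checked directly from the two time bounds. The sketch below is only arithmetic on the formulas quoted in the abstract; the function names are hypothetical, and reading "efficiency" as the relative reduction in running time is an assumption.

```python
def consensus_time(coeff, faults, d, C):
    """Time bound coeff*(faults + 1)*d + C*d from the abstract."""
    return coeff * (faults + 1) * d + C * d


def saving_fraction(faults, d, C):
    """Relative time saved by the 3(phi+1)d + Cd bound over 4(phi+1)d + Cd."""
    old = consensus_time(4, faults, d, C)
    new = consensus_time(3, faults, d, C)
    return (old - new) / old
```

The saving equals (phi + 1) / (4(phi + 1) + C), which stays below 1/4 and approaches it as the number of faults grows relative to C.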
{"title":"Distributed consensus in semi-synchronous systems","authors":"P. Berman, A. Bharali","doi":"10.1109/IPPS.1992.222994","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222994","url":null,"abstract":"The Distributed consensus problem assumes that all processors in the system have some initial values; the goal is to make all non-faulty processors agree on one of these values. This paper investigates the time needed to reach consensus in a partially synchronous model with omission failures. In this model, the processors have no direct knowledge about time, but the time between consecutive steps of each processor is always between two known constants c/sub 1/ and c/sub 2/; the ratio C=/sup c2///sub c1/ measures the timing uncertainty in the system. Moreover, messages are delivered within time d. This paper provides an improved protocol for the above problem. When the majority of the processors are fault-free, the protocol achieves consensus in time 3( phi +1)d+Cd, where phi is the actual number of faults in a specific execution of the protocol. This allows an increase in efficiency up to 25% over the existing protocol which requires time 4( phi +1)d+Cd.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126215256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223005
N. Bagherzadeh, K. Hawk
The authors present their experience in executing the auction algorithm on an iPSC/860 hypercube multiprocessor. They show the performance of the algorithm under synchronous and asynchronous computation models. To reduce the number of iterations and effectively increase the inherent parallelism of the auction algorithm, they propose and test a new technique called $\gamma$-scaling.
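For readers unfamiliar with the auction algorithm, the following is a minimal sequential sketch of the standard single-bidder (Gauss-Seidel) form for the assignment problem. It is not the authors' parallel implementation and does not include $\gamma$-scaling, which the abstract does not describe; the function name `auction_assignment` and the fixed `eps` parameter are assumptions for illustration.

```python
def auction_assignment(benefit, eps):
    """Assign n persons to n objects, maximizing total benefit to within n*eps.

    benefit[i][j] is the value of object j to person i.
    """
    n = len(benefit)
    prices = [0.0] * n
    owner = [None] * n        # owner[j]: person currently holding object j
    assigned = [None] * n     # assigned[i]: object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of every object to bidder i at current prices.
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        second = max((values[j] for j in range(n) if j != best),
                     default=values[best])
        # Raise the price by the bidding increment (at least eps).
        prices[best] += values[best] - second + eps
        if owner[best] is not None:
            # The previous holder is outbid and must bid again.
            assigned[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = i
        assigned[i] = best
    return assigned, prices
```

With integer benefits and `eps` below 1/n, the returned assignment is optimal. In a parallel version, many unassigned persons bid in the same iteration, which is where the synchronous and asynchronous models mentioned above differ.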
{"title":"Parallel implementation of the auction algorithm on the Intel hypercube","authors":"N. Bagherzadeh, K. Hawk","doi":"10.1109/IPPS.1992.223005","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223005","url":null,"abstract":"The authors present their experience in executing the auction algorithm on an iPSC/860 hypercube multiprocessor. They show the performance of the algorithm under synchronous and asynchronous computation models. In order to reduce the number of iterations for this algorithm and effectively increase the inherent parallelism in the auction algorithm, they propose and test a new technique called gamma -scaling.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126385156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223013
D. Blough, S. Najand
Fault-tolerant routing algorithms in multiprocessor systems use diagnostic information to select paths for messages. In many situations, only incomplete, or partial, diagnostic information is available for this purpose. The authors present algorithms for achieving two forms of diagnosis, known as k-reachability diagnosis and k-neighborhood diagnosis, which provide partial diagnostic information. They compare the performance and overhead of these two algorithms, both analytically and through experiments conducted on an Intel iPSC/2 hypercube. They also present a routing algorithm that successfully routes messages between connected non-faulty nodes in systems of arbitrary topology containing an arbitrary number of faults. The performance of the algorithm is shown to be optimal when $k=n-1$ and, in the worst case, within a factor of two of optimal when $k=1$.
{"title":"Fault-tolerant multiprocessor system routing using incomplete diagnostic information","authors":"D. Blough, S. Najand","doi":"10.1109/IPPS.1992.223013","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223013","url":null,"abstract":"Fault-tolerant routing algorithms in multiprocessor systems utilize diagnostic information in selecting paths for messages. In many situations, only incomplete, or partial, diagnostic information is available for this purpose. The authors present algorithms for achieving two forms of diagnosis, known as k-reachability diagnosis and k-neighborhood diagnosis which provide partial diagnostic information. They compare, both analytically and through experiments conducted on an Intel iPSC/2 hypercube the performance and overhead of these two algorithms. They also present a routing algorithm that successfully routes messages between connected non-faulty nodes in systems of arbitrary topology containing an arbitrary number of faults. The performance of the algorithm is shown to be optimal when k=n-1 and within a factor of two of optimal, in the worst case, when k=1.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125993754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223028
Donald B. Johnson, P. Metaxas
The vertex updating problem for a minimum spanning tree (MST) is defined as follows: given a graph $G=(V,E_G)$ and its MST $T$, update $T$ when a new vertex $z$ is introduced along with weighted edges that connect $z$ with the vertices of $G$. The authors present a set of rules that, together with a valid tree-contraction schedule, are used to produce simple optimal parallel algorithms that run in $O(\log n)$ parallel time using $n/\lg n$ EREW PRAM processors, where $n=|V|$. These rules can also be used to derive simple linear-time sequential algorithms for the same problem. It is also shown how this solution can be used to solve the multiple vertex updating problem: update a given MST when $k$ new vertices are introduced simultaneously. This problem is solved in $O(\lg k \cdot \lg n)$ parallel time using $\frac{k \cdot n}{\lg k \cdot \lg n}$ EREW PRAM processors.
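A simple sequential baseline for the single-vertex case is sketched below. It relies on the standard cycle-property fact that some MST of the enlarged graph uses only edges of $T$ and edges incident to $z$, and reruns Kruskal's algorithm on that reduced edge set. This is not the authors' rule-based linear-time or tree-contraction algorithm (the sort makes it O(n log n)); the function name `update_mst` and the edge encoding are assumptions for illustration.

```python
def update_mst(n, tree_edges, new_edges):
    """Update an MST on vertices 0..n-1 after adding vertex n.

    tree_edges: list of (weight, u, v) forming the current MST.
    new_edges:  list of (weight, u) connecting the new vertex n to vertex u.
    Returns the edges of an MST of the enlarged graph.
    """
    # Only old tree edges and edges at the new vertex can appear in the result.
    candidates = sorted(tree_edges + [(w, u, n) for w, u in new_edges])
    parent = list(range(n + 1))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    result = []
    for w, u, v in candidates:
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two components: keep it
            parent[ru] = rv
            result.append((w, u, v))
    return result
```

For example, with the tree 0-1 (weight 5), 1-2 (weight 1) and a new vertex 3 joined to vertex 0 with weight 1 and to vertex 1 with weight 2, the heavy edge 0-1 is replaced by the two new edges.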
{"title":"Optimal algorithms for the vertex updating problem of a minimum spanning tree","authors":"Donald B. Johnson, P. Metaxas","doi":"10.1109/IPPS.1992.223028","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223028","url":null,"abstract":"The vertex updating problem for a minimum spanning tree (MST) is defined as follows: Given a graph G=(V,E/sub G/) and its MST T, update T when a new vertex z is introduced along with weighted edges that connect z with the vertices of G. The authors present a set of rules that, together with a valid tree-contraction schedule are used to produce simple optimal parallel algorithms that run in O(log n) parallel time using n/lgn EREW PRAMs where n= mod V mod . These rules can also be used to derive simple linear-time sequential algorithms for the same problem. It is also shown how this solution can be used to solve the multiple vertex updating problem: Update a given MST when k new vertices are introduced simultaneously. This problem is solved in O(lgk.lgn) parallel time using /sub lgk.lgn//sup k.n/ EREW PRAM processors.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125416808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223000
Qiang Li
This paper presents a multiple-path deadlock-free routing algorithm for direct binary hypercubes, which improves on an algorithm previously published by the author (1991). Between two nodes at distance k, the previous algorithm provides k disjoint paths in one direction and one path in the other. The direction with one path is a performance bottleneck. The new algorithm adds one more disjoint path to the narrow direction using a buffer management technique, and preserves the deadlock-free property. Although only one path is added, the simulation results presented in this paper show a significant performance improvement, since the added path almost doubles the capacity of the bottleneck.
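The existence of k disjoint paths between two hypercube nodes at distance k follows from a classical construction: correct the differing address bits in k cyclically rotated orders. The sketch below shows only that construction; it says nothing about deadlock freedom or the buffer management used in the paper, and the function name `disjoint_paths` is hypothetical.

```python
def disjoint_paths(src, dst):
    """Return k internally node-disjoint shortest paths between two
    hypercube nodes whose addresses differ in k bit positions."""
    diff = src ^ dst
    dims = [b for b in range(diff.bit_length()) if (diff >> b) & 1]
    k = len(dims)
    paths = []
    for r in range(k):
        node, path = src, [src]
        # Path r flips the differing bits starting from dims[r], cyclically.
        for step in range(k):
            node ^= 1 << dims[(r + step) % k]
            path.append(node)
        paths.append(path)
    return paths
```

For nodes 0 and 7 of a 3-cube this yields the paths 0-1-3-7, 0-2-6-7, and 0-4-5-7, which share only their endpoints.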
{"title":"An improved multiple-path deadlock-free routing algorithm in binary hypercubes","authors":"Qiang Li","doi":"10.1109/IPPS.1992.223000","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223000","url":null,"abstract":"This paper presents a multiple-path deadlock-free routing algorithm in direct binary hypercubes which is an improved version of a previously published algorithm by the author (1991). Between two nodes of distance k, the previous algorithm provides k disjoint paths in one direction and one path in the other. The direction with one path is a performance bottleneck. The new algorithm adds one more disjoint path to the narrow direction using buffer management technique, and preserves the deadlock-free property. Although only one path is added, simulation results presented in this paper show a significant performance improvement since the added path almost doubles the capacity of the bottleneck.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133375114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222991
T. Lai, Y. Tseng, Xuefeng Dong
Termination detection is a fundamental problem in distributed computing. Many algorithms have been proposed, but only the algorithm of S. Chandrasekaran and S. Venkatesan (CV, 1990) is known to be optimal in worst-case message complexity. This optimal algorithm, however, has several undesirable properties. First, it always requires $M' + 2|E| + n - 1$ control messages, in both the worst and the best case, where $M'$ is the number of basic messages issued by the underlying computation after the algorithm starts, $|E|$ is the number of channels in the system, and $n$ is the number of processes. Second, its worst-case detection delay is $O(M')$, which might not be tolerable in a message-intensive computation. Third, the maximum amount of space needed by each process is $O(M')$, a quantity not known at compile time, making it necessary to use the more expensive dynamic memory allocation. Last, it works only for FIFO channels. This paper remedies these drawbacks while preserving the algorithm's strength. The authors propose an algorithm that requires $M' + 2(n-1)$ control messages in the worst case but far fewer on average; in the best case it uses only $2(n-1)$ control messages, no matter how large $M'$ is.
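To get a feel for the difference between the message counts quoted above, the formulas can be evaluated for a sample system. The sketch below only evaluates the expressions from the abstract; the function names and the sample figures (16 processes, 120 channels, 1000 basic messages) are assumptions chosen for illustration, not data from the paper.

```python
def cv_control_messages(m_basic, channels, n):
    """Control messages of the CV algorithm: M' + 2|E| + n - 1 (every run)."""
    return m_basic + 2 * channels + n - 1


def proposed_worst_case(m_basic, n):
    """Worst-case control messages of the proposed algorithm: M' + 2(n - 1)."""
    return m_basic + 2 * (n - 1)


def proposed_best_case(n):
    """Best-case control messages of the proposed algorithm: 2(n - 1)."""
    return 2 * (n - 1)


# Hypothetical system: 16 processes, 120 channels, 1000 basic messages.
cv = cv_control_messages(1000, 120, 16)      # 1255
worst = proposed_worst_case(1000, 16)        # 1030
best = proposed_best_case(16)                # 30
```

The worst-case gap is the 2|E| - (n - 1) term, so it widens as the network becomes denser, while the best case is independent of $M'$ altogether.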
{"title":"A more efficient message-optimal algorithm for distributed termination detection","authors":"T. Lai, Y. Tseng, Xuefeng Dong","doi":"10.1109/IPPS.1992.222991","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222991","url":null,"abstract":"Termination detection is a fundamental problem in distributed computing. Many algorithms have been proposed, but only the S. Chandrasekaran and S. Venkatesan (CV) algorithm (1990) is known to be optimal in worst-case message complexity. This optimal algorithm, however, has several undesirable properties. First, it always requires M'+2* mod E mod +n-1 control messages, whether it is worst case or best case, where M' is the number of basic messages issued by the underlying computation after the algorithm starts, mod E mod is the number of channels in the system, and n is the number of processes. Second, its worst-case detection delay is O(M'). In a message-intensive computation, that might not be tolerable. Third, the maximum amount of space needed by each process is O(M'), a quantity not known at compile time, making it necessary to use the more expensive dynamic memory allocation. Last, it works only for FIFO channels. This paper remedies these drawbacks, while keeping its strength. The authors propose an algorithm that requires M'+2(n-1) control messages in the worst case, but much fewer on the average, and in the best case, it uses only 2(n-1) control messages, no matter how large M' is.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133386710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223029
G. Miel, E. Yfantis
The paper describes a software tool that facilitates the mapping of a wide class of unitary transforms onto array processors. The mapping formalism of the tool depends on matrix factorizations combined with abstract constructs that link the linear concepts to a model of the array's architecture. A prototype design of the tool is graphics-based and user-driven.
{"title":"A software tool for cellular mapping of discrete unitary transforms","authors":"G. Miel, E. Yfantis","doi":"10.1109/IPPS.1992.223029","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223029","url":null,"abstract":"The paper describes a software tool that facilitates mapping onto array processors of a wide class of unitary transforms. The mapping formalism of the tool depends on matrix factorizations combined with abstract constructs that link the linear concepts to a model of the array's architecture. A prototype design of the tool is graphics-based and user-driven.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129222492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223077
O. Ibarra, M. Kim
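Presents $O(\log n)$ time SIMD hypercube algorithms for transforming binary images to linear quadtrees and vice versa, where $n$ is the size of the images as well as the number of hypercube nodes. The quadtree building algorithm, which generates the locational codes in preorder, improves on a recently reported algorithm that runs in $O(\log^2 n)$ time. The authors also give an optimal linear quadtree building algorithm that runs in $T(n)$ time using $n^2/T(n)$ processors for $\log n \le T(n) \le n^2$. The algorithm is optimal in the sense that the product of time and number of processors is asymptotically the same as the optimal sequential time, which is $O(n^2)$. For this algorithm, the input binary image is assumed to be divided into blocks and loaded into a shuffled row-major ordered hypercube. The algorithm uses the procedures of the quadtree building algorithm developed for the case in which the number of hypercube nodes equals the number of pixels in the binary image.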
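The output format referred to above, a list of locational codes of black blocks in preorder, can be illustrated with a short sequential sketch. This is a plain recursive construction, not the authors' hypercube algorithm; the function name `linear_quadtree`, the quadrant digit order (0 = NW, 1 = NE, 2 = SW, 3 = SE), and the use of digit strings as codes are assumptions for this example.

```python
def linear_quadtree(image):
    """Return the locational codes of the maximal black blocks of a
    2**k x 2**k binary image, in preorder.

    Each code is a string of quadrant digits from the root down;
    the empty string denotes the whole image.
    """
    codes = []

    def visit(row, col, span, code):
        cells = [image[r][c]
                 for r in range(row, row + span)
                 for c in range(col, col + span)]
        if all(cells):
            codes.append(code)            # uniform black block: emit a leaf
        elif any(cells):
            half = span // 2              # mixed block: recurse into quadrants
            for q, (dr, dc) in enumerate(((0, 0), (0, 1), (1, 0), (1, 1))):
                visit(row + dr * half, col + dc * half, half, code + str(q))
        # uniform white blocks are not stored in a linear quadtree

    visit(0, 0, len(image), "")
    return codes
```

For a 4 x 4 image whose upper-left quadrant is black and whose only other black pixel is at row 1, column 3, the result is the two codes "0" and "13".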
{"title":"Quadtree building algorithms on an SIMD hypercube","authors":"O. Ibarra, M. Kim","doi":"10.1109/IPPS.1992.223077","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223077","url":null,"abstract":"Presents O(log n) time SIMD hypercube algorithms for transforming binary images to linear quadtrees and vice versa, where n is the size of the images as well as the number of hypercube nodes. The quadtree building algorithm, which generates the locational codes in preorder, is an improvement of a recently reported algorithm that runs in O(log/sup 2/n) time. The authors also give an optimal linear quadtree building algorithm which runs in T(n) time using n/sup 2//T(n) processors for log n<or=T(n)<or=n/sup 2/. The algorithm is optimal in the sense that the product of time and number of processors is asymptotically the same as the optimal sequential time which is O(n/sup 2/). For this algorithm we assume that the input binary image is divided into blocks and loaded in a shuffled row major ordered hypercube. The algorithm uses the procedures for the quadtree building algorithm developed for the case when the number of hypercube nodes is equal to the number of pixels in the binary image.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123962633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223023
M. Serrano, B. Parhami
A two-dimensional mesh of processing elements (PEs) with separable row and column buses has been shown to be quite effective for semigroup, prefix, and a wide class of other parallel computations. The authors show how semigroup and prefix computations can be performed with the same asymptotic time complexity on meshes that have separable buses for only a subset of rows and columns. They find that, with this basic arrangement, square grids are not optimal, but that a hierarchical method of synthesizing large meshes builds optimal square meshes from rectangular submeshes. The time-complexity results are shown to correspond to those previously published when certain parameters of the design are fixed at special values.
{"title":"Optimal aspect ratio and number of separable row/column buses for mesh-connected parallel computers","authors":"M. Serrano, B. Parhami","doi":"10.1109/IPPS.1992.223023","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223023","url":null,"abstract":"A two-dimensional mesh of PEs with separable row and column buses has been shown to be quite effective for semigroup, prefix, and a wide class of other parallel computations. The authors show how semigroup and prefix computations can be performed with the same asymptotic time complexity on meshes having separable buses for a subset of rows and columns. They find that with this basic arrangement, square grids are not optimal but that a hierarchical method of synthesizing large meshes builds optimal square meshes from rectangular submeshes. The time-complexity results are shown to correspond to those previously published when certain parameters of the design are fixed at special values.<<ETX>>","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115971742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}