Multiple message broadcasting with generalized Fibonacci trees
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242714
Jehoshua Bruck, R. Cypher, C. T. Ho
The authors present efficient algorithms for broadcasting multiple messages. They assume n processors, one of which contains m packets that it must broadcast to each of the remaining n-1 processors. The processors communicate in rounds. In one round each processor is able to send one packet to any other processor and receive one packet from any other processor. The authors give a broadcasting algorithm which requires m + log n + 3 log log n + 15 rounds. In addition, they show a simple lower bound of m + log n - 1 rounds for broadcasting in this model.
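The upper and lower bounds differ only by an additive O(log log n) term. The sketch below simply evaluates the two expressions from the abstract for a few parameter choices; the use of base-2 logarithms and ceilings is an assumption made here for illustration, since the abstract does not spell out rounding.

```python
from math import ceil, log2

def upper_bound_rounds(n, m):
    # Rounds claimed for the algorithm: m + log n + 3 log log n + 15.
    return m + ceil(log2(n)) + 3 * ceil(log2(log2(n))) + 15

def lower_bound_rounds(n, m):
    # Simple lower bound: m + log n - 1.  The source can inject at most one
    # packet per round, and the number of copies of the last packet it sends
    # can at most double in each subsequent round.
    return m + ceil(log2(n)) - 1

for n, m in [(2**10, 1), (2**10, 100), (2**20, 1000)]:
    print(f"n={n}, m={m}: lower {lower_bound_rounds(n, m)}, "
          f"upper {upper_bound_rounds(n, m)}")
```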
{"title":"Multiple message broadcasting with generalized Fibonacci trees","authors":"Jehoshua Bruck, R. Cypher, C. T. Ho","doi":"10.1109/SPDP.1992.242714","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242714","url":null,"abstract":"The authors present efficient algorithms for broadcasting multiple messages. They assume n processors, one of which contains m packets that it must broadcast to each of the remaining n-1 processors. The processors communicate in rounds. In one round each processor is able to send one packet to any other processor and receive one packet from any other processor. The authors give a broadcasting algorithm which requires m+log n+3 log log n +15 rounds. In addition, they show a simple lower bound of m+(log n) -1 rounds for broadcasting in this model.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126676879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal algorithms for selection on a mesh-connected processor array
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242761
D. Krizanc, L. Narayanan
The authors present novel algorithms for selecting an element of specified rank among N = n^2 elements on an n × n mesh-connected processor array, in a variety of settings. They give: (1) an optimal randomized algorithm for selecting the element of rank k out of N, 1 ≤ k ≤ N, at any processor that is at least 0.5n - o(n) steps away from the middle processor; (2) an optimal deterministic algorithm for selecting the element of rank k out of N, 1 ≤ k ≤ N, at any processor, when the elements are drawn from the set {1, ..., N^(1-ε)}, where 0 < ε ≤ 1; and (3) an optimal deterministic algorithm for selecting the element of rank k out of N, at any processor, when 1 ≤ k ≤ N^(1-ε), where 0 < ε ≤ 1.
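For readers unfamiliar with rank selection, the core randomized idea (draw a sample, bracket the target rank between two sample pivots, and keep only the elements that fall in between) can be shown sequentially. This is a generic sampling-based selection sketch, not the authors' mesh algorithm; the sample size and slack below are arbitrary illustrative choices.

```python
import random

def randomized_select(values, k):
    """Return the k-th smallest element (1-based) of `values`.

    Sequential sketch of sampling-based selection: draw a random sample,
    pick two sample elements that bracket rank k with high probability,
    discard everything outside that bracket, and finish on the survivors.
    A full sort is kept as a fallback so the answer is always correct.
    """
    n = len(values)
    if n <= 1000:
        return sorted(values)[k - 1]
    sample = sorted(random.sample(values, int(n ** 0.5)))
    pos = (k / n) * len(sample)                  # expected rank inside the sample
    slack = int(len(sample) ** 0.75) + 1
    lo = sample[max(0, int(pos - slack))]
    hi = sample[min(len(sample) - 1, int(pos + slack))]
    below = sum(1 for v in values if v < lo)
    middle = [v for v in values if lo <= v <= hi]
    if below < k <= below + len(middle):
        return sorted(middle)[k - below - 1]
    return sorted(values)[k - 1]                 # rare fallback: bracket missed

data = [random.randrange(10**6) for _ in range(50000)]
print(randomized_select(data, 25000) == sorted(data)[24999])   # True
```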
{"title":"Optimal algorithms for selection on a mesh-connected processor array","authors":"D. Krizanc, L. Narayanan","doi":"10.1109/SPDP.1992.242761","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242761","url":null,"abstract":"The authors present novel algorithms for selecting an elements of specified rank among N=n/sup 2/ elements on an n*n mesh-connected processor array, in a variety of settings. They give: (1) an optimal randomized algorithm for selecting the element of rank k out of N, 1<or=k<or=N, at any processor that is at least 0.5n-o(n) steps away from the middle processor; (2) an optimal deterministic algorithm for selecting the element of rank k out of N, 1<or=k<or=N, at any processor, when the elements are drawn from the set (1,. . .,N/sup 1- in /), where 0< in <or=1; and an optimal deterministic algorithm for selecting the element of rank k out of N, at any processor, when 1<or=k<or=N/sup 1- in /, where 0< in <or=1.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124251301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single processor-pool MSIMD/MIMD architectures
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242709
M. S. Baig, T. El-Ghazawi, N. Alexandridis
In multiple SIMD/MIMD (single-instruction multiple-data / multiple-instruction multiple-data, MSIMD/MIMD) architectures, two different types of processors are used: processing elements (PEs), to support both SIMD and MIMD partitions, and control units (CUs), to support SIMD partitions. In existing architectures, the role of a processor as either PE or CU is determined only at design time. It is shown that this fixed assignment results in performance degradation. A single processor-pool MSIMD/MIMD architectural model with dynamic processor assignment is then introduced, and a cube-based single processor-pool system, referred to as the single-pool processor (SPP), is presented. Simulation and analysis show that the proposed SPP architecture offers a significantly better performance/cost ratio than other MSIMD/MIMD systems.
{"title":"Single processor-pool MSIMD/MIMD architectures","authors":"M. S. Baig, T. El-Ghazawi, N. Alexandridis","doi":"10.1109/SPDP.1992.242709","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242709","url":null,"abstract":"In multiple SIMD/MIMD (single-instruction multiple-data/multiple-instruction multiple-data) (MSIMD/MIMD) architectures, two different types of processors are used: processing elements (PEs), to support both SIMD and MIMD partitions, and control units (CUs), to support SIMD partitions. In the existing architectures, the role of a processor to run as either PE or CU is determined only at design time. It is shown that this fixed assignment results in performance degradations. Furthermore, a single processor-pool MSIMD/MIMD architectural model with dynamic processor assignments is introduced. A cube-based single processor-pool system is presented. This system is referred to as the single-pool processor (SPP). Simulation and analysis have shown that the proposed SPP architecture offers a significantly better performance/cost than other MSIMD/MIMD systems.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121579924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the relative speed of messages and hierarchical channels
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242737
Mohan L. Ahuja, R. Prakash
The authors study the concept of the relative speed of messages and a hierarchy of such speeds, which gives rise to hierarchical channels. They present examples of applications where hierarchical channels are useful and an implementation of such a channel using selective flooding. They also describe a mechanism that permits a programmer to specify a partial order among channels, and compare hierarchical channels with F-channels.
{"title":"On the relative speed of messages and hierarchical channels","authors":"Mohan L. Ahuja, R. Prakash","doi":"10.1109/SPDP.1992.242737","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242737","url":null,"abstract":"The authors study the concept of relative speed of messages and a hierarchy of such speeds which results in a hierarchical channel. They present examples of applications where hierarchical channels are useful and an implementation of such a channel using selective flooding. They describe a mechanism that permits a programmer to specify a partial order among channels, and compare the hierarchical channels with F-channels.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124336109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Broadcasting algorithms in faulty SIMD hypercubes
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242769
C. Raghavendra, M. Sridhar
The authors consider an important global operation, namely, broadcasting in a faulty hypercube. In particular, they study the problem of broadcasting in an n-dimensional single-instruction multiple-data (SIMD) hypercube, Q_n, with up to n-1 node faults. Given a set of at most n-1 faults, they develop an ordering d_1, d_2, ..., d_n of the n dimensions, depending on where the faults are located. An important and useful property of this dimension ordering is the following: if the n-cube is partitioned into k-subcubes using the first k dimensions of this ordering, namely d_1, d_2, ..., d_k, for any 1 ≤ k ≤ n, then each k-subcube contains at most k-1 faults. This result is used to develop several new algorithms for broadcasting. These algorithms require n + 3 log n, n + 2 log n + 2, n + log n + O(log log n), n + log n + 5, and n + 12 time steps, respectively, and thus improve upon the best known algorithms for this problem.
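As a small aid to the fault-counting argument above, the helper below labels each subcube of the n-cube by the values of a chosen set of fixed bit positions (the remaining dimensions vary freely inside a subcube) and tallies the faulty nodes per subcube. This is generic bookkeeping under the assumption that nodes are n-bit integers; constructing a dimension ordering with the stated property is the paper's contribution and is not reproduced here.

```python
from collections import Counter

def faults_per_subcube(n, faulty_nodes, fixed_dims):
    """Count faulty nodes in each subcube obtained by fixing the bit
    positions listed in `fixed_dims`; nodes are integers 0 .. 2^n - 1."""
    def subcube_id(node):
        return tuple((node >> d) & 1 for d in fixed_dims)
    return Counter(subcube_id(f) for f in faulty_nodes)

# Example: Q_4 with faulty nodes {3, 5, 12}, partitioned by fixing dimension 3
# into two 3-subcubes.  Each 3-subcube holds at most 2 faults here.
print(faults_per_subcube(4, [3, 5, 12], [3]))   # Counter({(0,): 2, (1,): 1})
```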
{"title":"Broadcasting algorithms in faulty SIMD hypercubes","authors":"C. Raghavendra, M. Sridhar","doi":"10.1109/SPDP.1992.242769","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242769","url":null,"abstract":"The authors consider an important global operation, namely, broadcasting in a faulty hypercube. In particular, they study the problem of broadcasting in an n-dimensional single-instruction multiple data (SIMD) hypercube, Q/sub n/, with up to n-1 node faults. Given a set of at most n-1 faults, they develop an ordering d/sub 1/, d/sub 2/, . ., d/sub n/ of n dimensions, depending on where the faults are located. An important and useful property of this dimension ordering is the following: if the n-cube is partitioned into k-subcubes using the first k dimensions of this ordering, namely d/sub 1/, d/sub 2/,. . .d/sub k/ for any 1<or=k<or=n, then each k-subcube contains at most k-1 faults. This result is used to develop several new algorithms for broadcasting. These algorithms are n+3 log n, n+2 log n+2, n+ log n+0 (log log n), n+ log n+5, and n+12 time steps, respectively, and thus improve upon the best known algorithms for this problem.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122260104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient evaluation of arbitrary set-associative caches on multiprocessors
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242702
Yuguang Wu, G. Popek, R. Muntz
The authors propose a simple solution to the problem of efficient stack evaluation of LRU (least recently used) cache memories with arbitrary power-of-two set-associativity on multiprocessors. It is an extension of stack evaluation techniques for all-associativity LRU caches on a uniprocessor. Special marker entries are used in the stack to represent data pages (also called data blocks or lines) deleted by an invalidation-based cache coherence protocol. A marker-splitting technique is used when a data page below a marker in the stack is accessed. With this technique, a single pass over a memory reference trace yields hit ratios for all cache sizes and set-associativities of multiprocessor caches.
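To make the stack-evaluation idea concrete, here is the classic uniprocessor, fully-associative one-pass technique the paper extends: record the LRU stack distance of every reference, and the hit ratio for any cache size can then be read off the distance histogram. The marker entries and the set-associative, multiprocessor handling described above are not modeled; this sketch is only the baseline being built upon.

```python
def stack_distances(trace):
    """One-pass LRU stack (Mattson) simulation for a fully associative cache.
    Returns, per reference, its stack distance (1 = most recently used) or
    None for a cold miss.  A cache of C blocks hits exactly the references
    whose distance is <= C."""
    stack, dists = [], []           # stack[0] is the most recently used block
    for block in trace:
        if block in stack:
            d = stack.index(block) + 1
            stack.remove(block)
        else:
            d = None                # cold (compulsory) miss
        stack.insert(0, block)
        dists.append(d)
    return dists

# Hit ratios for every cache size from a single pass over the trace.
trace = ["a", "b", "a", "c", "b", "a", "d", "a"]
dists = stack_distances(trace)
for C in (1, 2, 3, 4):
    hits = sum(1 for d in dists if d is not None and d <= C)
    print(f"{C} blocks: hit ratio {hits / len(trace):.2f}")
```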
{"title":"Efficient evaluation of arbitrary set-associative caches on multiprocessors","authors":"Yuguang Wu, G. Popek, R. Muntz","doi":"10.1109/SPDP.1992.242702","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242702","url":null,"abstract":"The authors propose a simple solution to the problem of efficient stack evaluation of LRU (least recently used) cache memories with an arbitrary two's power set-associativity on multiprocessors. It is an extension of stack evaluation techniques for all-associativity LRU cache on a uniprocessor. Special marker entries are used in the stack to represent data pages (also called data blocks or lines) deleted by an invalidation-based cache coherence protocol. A technique of marker-splitting is used when a data page below a marker in the stack is accessed. One-pass evaluation of memory access trace will yield hit ratios for all cache sizes and set associativities on multiprocessor caches in a single pass over a memory reference trace with the use of this technique.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134590440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pipelined circuit-switching: a fault-tolerant variant of wormhole routing
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242751
P. T. Gaughan, S. Yalamanchili
An effort is made to reconcile the conflicting demands of performance and fault tolerance in interprocessor communication protocols. To this end, the authors propose a pipelined communication mechanism, pipelined circuit-switching (PCS), which is a variant of the well-known wormhole routing (WR) mechanism. They present a new class of adaptive routing algorithms, misrouting backtracking-m (MB-m), made possible by PCS, along with proofs of some fault-tolerant properties of MB-m. The results of an experimental evaluation of PCS and MB-3 are also presented. This methodology provides performance approaching that of WR while realizing fault-tolerant behavior that is difficult to achieve with WR.
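The flavor of misrouting with backtracking can be conveyed by a toy path search on a 2-D mesh with faulty nodes: a probe advances hop by hop, may take at most m hops that do not reduce the distance to the destination (misroutes), and backtracks when it is blocked. This is only an illustration of that search discipline under assumed mesh coordinates; it is not the authors' PCS switching mechanism or the MB-m protocol itself.

```python
def find_path(grid_ok, src, dst, max_misroutes):
    """Backtracking search for a path from src to dst on a 2-D mesh.
    grid_ok[r][c] is False for faulty nodes; at most `max_misroutes` hops
    may fail to reduce the Manhattan distance to dst."""
    rows, cols = len(grid_ok), len(grid_ok[0])

    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def dfs(node, misroutes, path, visited):
        if node == dst:
            return path
        for dr, dc in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nxt = (node[0] + dr, node[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if nxt in visited or not grid_ok[nxt[0]][nxt[1]]:
                continue
            extra = 0 if dist(nxt, dst) < dist(node, dst) else 1   # misroute?
            if misroutes + extra > max_misroutes:
                continue
            found = dfs(nxt, misroutes + extra, path + [nxt], visited | {nxt})
            if found is not None:
                return found
        return None                    # dead end: backtrack

    return dfs(src, 0, [src], {src})

grid = [[True, True, True],
        [True, False, True],           # one faulty node in the middle
        [True, True, True]]
print(find_path(grid, (0, 0), (2, 2), max_misroutes=1))
```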
{"title":"Pipelined circuit-switching: a fault-tolerant variant of wormhole routing","authors":"P. T. Gaughan, S. Yalamanchili","doi":"10.1109/SPDP.1992.242751","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242751","url":null,"abstract":"An effort is made to reconcile the conflicting demands of performance and fault-tolerance in interprocessor communication protocols. To this end, the authors propose a pipelined communication mechanism-pipelined circuit-switching (PCS)-which is variant of the well known wormhole routing (WR) mechanism. They present a new class of adaptive routing algorithms, misrouting backtracking-m (MB-m), made possible by PCS and proofs of some fault-tolerant properties of MB-m. The results of an experimental evaluation of PCS and MB-3 are also presented. This methodology provides performance approaching that of WR, while realizing fault-tolerant behavior that is difficult to achieve with WR.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133458584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extended cycle shrinking: an optimal loop transformation
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242743
C. Gong
The author discusses a novel loop transformation technique, extended cycle shrinking (ECS), which is based on cycle shrinking (CS). While CS is a powerful technique for dealing with dependences that form a cycle, it fails to generate an optimal program in many cases. ECS can generate optimal programs for a class of loops called regular cycle dependence loops.
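For context, plain cycle shrinking (the technique ECS extends) exploits a constant cross-iteration dependence distance by running the loop serially across blocks of that many iterations, while the iterations inside each block are mutually independent and could run in parallel. A minimal sketch for a single loop with distance 3 follows; the loop body and array are illustrative only.

```python
# Original serial loop, with a dependence of distance 3 (iteration i writes
# a[i+3], which iteration i+3 later reads):
#
#     for i in range(n):
#         a[i + 3] = a[i] + 1
#
# Cycle-shrunk form: serial across blocks of 3 iterations, parallelizable
# within a block, because the 3 iterations of one block touch disjoint cells.

def cycle_shrunk(a, n, dist=3):
    for block in range(0, n, dist):                    # serial outer loop
        for i in range(block, min(block + dist, n)):   # parallel inner loop
            a[i + dist] = a[i] + 1
    return a

a = list(range(13))           # needs len(a) >= n + dist
print(cycle_shrunk(a, 10))
```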
{"title":"Extended cycle shrinking: an optimal loop transformation","authors":"C. Gong","doi":"10.1109/SPDP.1992.242743","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242743","url":null,"abstract":"The author discusses a novel loop transformation technique, extended cycle shrinking (ECS), that is based on cycle shrinking (CS). While CS is a very powerful technique in dealing with dependences involving a cycle, it fails to generate an optimal program in many cases. The ECS can generate optimal programs for a class of loops called regular cycle dependence loops.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115573618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the distributed subcube-allocation strategies in the hypercube multiprocessor systems
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242723
Jyh-Charn S. Liu, Yilong Chen
The authors propose a novel system framework for the design of distributed job-scheduling and subcube-allocation strategies in hypercube multiprocessor/multicomputer systems. A generalized-lattice ordering scheme is proposed for the processors. An elegant system information structure, the subcube identification table (SIT), is proposed for efficient distribution, retrieval, and update of free-subcube information; the locations of free subcubes can be determined by any node through a direct lookup of its SIT. A novel interprocessor communication mechanism called sync-broadcast is presented for SIT update and construction, and for resolving contention among subcube requests so that subcubes are allocated and deallocated consistently. Different job-scheduling schemes can easily be implemented on top of the proposed framework.
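As background on what free-subcube information looks like, the sketch below uses the common ternary notation for subcubes of Q_n (strings over {0,1,*}, with '*' marking a dimension that varies inside the subcube) and shows how a d-dimensional request can be carved out of a larger free subcube, returning the fragments that remain free. This is ordinary subcube arithmetic under an assumed notation; the SIT data structure and the sync-broadcast protocol are not modeled here.

```python
def allocate_subcube(free_cube, d):
    """Carve a d-subcube out of a free subcube written over {0,1,*}.
    Returns (allocated_cube, remaining_free_fragments); the fragments and
    the allocated cube exactly partition the original free subcube."""
    stars = [i for i, ch in enumerate(free_cube) if ch == "*"]
    if len(stars) < d:
        return None, [free_cube]            # request does not fit here
    cube, remainder = list(free_cube), []
    for pos in stars[: len(stars) - d]:     # fix one free dimension at a time
        sibling = cube[:]
        sibling[pos] = "1"                  # the 1-half stays free
        remainder.append("".join(sibling))
        cube[pos] = "0"                     # keep carving inside the 0-half
    return "".join(cube), remainder

alloc, free = allocate_subcube("0***", 1)
print(alloc, free)                          # 000* ['01**', '001*']
```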
{"title":"On the distributed subcube-allocation strategies in the hypercube multiprocessor systems","authors":"Jyh-Charn S. Liu, Yilong Chen","doi":"10.1109/SPDP.1992.242723","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242723","url":null,"abstract":"The authors propose a novel system framework for the design of distributed job-scheduling and subcube-allocation strategies in hypercube multiprocessor/multicomputer systems. A generalized-lattice ordering scheme is proposed for processors. An elegant system information structure, the subcube identification table (SIT), is proposed for efficient distribution, retrieval, and update of free-subcube information. Locations of free subcubes can be determined by any node through direct lookup of its SIT. A novel interprocessor communication mechanism called the sync-broadcast is presented for SIT update/construction, and for resolving contention between subcube-requests for consistent allocation/deallocation of subcubes. Different job scheduling schemes can be easily implemented based on the proposed scheme.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116039433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast parallel algorithm for the single link heuristics of hierarchical clustering
Pub Date: 1992-12-01 | DOI: 10.1109/SPDP.1992.242746
E. Dahlhaus
A fast parallel algorithm for the single-link heuristic of hierarchical clustering is presented. Its time-processor product is optimal and its parallel time is O(log^2 n). The algorithm is based on computing a minimum spanning tree, which can be done in O(log^2 n) time using O(n^2/log^2 n) processors. The main gap to be filled is computing the hierarchical clustering tree (dendrogram) from a minimum spanning tree. The author proves that this can be done in O(log n) time using O(n) processors. Therefore, the overall time-processor product of O(n^2) is optimal.
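The reduction that the paper parallelizes is easy to state sequentially: once a minimum spanning tree is in hand, the single-link dendrogram is obtained by merging components in order of increasing MST edge weight. The union-find sketch below shows that sequential step only (not the parallel O(log n)-time construction); the cluster numbering convention is an illustrative assumption.

```python
def single_link_dendrogram(n, mst_edges):
    """Build the single-link dendrogram from MST edges (u, v, weight).
    Points are 0..n-1; merged clusters get ids n, n+1, ... in merge order.
    Returns a list of merges (cluster_a, cluster_b, weight)."""
    parent = list(range(n))                 # union-find over the points
    cluster_of = list(range(n))             # current cluster id at each root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    merges, next_id = [], n
    for u, v, w in sorted(mst_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        merges.append((cluster_of[ru], cluster_of[rv], w))
        parent[ru] = rv
        cluster_of[rv] = next_id
        next_id += 1
    return merges

# Four points on a line at positions 0, 1, 3, 7 (MST edges with distances).
print(single_link_dendrogram(4, [(0, 1, 1), (1, 2, 2), (2, 3, 4)]))
# [(0, 1, 1), (4, 2, 2), (5, 3, 4)]
```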
{"title":"Fast parallel algorithm for the single link heuristics of hierarchical clustering","authors":"E. Dahlhaus","doi":"10.1109/SPDP.1992.242746","DOIUrl":"https://doi.org/10.1109/SPDP.1992.242746","url":null,"abstract":"A fast parallel algorithm of single link heuristics of hierarchical clustering is presented. Its time processor product is optimal and the parallel time is the square of the logarithm. The algorithm is based on computing a minimum spanning tree which can be done in O(log/sup 2/ n) time using O(n/sup 2//log/sup n/) processors. The main gap to be filled is to compute a hierarchical clustering tree (dendrogram) from a minimum spanning tree. The author proves that this can be done in O(log n) time using O(n) processors. Therefore, the overall time-processor product of O(n/sup 2/) is optimal.<<ETX>>","PeriodicalId":265469,"journal":{"name":"[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing","volume":"240 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116307291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}