Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262789
Chao-Chun Wang, L. Jamieson
Heuristic search is the process of searching a state space under the guidance of an evaluation function. Most research on parallelizing heuristic search algorithms has emphasized system problems such as load balancing and reduction in memory use. This paper introduces a theoretical analysis of a new autonomous parallel heuristic search algorithm. Rather than simply dividing the search space among themselves, the processors share information that monitors the progress of the search and use consensus to limit the time spent expanding nodes that are not on the optimal path. Each processor uses a different admissible heuristic function, and it is shown that the expected number of nodes generated by each processor in the course of the search is reduced by a factor that reflects the consensus among the processors. The asynchronous behavior of the algorithm eliminates synchronization delays.
{"title":"Autonomous parallel heuristic combinatorial search","authors":"Chao-Chun Wang, L. Jamieson","doi":"10.1109/IPPS.1993.262789","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262789","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"124561779"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262831
A. Bar-Noy, S. Kipnis
Broadcasting is a widely used operation in many message-passing systems. Most existing broadcasting algorithms, however, do not address several emerging trends in distributed-memory parallel computers and high-speed communication networks. These trends include (i) treating the system as a fully connected collection of processors, (ii) packetizing large data into sequences of messages, and (iii) tolerating communication latencies. This paper explores the broadcasting problem in the postal model, which addresses these issues. The authors provide two algorithms for broadcasting $m$ messages in a message-passing system with $n$ processors and communication latency $\lambda$. A lower bound on the time for this problem is $(m-1) + f_\lambda(n)$, where $f_\lambda(n)$ is the optimal time for broadcasting one message. They present algorithm PARTITION, which takes at most $2m + f_\lambda(n) + O(\lambda)$ time, and algorithm D-D-TREES, which takes at most $m + 2f_\lambda(n) + O(\lambda)$ time.
{"title":"Multiple message broadcasting in the postal model","authors":"A. Bar-Noy, S. Kipnis","doi":"10.1109/IPPS.1993.262831","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262831","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"130075838"}
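The bounds quoted in the postal-model abstract can be compared numerically with a small sketch. It rests on assumptions not stated in the abstract: an integer latency `lam`, and the generalized Fibonacci recurrence commonly used for single-message broadcast in the postal model (one informed processor before time `lam`, then N(t) = N(t-1) + N(t-lam)). The function names are hypothetical, and the upper bounds omit the O(lambda) term, so they show leading terms only.

```python
def informed_by(lam: int, t: int) -> int:
    """Processors that can know one message by time t (assumed recurrence)."""
    counts = []
    for step in range(t + 1):
        if step < lam:
            counts.append(1)  # only the originator knows the message so far
        else:
            counts.append(counts[step - 1] + counts[step - lam])
    return counts[t]


def f_lambda(lam: int, n: int) -> int:
    """Smallest t with at least n informed processors (assumed definition)."""
    t = 0
    while informed_by(lam, t) < n:
        t += 1
    return t


def bounds(m: int, n: int, lam: int) -> dict:
    """Leading terms of the bounds in the abstract; O(lambda) terms are dropped."""
    f = f_lambda(lam, n)
    return {
        "lower": (m - 1) + f,
        "partition_upper": 2 * m + f,
        "ddtrees_upper": m + 2 * f,
    }


if __name__ == "__main__":
    print(bounds(m=10, n=8, lam=1))
```

With `lam = 1` the recurrence doubles the informed set at every step, so `f_lambda(1, n)` reduces to the familiar ceiling of log2(n). Comparing the two upper bounds suggests PARTITION is preferable when the message count is small relative to `f_lambda`, and D-D-TREES when it is large.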
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262864
Yeimkuan Chang, L. Bhuyan
Parallel algorithms for hypercube allocation strategies are considered. Although the sequential algorithms for various hypercube allocation strategies are easier to implement, their worst-case time complexities increase exponentially with the dimension of the hypercube. The authors show that the free processors can be used to perform the allocation jobs in parallel, improving the efficiency of the hypercube allocation algorithms. A modified parallel algorithm for the single Gray-Code (GC) strategy is proposed and is shown to recognize more subcubes than the single GC strategy by using the binary reflected Gray code and the inverse binary reflected Gray code, without increasing the execution time. Two algorithms for a complete subcube recognition system are also presented and shown to be more efficient and attractive than the sequential one currently used in the hypercube multiprocessor.
{"title":"Parallel algorithms for hypercube allocation","authors":"Yeimkuan Chang, L. Bhuyan","doi":"10.1109/IPPS.1993.262864","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262864","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"122522083"}
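To make the Gray-code idea in the hypercube allocation abstract concrete, here is a minimal sketch of the standard binary reflected Gray code and a check of whether a run of consecutive Gray-code positions forms a subcube. It is not the authors' parallel algorithm, which the abstract does not spell out. The function names are hypothetical, and `gray_inverse` is the ordinary inverse mapping, which may differ from what the abstract calls the inverse binary reflected Gray code.

```python
def gray(i: int) -> int:
    """Binary reflected Gray code of position i."""
    return i ^ (i >> 1)


def gray_inverse(g: int) -> int:
    """Position whose binary reflected Gray code is g."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def block_is_subcube(start: int, k: int) -> bool:
    """True if Gray-code positions start .. start + 2**k - 1 form a k-subcube."""
    nodes = [gray(p) for p in range(start, start + 2 ** k)]
    varying = 0
    for node in nodes:
        varying |= node ^ nodes[0]  # collect every bit that differs in the block
    # 2**k distinct addresses that vary in exactly k bit positions fill a k-subcube.
    return len(set(nodes)) == 2 ** k and bin(varying).count("1") == k


if __name__ == "__main__":
    for start in range(0, 5):
        print(start, block_is_subcube(start, 2))
```

In this sketch a block of four positions starting at position 2 is a subcube even though it is not aligned to a multiple of four, while a block starting at position 1 is not. This is the kind of extra recognition a Gray-code ordering offers over plain aligned blocks.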
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262892
Nitin K. Singhvi
The enhanced connection cube (ECC) and the minimal connection cube (MCC), proposed in this paper, are regular and symmetric static interconnection networks for large-scale, loosely coupled systems. The ECC connects $2^{2n+1}$ processing nodes with only $n+2$ links per node, almost half the number used in a comparable hypercube. Yet its diameter is only $n+2$, almost half that of the hypercube. The MCC connects $2^{2n+1}$ nodes using only $n+1$ links per node, has about the same diameter as a hypercube, and is scalable like the hypercube. The MCC can be converted into the ECC by adding one more link per node. Both networks can emulate all the connections present in a hypercube of the same size, with no increase in routing complexity, so that typical parallel applications run on both types of CCs with the same time complexity as on a hypercube.
{"title":"The connection cubes: symmetric, low diameter interconnection networks with low node degree","authors":"Nitin K. Singhvi","doi":"10.1109/IPPS.1993.262892","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262892","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"121379734"}
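The "almost half" claims in the connection-cube abstract follow from simple arithmetic, sketched below. The only facts used are the node count and the ECC and MCC figures quoted in the abstract, plus the standard property that a binary hypercube with 2^d nodes has degree d and diameter d. The function name is hypothetical, and the MCC diameter is left out because the abstract gives it only as "about the same" as the hypercube's.

```python
def compare_connection_cubes(n: int) -> dict:
    """Degree and diameter figures for networks with 2**(2n+1) nodes."""
    hypercube_dim = 2 * n + 1  # a hypercube with 2**(2n+1) nodes has this dimension
    return {
        "nodes": 2 ** hypercube_dim,
        "hypercube": {"degree": hypercube_dim, "diameter": hypercube_dim},
        "ecc": {"degree": n + 2, "diameter": n + 2},
        "mcc": {"degree": n + 1},
    }


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, compare_connection_cubes(n))
```

For `n = 4`, for example, all three networks have 512 nodes; the hypercube needs 9 links per node, while the ECC needs 6 and the MCC 5.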
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262804
W. Guan, W. Tsai, D. Blough
The communication performance of the interconnection network is critical in a multicomputer system. Wormhole routing is known to be more efficient than traditional circuit switching and packet switching. To evaluate wormhole routing, a queueing-theoretic analysis is used. This paper presents a general analytical model for wormhole routing based on very basic assumptions. The model is used to evaluate the routing delays in hypercubes and meshes. The calculated delays are compared against those obtained from simulations, and these comparisons show that the model is reasonably accurate.
{"title":"An analytical model for wormhole routing in multicomputer interconnection networks","authors":"W. Guan, W. Tsai, D. Blough","doi":"10.1109/IPPS.1993.262804","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262804","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"126012301"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262802
S. Rajasekaran, David S. L. Wei
The authors consider the problems of selection, routing and sorting on an $n$-star graph (with $n!$ nodes), an interconnection network which has been proven to possess many special properties. They identify a tree-like subgraph (a '(k, 1, k) chain network') of the star graph which enables them to design efficient algorithms for these problems. They present an algorithm that performs a sequence of $n$ prefix computations in $O(n^2)$ time. This algorithm is used as a subroutine in other algorithms. In addition, they offer an efficient deterministic sorting algorithm that runs in $(n^3 \log n)/2$ steps. They also show that sorting can be performed on the $n$-star graph in time $O(n^3)$ and that selection of a set of uniformly distributed $n$ keys can be performed in $O(n^2)$ time with high probability. Finally, they present a deterministic (non-oblivious) routing algorithm that realizes any permutation in $O(n^3)$ steps on the $n$-star graph.
{"title":"Selection, routing, and sorting on the star graph","authors":"S. Rajasekaran, David S. L. Wei","doi":"10.1109/IPPS.1993.262802","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262802","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"124053490"}
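For readers unfamiliar with the network in the star graph abstract, the sketch below builds a small n-star graph using its standard definition: nodes are the permutations of n symbols, and two nodes are adjacent when one is obtained from the other by swapping the first symbol with the symbol in some other position. The function names are hypothetical, and none of the paper's selection, routing, or sorting algorithms are reproduced here.

```python
from itertools import permutations


def star_neighbors(node: tuple) -> list:
    """Neighbors of a permutation: swap the first symbol with each other position."""
    neighbors = []
    for i in range(1, len(node)):
        swapped = list(node)
        swapped[0], swapped[i] = swapped[i], swapped[0]
        neighbors.append(tuple(swapped))
    return neighbors


def star_graph(n: int) -> dict:
    """Adjacency lists of the n-star graph (n! nodes, each of degree n - 1)."""
    return {p: star_neighbors(p) for p in permutations(range(1, n + 1))}


if __name__ == "__main__":
    graph = star_graph(4)
    print(len(graph), "nodes, degree", len(graph[(1, 2, 3, 4)]))
```

Building the graph explicitly is only practical for small n, since the node count grows as n!. That growth against a degree of only n - 1 is what makes the topology attractive as an interconnection network.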
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262914
J. Antonio, L. Lin, R. C. Metzger
Lower bound complexities are derived for three intensive communication patterns assuming a balanced generalized hypercube (BGHC) topology. The BGHC is a generalized hypercube that has exactly $w$ nodes along each of the $d$ dimensions, for a total of $w^d$ nodes. A BGHC is said to be dense if the $w$ nodes along each dimension form a complete directed graph. A BGHC is said to be sparse if the $w$ nodes along each dimension form a unidirectional ring. It is shown that a dense $N$-node BGHC with a node degree equal to $K \log_2 N$, where $K \ge 2$, can process certain intensive communication patterns $K(K-1)$ times faster than an $N$-node binary hypercube (which has a node degree equal to $\log_2 N$). Furthermore, a sparse $N$-node BGHC with a node degree equal to $\frac{1}{L} \log_2 N$, where $L \ge 2$, is $2^L$ times slower at processing certain intensive communication patterns than an $N$-node binary hypercube.
{"title":"Complexity of intensive communications on balanced generalized hypercubes","authors":"J. Antonio, L. Lin, R. C. Metzger","doi":"10.1109/IPPS.1993.262914","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262914","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"129083736"}
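The degree parameters K and L in the BGHC abstract can be related to the ring width w by a short calculation. This is a sketch under an assumption the abstract does not state: node degree counts outgoing links only, so a dense BGHC has w - 1 links per dimension and a sparse BGHC has one link per dimension.

```latex
% Assumed: N = w^d nodes, degree counted as outgoing links per node.
\begin{align*}
  \text{dense:}\quad  & d\,(w-1) = K \log_2 N = K\, d \log_2 w
                        \;\Longrightarrow\; K = \frac{w-1}{\log_2 w}, \\
  \text{sparse:}\quad & d = \frac{1}{L} \log_2 N = \frac{d}{L} \log_2 w
                        \;\Longrightarrow\; w = 2^{L}.
\end{align*}
```

Under this reading, w = 2 gives K = 1 and L = 1, which is the binary hypercube itself, and the quoted slowdown factor of 2 to the power L for the sparse case equals the ring width w.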
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262801
A. Bhattacharya, R. R. Rao, Ting-Ting Y. Lin
Multistage interconnection networks (MINs) provide a cost-effective alternative to a full crossbar connection for processor-processor or processor-memory communication in a tightly coupled multiprocessor system. Delta networks, a class of blocking-type MINs with the unique-path property, have been studied extensively for their self-routing capability. A probabilistic analysis of the blocking and its effect on the delay is presented here for such a network operated in a synchronous circuit-switched mode. Under the assumption of uniformly distributed access requests independently generated at each unblocked source, an upper bound on the expected latency is established. The bound is compared with simulation results.
{"title":"Delay analysis in synchronous circuit-switched delta networks","authors":"A. Bhattacharya, R. R. Rao, Ting-Ting Y. Lin","doi":"10.1109/IPPS.1993.262801","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262801","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"133517066"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262866
B. Narahari, Ramesh Krishnamurti
A partitionable hypercube allows simultaneous execution of multiple tasks, where each task can be executed on a choice of subcubes. This paper considers the problem of static nonpreemptive scheduling of w independent tasks on an n-processor partitionable hypercube system to minimize the overall finishing time of the w tasks. Each task can be executed on subcubes of different sizes, with smaller execution times on larger subcubes. A schedule determines the size of the subcube to be assigned to each task and schedules these tasks on the processors in the hypercube system. The problem of finding the optimal schedule, with minimum finishing time, is known to be NP-hard. This paper presents a fast polynomial-time approximation algorithm for the problem and derives a tight worst-case performance bound of 2 for the algorithm.
{"title":"Scheduling independent tasks on partitionable hypercube multiprocessors","authors":"B. Narahari, Ramesh Krishnamurti","doi":"10.1109/IPPS.1993.262866","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262866","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"127020691"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262832
D. Windheiser, E. Boyd, E. Hao, S. Abraham, E. Davidson
This paper analyzes and evaluates some novel latency-hiding features of the KSR1 multiprocessor: prefetch and poststore instructions and automatic updates. As a case study, the authors analyze the performance of an iterative sparse solver which generates irregular communications. They show that automatic updates significantly reduce the amount of communication. Although prefetch and poststore instructions reduce the coherence miss ratios, they do not significantly improve the sparse solver performance because of the overhead of executing these instructions.
{"title":"KSR1 multiprocessor: analysis of latency hiding techniques in a sparse solver","authors":"D. Windheiser, E. Boyd, E. Hao, S. Abraham, E. Davidson","doi":"10.1109/IPPS.1993.262832","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262832","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13","platform":"Semanticscholar","paperid":"130667466"}