Scheduling independent tasks on partitionable hypercube multiprocessors
B. Narahari, Ramesh Krishnamurti
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262866
A partitionable hypercube allows simultaneous execution of multiple tasks, where each task can be executed on a choice of subcubes. This paper considers the problem of static, nonpreemptive scheduling of w independent tasks on an n-processor partitionable hypercube system so as to minimize the overall finishing time of the w tasks. Each task can be executed on subcubes of different sizes, with smaller execution times on larger subcubes. A schedule determines the size of the subcube assigned to each task and schedules the tasks on the processors of the hypercube system. Finding the optimal schedule, i.e. one with minimum finishing time, is known to be NP-hard. This paper presents a fast polynomial-time approximation algorithm for the problem and derives a tight worst-case performance bound of 2 for the algorithm.
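To make the moldable-task setting concrete, the sketch below schedules hypothetical tasks, each given as a map from subcube size (a power of two) to run time. It is an illustrative greedy heuristic, not the paper's approximation algorithm: pick each task's subcube size by minimizing its work (size times time), then place tasks longest-first on the aligned processor block that frees up earliest.

```python
def schedule_hypercube(tasks, n):
    """Greedy sketch for moldable tasks on an n-processor hypercube.
    tasks: list of dicts mapping subcube size (power of two) -> run time.
    Returns (plan, makespan); plan[i] = (first processor, width, start time).
    Illustrative only, not the paper's approximation algorithm."""
    # Phase 1: choose each task's subcube size by minimizing work = size * time.
    chosen = [min(t.items(), key=lambda kv: kv[0] * kv[1]) for t in tasks]
    free = [0.0] * n  # time at which each processor becomes free
    plan = {}
    # Phase 2: place tasks longest-first; aligned blocks of width w
    # correspond to subcubes of dimension log2(w).
    for i, (w, dur) in sorted(enumerate(chosen), key=lambda x: -x[1][1]):
        s = min(range(0, n, w), key=lambda b: max(free[b:b + w]))
        start = max(free[s:s + w])
        for p in range(s, s + w):
            free[p] = start + dur
        plan[i] = (s, w, start)
    return plan, max(free)
```

For example, on a 2-processor cube, a task with times {1 processor: 10, 2 processors: 4} is assigned the full cube, since 2*4 < 1*10.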
Multiple message broadcasting in the postal model
A. Bar-Noy, S. Kipnis
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262831
Broadcasting is a widely used operation in many message-passing systems. Most existing broadcasting algorithms, however, do not address several emerging trends in distributed-memory parallel computers and high-speed communication networks: (i) treating the system as a fully connected collection of processors, (ii) packetizing large data into sequences of messages, and (iii) tolerating communication latencies. This paper explores the broadcasting problem in the postal model, which addresses these issues. The authors provide two algorithms for broadcasting m messages in a message-passing system with n processors and communication latency λ. A lower bound on the time for this problem is (m-1) + f_λ(n), where f_λ(n) is the optimal time for broadcasting a single message. They present algorithm PARTITION, which takes at most 2m + f_λ(n) + O(λ) time, and algorithm D-D-TREES, which takes at most m + 2f_λ(n) + O(λ) time.
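The f_λ(n) term in these bounds can be evaluated with the standard postal-model counting argument: P(t), the maximum number of processors that can know the message by time t, satisfies P(t) = P(t-1) + P(t-λ), with P(t) = 1 for 0 ≤ t < λ. A small helper, assuming integer latency:

```python
def postal_broadcast_time(n, lam):
    """f_lam(n): optimal time to broadcast one message to n processors in the
    postal model with integer latency lam. Uses the counting recurrence
    P(t) = P(t-1) + P(t-lam), with P(t) = 1 for 0 <= t < lam, where P(t) is
    the maximum number of informed processors by time t."""
    if n <= 1:
        return 0
    reached = [1] * lam          # reached[t] = P(t) for t = 0 .. lam-1
    t = lam - 1
    while reached[-1] < n:
        t += 1
        reached.append(reached[-1] + reached[t - lam])
    return t
```

With λ = 1 the recurrence gives doubling, so f_1(n) = ⌈log2 n⌉, the familiar fully connected unit-time bound; larger λ yields generalized-Fibonacci growth.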
A partially asynchronous and iterative algorithm for distributed load balancing
Jianjian Song
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262906
Defining tasks as independent entities with identical execution times, and workload as the number of tasks, the author proposes a partially asynchronous, iterative algorithm for distributed load balancing, establishes its properties, and reports simulation results. The algorithm converges geometrically according to a theorem proved elsewhere. He proves that the algorithm achieves a maximum load imbalance of no more than d/2 tasks, where d is the diameter of the network. His simulation of a synchronous version of the algorithm not only validated these properties but also showed that the algorithm can produce much smaller load imbalances on hypercubes: for hypercubes of order up to ten, the observed imbalance was never more than two tasks, and 56% of the sample runs produced a difference of only one task, as opposed to the theoretical maximum of six tasks.
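A simple way to see why hypercubes balance so well is a synchronous dimension-exchange simulation, sketched below. This is a generic scheme under stated assumptions (integer loads, pairwise even splits), not necessarily the exact algorithm analyzed in the paper:

```python
import random

def dimension_exchange_sweep(loads, order):
    """One synchronous sweep over the hypercube dimensions: each node pairs
    with its neighbor across dimension d, and the pair splits its combined
    integer load as evenly as possible. Generic sketch, not necessarily the
    paper's algorithm."""
    for d in range(order):
        for i in range(len(loads)):
            j = i ^ (1 << d)
            if i < j:
                total = loads[i] + loads[j]
                loads[i], loads[j] = total - total // 2, total // 2

random.seed(0)
order = 4                                  # 16-node hypercube, diameter 4
loads = [random.randint(0, 40) for _ in range(2 ** order)]
total_before = sum(loads)
for _ in range(3):
    dimension_exchange_sweep(loads, order)
imbalance = max(loads) - min(loads)        # small residual from integer rounding
```

No tasks are created or destroyed, so the total load is conserved; only integer rounding keeps the final loads from being exactly equal.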
Mapping realistic data sets on parallel computers
R. Ponnusamy, N. Mansour, A. Choudhary, G. Fox
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262867
Mapping data to parallel computers aims at minimizing the execution time of the associated application. The mapping itself, however, can take an unacceptably long time compared with the execution time of the application when the problem is large. The authors propose reducing the problem size with a mapping-oriented graph contraction technique. They present a graph contraction (GC) heuristic algorithm that yields a smaller representation of the problem, to which mapping is then applied. Experimental results show that the GC algorithm still leads to good-quality mapping solutions for the original problem while producing remarkable reductions in mapping time. The GC algorithm makes large-scale mapping efficient, especially when slow but high-quality mappers are used.
Multiprocessors scheduling for imprecise computations in a hard real-time environment
Ashok Khemka, K. Subramanyam, R. Shyamasundar
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262909
This paper discusses the problem of scheduling multiprocessors in a hard real-time environment that allows imprecise computations. The computation of each task is divided into a mandatory part and an optional part. In a feasible schedule, the mandatory part of every task must be completed before the task's deadline; the optional part refines the result produced by the mandatory part to reduce its error, and need never be completed. The quality of the result of each job is measured by the average error in the results over several consecutive periods. Given n real-time periodic jobs, each with a period and a processing requirement per period, the paper examines whether there exists a preemptive schedule on m identical or uniform machines that completes the mandatory portion of each job within its period. A combination of network-flow techniques and a convex-programming formulation is used to construct a minimum-error schedule whenever a feasible schedule exists. The error due to the uncomputed portions of tasks is assumed to be a real-valued convex function of those portions.
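The convexity assumption is what makes the optional parts easy to reason about: once the mandatory parts fit, leftover capacity can be handed out one unit at a time to whichever task's next unit reduces error the most, and for convex nonincreasing error functions this greedy is optimal. The sketch below illustrates that idea in a discrete setting; it is not the paper's network-flow and convex-programming construction.

```python
import heapq

def allocate_optional(errors, capacity):
    """Greedily allocate 'capacity' discrete time units of leftover processor
    time to optional parts so as to minimize total error. errors[i](x) is
    task i's error after x units of optional work, assumed convex and
    nonincreasing in x, so funding the unit with the largest marginal error
    reduction at each step is optimal. Illustrative sketch only."""
    alloc = [0] * len(errors)
    # max-heap (via negation) keyed on the marginal reduction of the next unit
    heap = [(-(e(0) - e(1)), i) for i, e in enumerate(errors)]
    heapq.heapify(heap)
    for _ in range(capacity):
        gain, i = heapq.heappop(heap)
        if gain >= 0:            # no further error reduction available
            break
        alloc[i] += 1
        x = alloc[i]
        heapq.heappush(heap, (-(errors[i](x) - errors[i](x + 1)), i))
    return alloc
```

For instance, with piecewise-linear errors 4 - 2x and 3 - x (floored at zero) and three spare units, the first task gets two units and the second gets one, since the first task's units each remove twice as much error.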
Speedup, communication complexity and blocking-a La Recherche du Temps Perdu
D. Marinescu, J. Rice
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262793
The paper investigates the time lost in a parallel computation due to sequential and duplicated work, communication and control, and blocking. It introduces the concept of relative speedup and proposes characterizations of parallel algorithms based upon communication complexity and the blocking model. The paper discusses the impact of the processor's architecture upon the measured speedup, showing that a large speedup may be due to an inefficient sequential computation, e.g. due to cache management, rather than to an efficient parallel computation. A model of parallel computations that takes into account sequential and duplicated work, communication and control, and blocking is presented. The paper shows that the scalability of a parallel computation is determined by its communication complexity. The model is used to predict the asymptotic behavior, the maximum speedup, and the optimal number of processors. An in-core 3D FFT algorithm for distributed-memory MIMD systems and a Chebyshev iterative algorithm for solving a linear system of equations illustrate the concepts.
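The prediction of a maximum speedup and an optimal processor count follows from any model in which communication cost grows with the number of processors. A minimal, hedged cost model in the spirit of the paper's decomposition (the constants below are illustrative assumptions, not measurements from the paper):

```python
def parallel_time(p, work=1000.0, seq=10.0, comm=0.5):
    """Toy model: sequential part + evenly divided work + a communication
    term that grows with the processor count p. Constants are illustrative."""
    return seq + work / p + comm * p

def relative_speedup(p):
    return parallel_time(1) / parallel_time(p)

# The comm term makes speedup peak at a finite processor count:
# minimizing work/p + comm*p gives p* close to sqrt(work/comm), about 45 here.
best_p = min(range(1, 257), key=parallel_time)
```

Past that optimum, adding processors increases the communication term faster than it shrinks the computation term, so the measured speedup falls.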
Performance of buffered multistage interconnection networks in non uniform traffic environment
Mohammed Atiquzzaman, M. S. Akhtar
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262785
Multistage interconnection networks (MINs) are used to connect processors to memories in shared-memory multiprocessor systems. A generalized Markov chain model for the performance evaluation of a single-buffered Omega network in the presence of a hot spot is proposed. The proposed model produces better results than existing models.
Process Groups: a mechanism for the coordination of and communication among processes in the Venus collective communication library
Vasanth Bala, S. Kipnis
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262809
In programming massively parallel computers, it is often necessary to have sets of processes cooperate in performing certain computations and communications. Most run-time libraries require that such sets of processes be explicitly specified in the program. In the Venus run-time communication library, however, a Process Group abstraction is used to enable implicit coordination of, and communication over, dynamically determined sets of processes. The Process Groups mechanism in Venus offers an object-oriented approach to handling sets of processes and enhances the debugging and monitoring of programs. The authors describe the Process Groups mechanism in Venus, illustrate its use on the class of N-body problems, and outline some of the data structures and algorithms used to implement the mechanism.
Concurrent programming with shared objects in networked environments
C. Hartley, V. Sunderam
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262830
Concurrent and distributed computing, using portable software systems or environments on general-purpose networked computing platforms, has recently gained widespread attention. Many such systems have been developed, and several are in production use. This project proposes the use of object-oriented techniques to enhance application development and ease of use, and to relieve developers of the complexities inherent in message-passing environments. The authors support the relatively well understood shared-object concurrent computation model while providing facilities designed to aid the programmer with partitioning, scheduling, and synchronization in a straightforward, efficient, and portable manner. They describe a shared-object toolkit for the PVM distributed computing system and present preliminary results and experiences.
Analytical models of bandwidth allocation in pipelined k-ary n-cubes
P. T. Gaughan, S. Yalamanchili
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262915
This paper extends existing work on virtual-channel flow control mechanisms by focusing on bandwidth allocation issues. It presents an analytical model of the network that captures key phenomena of bandwidth allocation mechanisms in k-ary n-cubes. Tradeoffs are examined between full-duplex versus half-duplex links, purely demand-driven bandwidth allocation versus demand-driven allocation with CTS lookahead, and virtual-to-physical link assignment with link load balancing.