This paper consists of two parts. In the first part, two new algorithms for deadlock- and livelock-free wormhole routing in the torus network are presented. The first algorithm, called Channels, is for the n-dimensional torus network. This technique is fully-adaptive minimal, that is, all paths with a minimal number of hops from source to destination are available for routing, and needs only five virtual channels per bidirectional link, the lowest channel requirement known in the literature for fully-adaptive minimal worm-hole routing. In addition, this result also yields the lowest buffer requirement known in the literature for packet-switched fully-adaptive minimal routing. The second algorithm, called 4-Classes, is for the bidimensional torus network. This technique is fully-adaptive minimal and requires only eight virtual channels per bidirectional link. Also, it allows for a highly parallel implementation of its associated routing node. In the second part of this paper, four worm-hole routing techniques for the two-dimensional torus are experimentally evaluated using a dynamic message injection model and different traffic patterns and message lengths. >
{"title":"Adaptive deadlock- and livelock-free routing with all minimal paths in Torus networks","authors":"L. Gravano, G. Pifarré, P. E. Berman, J. Sanz","doi":"10.1145/140901.140902","DOIUrl":"https://doi.org/10.1145/140901.140902","url":null,"abstract":"This paper consists of two parts. In the first part, two new algorithms for deadlock- and livelock-free wormhole routing in the torus network are presented. The first algorithm, called Channels, is for the n-dimensional torus network. This technique is fully-adaptive minimal, that is, all paths with a minimal number of hops from source to destination are available for routing, and needs only five virtual channels per bidirectional link, the lowest channel requirement known in the literature for fully-adaptive minimal worm-hole routing. In addition, this result also yields the lowest buffer requirement known in the literature for packet-switched fully-adaptive minimal routing. The second algorithm, called 4-Classes, is for the bidimensional torus network. This technique is fully-adaptive minimal and requires only eight virtual channels per bidirectional link. Also, it allows for a highly parallel implementation of its associated routing node. In the second part of this paper, four worm-hole routing techniques for the two-dimensional torus are experimentally evaluated using a dynamic message injection model and different traffic patterns and message lengths. >","PeriodicalId":13128,"journal":{"name":"IEEE Trans. Parallel Distributed Syst.","volume":"54 1","pages":"1233-1251"},"PeriodicalIF":0.0,"publicationDate":"1992-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79583325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Garbage collection algorithms for shared-memory multiprocessors typically rely on some form of global synchro- nization to preserve consistency. Such global synchronization may lead to problems on asynchronous architectures: if one process is halted or delayed, other, nonfaulty processes will be unable to progress. By contrast, a storage management algorithm is lock- free if (in the absence of resource exhaustion) a process that is allocating or collecting memory can be delayed at any point without forcing other processes to block. This paper presents the first algorithm for lock-free garbage collection in a realistic model. The algorithm assumes that processes synchronize by applying read, write, and cornpre&swap operations to shared memory. This algorithm uses no locks, busy-waiting, or barrier synchronization, it does not assume that processes can observe or modify one another's local variables or registers, and it does not use inter-process interrupts. Index Terms-Garbage collection, lock-free algorithms, mem- ory management, multiprocessors, shared memory, wait-free al- gorithms.
{"title":"Lock-free garbage collection for multiprocessors","authors":"M. Herlihy, J. Moss","doi":"10.1145/113379.113400","DOIUrl":"https://doi.org/10.1145/113379.113400","url":null,"abstract":"Garbage collection algorithms for shared-memory multiprocessors typically rely on some form of global synchro- nization to preserve consistency. Such global synchronization may lead to problems on asynchronous architectures: if one process is halted or delayed, other, nonfaulty processes will be unable to progress. By contrast, a storage management algorithm is lock- free if (in the absence of resource exhaustion) a process that is allocating or collecting memory can be delayed at any point without forcing other processes to block. This paper presents the first algorithm for lock-free garbage collection in a realistic model. The algorithm assumes that processes synchronize by applying read, write, and cornpre&swap operations to shared memory. This algorithm uses no locks, busy-waiting, or barrier synchronization, it does not assume that processes can observe or modify one another's local variables or registers, and it does not use inter-process interrupts. Index Terms-Garbage collection, lock-free algorithms, mem- ory management, multiprocessors, shared memory, wait-free al- gorithms.","PeriodicalId":13128,"journal":{"name":"IEEE Trans. Parallel Distributed Syst.","volume":"47 1","pages":"304-311"},"PeriodicalIF":0.0,"publicationDate":"1991-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78856881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient algorithm to remove redundant dependences in simple loops with constant dependences is presented. Dependences constrain the parallel execution of programs and are typically enforced by synchronization instructions. The synchronization instructions represent a significant part of the overhead in the parallel execution of a program. Some program dependences are redundant because they are covered by other dependences. It is shown that unlike with single loops, in the case of nested loops, a particular dependence may be redundant at some iterations but not redundant at others, so that the redundancy of a dependence may not be uniform over the entire iteration space. A sufficient condition for the uniformity of redundancy in a doubly nested loop is developed. >
{"title":"Removal of redundant dependences in DOACROSS loops with constant dependences","authors":"V. P. Krothapalli, P. Sadayappan","doi":"10.1145/109625.109632","DOIUrl":"https://doi.org/10.1145/109625.109632","url":null,"abstract":"An efficient algorithm to remove redundant dependences in simple loops with constant dependences is presented. Dependences constrain the parallel execution of programs and are typically enforced by synchronization instructions. The synchronization instructions represent a significant part of the overhead in the parallel execution of a program. Some program dependences are redundant because they are covered by other dependences. It is shown that unlike with single loops, in the case of nested loops, a particular dependence may be redundant at some iterations but not redundant at others, so that the redundancy of a dependence may not be uniform over the entire iteration space. A sufficient condition for the uniformity of redundancy in a doubly nested loop is developed. >","PeriodicalId":13128,"journal":{"name":"IEEE Trans. Parallel Distributed Syst.","volume":"143 1","pages":"281-289"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73490627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many parallel algorithms are naturally expressed at a fine level of granularity, often finer than a MIMD parallel system can exploit efficiently. Most builders of parallel systems have looked to either the programmer or a parallelizing compiler to increase the granularity of such algorithms. In this paper we explore a third approach to the granularity problem by analyzing two strategies for combining parallel tasks dynamically at run-time. We reject the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, where tasks are created only retroactively as processing resources become available. These strategies grew out of work on Mul-T [14], an efficient parallel implementation of Scheme, but could be used with other applicative languages as well. We describe our Mul-T implementations of lazy task creation for two contrasting machines, and present performance statistics which show the method's effectiveness. Lazy task creation allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.
{"title":"Lazy task creation: a technique for increasing the granularity of parallel programs","authors":"Eric Mohr, D. Kranz, Robert H. Halstead","doi":"10.1145/91556.91631","DOIUrl":"https://doi.org/10.1145/91556.91631","url":null,"abstract":"Many parallel algorithms are naturally expressed at a fine level of granularity, often finer than a MIMD parallel system can exploit efficiently. Most builders of parallel systems have looked to either the programmer or a parallelizing compiler to increase the granularity of such algorithms. In this paper we explore a third approach to the granularity problem by analyzing two strategies for combining parallel tasks dynamically at run-time. We reject the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, where tasks are created only retroactively as processing resources become available.\u0000These strategies grew out of work on Mul-T [14], an efficient parallel implementation of Scheme, but could be used with other applicative languages as well. We describe our Mul-T implementations of lazy task creation for two contrasting machines, and present performance statistics which show the method's effectiveness. Lazy task creation allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.","PeriodicalId":13128,"journal":{"name":"IEEE Trans. Parallel Distributed Syst.","volume":"7 1","pages":"264-280"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81147031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreword to the First Issue of the Transactions on Parallel and Distributed Systems","authors":"H. Wood","doi":"10.1109/TPDS.1990.10000","DOIUrl":"https://doi.org/10.1109/TPDS.1990.10000","url":null,"abstract":"","PeriodicalId":13128,"journal":{"name":"IEEE Trans. Parallel Distributed Syst.","volume":"102 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75813152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}