Title: CMMD I/O: a parallel Unix I/O
Authors: Michael L. Best, A. Greenberg, C. Stanfill, L. W. Tucker
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262828
Abstract: The authors propose a library providing Unix file system support for highly parallel distributed-memory computers. CMMD I/O supports Unix I/O commands on the CM-5 supercomputer. The overall objective of the library is to provide the node-level parallel programmer with routines for opening, reading, and writing a file, and so forth. The default behavior mimics standard Unix running on each node; individual nodes can independently perform file system operations. New extensions to the standard Unix file descriptor semantics provide for cooperative parallel I/O. New functions provide access to very large (multi-gigabyte) files.

Title: Barrier synchronization in distributed-memory multiprocessors using rendezvous primitives
Authors: S. Gupta, D. Panda
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262826
Abstract: This paper deals with barrier synchronization in wormhole-routed distributed-memory multiprocessors. New rendezvous and multirendezvous synchronization primitives are proposed to implement a barrier between two and multiple processors, respectively. These primitives reduce the number of communication steps required to implement a barrier, thus significantly reducing the synchronization overhead for networks with high communication start-up cost. Two algorithms for barrier synchronization on k-ary n-cube networks are presented. The rendezvous primitive allows one to synchronize all processors in n log_2(k) steps. The multirendezvous primitive allows one to synchronize an arbitrary subset of processors in an optimal number of communication steps, depending on the ratio of the communication start-up cost (t_s) to the link-propagation cost (t_p).

Title: Efficient off-line routing of permutations on restricted access expanded delta networks
Authors: I. Scherson, R. Subramanian
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262894
Abstract: This paper presents an off-line algorithm for routing permutations on expanded delta networks (EDNs) with restricted access. Restricted access means that the number of elements to be permuted may exceed the number of inputs to the EDN. For every N-element permutation on an M-input EDN, the algorithm computes a routing that takes exactly 3N/M passes (assuming M divides N). On a certain class of EDNs, the number of passes can be reduced to 2N/M. For example, for every 16K-element permutation on the 1K-input global network of the MasPar MP-1 and MP-2, the algorithm computes a routing that takes exactly 32 passes. The time complexity of the algorithm is Θ(N log N) sequentially, and Θ(log^2 N) on an N-processor PRAM.

Title: A cluster-M based mapping methodology
Authors: M. Eshaghian-Wilner, M. Shaaban
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262885
Abstract: Cluster-M is a new parallel programming paradigm for designing portable software. The two main components of this paradigm are Cluster-M specifications and Cluster-M representations. Cluster-M specifications are high-level, machine-independent parallel code that is mapped onto Cluster-M representations: system graphs representing the topologies of the underlying architectures. An algorithm for generating Cluster-M representations is presented, along with a set of high-level constructs essential for writing Cluster-M specifications. Using these components, an efficient methodology is proposed for mapping parallel algorithms onto architectures.

Title: Parallel algorithms for height balancing binary trees
Authors: Srinivasan Venkatraman, Alicia Kime, K. Srinivas
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262903
Abstract: The authors present a simple parallel algorithm to height-balance a binary tree. The algorithm accepts any arbitrary binary tree as its input and yields an optimally shaped binary tree. For any arbitrary binary tree of n nodes, the algorithm has a time complexity of O(lg n) and uses O(n) processors on an EREW PRAM model. The algorithm uses Euler tours and list ranking, which form the building blocks for many parallel algorithms.

Title: Supporting insertions and deletions in striped parallel filesystems
Authors: T. Johnson
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262921
Abstract: The dramatic improvements in the processing rates of parallel computers are turning many compute-bound jobs into I/O-bound jobs. Parallel file systems have been proposed to better match I/O throughput to processing power. Many parallel file systems stripe files across numerous disks, each with its own controller. A striped file can be appended (or prepended) to and maintain its structure. However, a block cannot be inserted into or deleted from the middle of the file, since this would destroy the round-robin striping structure of the file. The author presents a distributed file structure that maintains files in indexed striped extents on a message-passing multiprocessor. This approach allows highly parallel random and sequential reads, and also allows insertion and deletion in the middle of the file.

Title: Fast parallel algorithms for model checking using BDDs
Authors: Insup Lee, S. Rajasekaran
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262834
Abstract: Binary decision diagrams (BDDs) have recently been used in model checking to verify systems with a large number of states (on the order of 5×10^20). Representing both the state space and the state transition graph as BDDs has been demonstrated to alleviate the problem of state space explosion. But there are limitations to this heuristic approach: even systems of reasonable complexity have many more states, and the BDD approach might fail even on some simple systems. The authors propose the use of parallelism to extend the applicability of BDDs in model checking. They present fast algorithms for model checking that employ BDDs. The algorithms presented are much faster than the best previously known algorithms.

Title: A parallel MSF algorithm for planar graphs on a mesh and applications to image processing
Authors: David Nassimi
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262882
Abstract: The author presents an efficient O(n) parallel algorithm for finding a minimum-cost spanning forest (MSF) of a weighted undirected planar graph with n^2 edges, on an n×n mesh-connected computer. He also obtains efficient MSF-based O(n) algorithms for several application problems in image processing. In particular, he shows that an MSF can be used to obtain more efficient and elegant O(n) algorithms for the 'k-width connectivity' problem and the 'optical clustering' problem.

Title: Image processing with the MGAP: a cost effective solution
Authors: R. Bajwa, R. Owens, M. J. Irwin
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262835
Abstract: Image processing applications are suitable candidates for parallelism and have, at least in part, motivated the design and development of some of the pioneering massively parallel processing systems, including the CLIP family, the DAP, the MPP, and the GAPP. By exploiting design techniques and architectures suited to VLSI technology, one can now build hardware that provides comparable performance at a fraction of the cost of these earlier designs. The authors describe the use of a fine-grained, massively parallel VLSI processor array, the Micro-Grained Array Processor (MGAP), for image processing applications. The array and its support systems, in their current configuration, are designed to be used as a co-processor board in a desktop workstation. The array can also be used for applications other than image processing. The versatility of the array and the single-board design provide a cost-effective solution for a variety of parallelizable tasks.

Title: A partially asynchronous and iterative algorithm for distributed load balancing
Authors: Jianjian Song
Pub Date: 1993-04-13 | DOI: 10.1109/IPPS.1993.262906
Abstract: Defining tasks as independent entities with identical execution times and a processor's workload as its number of tasks, the author proposes a partially asynchronous and iterative algorithm for distributed load balancing, shows its properties, and reports simulation results. The algorithm converges geometrically according to a theorem proved elsewhere. He proves that the algorithm can achieve a maximum load imbalance of not more than d/2 tasks, where d is the diameter of the network. His simulation of a synchronous version of the algorithm not only validated these properties but also showed that the algorithm could produce much smaller load imbalances for hypercubes: the imbalances obtained for hypercubes of order up to ten were no more than two tasks, and 56% of the sample runs produced a difference of only one task, as opposed to the theoretical maximum of six tasks.