Title: The Design and Analysis of A Tightly Coupled Hypercube File System
Authors: H. Hadimioglu, R. J. Flynn
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.556403
Abstract: The architectural design and analysis of a tightly coupled distributed hypercube file system are presented. The file system is composed of an I/O organization and a software interface. The analysis and design are made by varying a number of parameters for the matrix multiplication application on a hypercube simulator. The parameters form a three-dimensional test space: parameters of the I/O organization, the software interface, and the application. Performance results are presented for a hypercube with and without a distributed file system.
Title: Scalable Abstractions for Parallel Programming
Authors: W. Griswold, G. Harrison, D. Notkin, L. Snyder
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.556312
Abstract: Writing parallel programs that scale, that is, that naturally and efficiently adapt to the size of the problem and the number of processors available, is difficult for two reasons. First, the overhead of multiplexing the processing of data points assigned to a given processor is often great. Second, to achieve scaling in asymptotic performance, the algorithm that uses the interprocessor communication structure may need to differ from the algorithm used to process points located within an individual processor. We present abstractions intended to overcome these problems, making it straightforward to define scalable parallel programs. The central abstraction is an ensemble, which gives programmers a global view of physically distributed data, computation, and communication. We demonstrate the application of these ensembles to two variants of Batcher's sort, describing how the concepts apply to other parallel programs.
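The "global view of physically distributed data" idea in the abstract above can be illustrated with a toy sketch. The `Section` class and its interface are hypothetical, invented for illustration; the paper's actual ensemble abstraction is richer and also covers computation and communication.

```python
# Toy sketch of a global view over distributed data (hypothetical API,
# not the paper's): a Section physically owns only a slice of a global
# array, but is indexed with GLOBAL indices, so per-processor code can
# be written against the same formulation as the global algorithm.

class Section:
    def __init__(self, global_data, lo, hi):
        self.lo, self.hi = lo, hi          # global index range [lo, hi)
        self.local = global_data[lo:hi]    # the physically local storage

    def __getitem__(self, g):
        # Translate a global index into this section's local storage.
        if not self.lo <= g < self.hi:
            raise IndexError("index %d not owned by this section" % g)
        return self.local[g - self.lo]

data = list(range(10))
# Two "processors", each owning half of the global array.
sections = [Section(data, 0, 5), Section(data, 5, 10)]
print(sections[1][7])  # global index 7 lives on the second section -> 7
```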
Title: Parallelizing Multiple Linear Regression for Speed and Redundancy: An Empirical Study
Authors: Mingxian Xu, J. J. Miller, E. Wegman
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.555395
Abstract: The purpose of this paper is to present a parallel implementation of multiple linear regression. We discuss the multiple linear regression model. Traditionally, parallelism has been used either for speed-up or for redundancy (and hence reliability). With stochastic data, by clever parsing and algorithm development, it is possible to achieve both speed and reliability enhancement. We demonstrate this with multiple linear regression. Other examples include kernel estimation and bootstrapping.
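One way the speed/redundancy combination can work (a sketch under assumptions, not Xu, Miller, and Wegman's actual partitioning): each processor accumulates the normal-equation sums over its own block of rows. Summing all partial results gives the full-data fit quickly, while any subset of blocks still yields a usable estimate if a processor fails. The single-predictor model and all function names below are illustrative.

```python
# Hypothetical block-partitioned least squares for y = a + b*x:
# each "processor" reduces its block of rows to five sufficient
# statistics; combining blocks is just elementwise addition.

def partial_sums(xs, ys):
    """One processor's contribution to the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n, sx, sy, sxx, sxy)

def combine(blocks):
    # Elementwise sum of the per-block statistics.
    return tuple(sum(t) for t in zip(*blocks))

def solve(n, sx, sy, sxx, sxy):
    # Solve the 2x2 normal equations for intercept a and slope b.
    det = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

xs = list(range(8))
ys = [2.0 + 3.0 * x for x in xs]          # exact line, no noise
# Two "processors", each handling half of the rows.
blocks = [partial_sums(xs[:4], ys[:4]), partial_sums(xs[4:], ys[4:])]
a, b = solve(*combine(blocks))
print(a, b)  # recovers a = 2.0, b = 3.0
```

Because each block is a self-contained summary, dropping one block degrades the estimate gracefully instead of losing the computation, which is the redundancy half of the paper's title.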
Title: A Parallel Algorithm for Solving Higher KdV Equations on a Hypercube
Authors: T. Taha
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.555435
Abstract: Taha and Ablowitz derived numerical schemes by methods related to the inverse scattering transform (IST) for physically important equations such as the Korteweg-de Vries (KdV) and modified Korteweg-de Vries (MKdV) equations. Experiments have shown that the IST numerical schemes compare very favorably with other numerical methods. In this paper an accurate numerical scheme based on the IST is used to solve non-integrable higher KdV equations, for instance:
Title: Quick Recovery of Embedded Structures in Hypercube Computers
Authors: Tze Chiang Lee
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.556406
Abstract: We investigate the design of fault-tolerant embedding functions of application graphs into hypercubes with the aim of minimizing the recovery cost and performance degradation due to faults. The recovery cost is measured by the number of node-state changes or recovery steps. Performance is measured by the dilation of the embedding, which is the maximum distance between the embedded images of two nodes that are adjacent in the application graph. The basic idea is to embed application graphs so that spare nodes are always close to failed nodes whenever reconfiguration occurs. We develop 1-FT and 2-FT embeddings for paths, even-length loops, meshes, toruses and complete binary trees into hypercubes. Embeddings with higher fault tolerance are also obtained for meshes and toruses. The processor utilization of these embeddings is reasonably high and most of them take the minimum number of recovery steps.
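The dilation measure defined in the abstract above is easy to make concrete. The sketch below is illustrative, not Lee's fault-tolerant construction: it embeds a ring into a hypercube via the standard reflected Gray code, where hypercube distance between node labels is the Hamming distance, and checks that the embedding achieves dilation 1.

```python
# Dilation of an embedding into a k-dimensional hypercube: the maximum
# Hamming distance between the images of any two adjacent guest nodes.

def hamming(a, b):
    """Hypercube distance between two node labels (bit strings as ints)."""
    return bin(a ^ b).count("1")

def dilation(edges, image):
    return max(hamming(image[u], image[v]) for u, v in edges)

def gray(i):
    # Reflected Gray code: consecutive codes differ in exactly one bit.
    return i ^ (i >> 1)

k = 3
n = 2 ** k
ring_edges = [(i, (i + 1) % n) for i in range(n)]   # even-length loop
embedding = {i: gray(i) for i in range(n)}
print(dilation(ring_edges, embedding))  # -> 1 (every hop is one cube edge)
```

A fault-tolerant embedding in the paper's sense additionally keeps spare nodes close to every possible failure site, so that recovery changes few node states while the dilation stays small.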
Title: Multiple Network Embedding into Hypercubes
Authors: Ajay K. Gupta, Susanne E. Hambrusch
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.556400
Abstract: In this paper we consider the problem of embedding r guest networks G0, ..., Gr−1 into a k-dimensional hypercube H so that every processor of H is assigned at most r guest processors and dilation and congestion are minimized. Each guest network Gi can be a complete binary tree, a leap tree, a linear array, or a mesh. We show that r such guest networks can simultaneously be embedded into H without a significant increase in dilation and congestion compared to the embedding of a single network when r ≤ k. For r > k, the increase in the cost measures is proportional to r/k. We consider two models which differ in the requirements imposed on the r guest processors assigned to a processor of H.
Title: Distributed Fault-Tolerant Embedding of Rings in Hypercubes
Authors: M. Chan, Shiang-Jen Lee
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.556288
Title: A Portable Multicomputer Communication Library atop the Reactive Kernel
Authors: A. Skjellum, A. Leung
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.556280
Abstract: Sophisticated multicomputer applications require efficient, flexible, convenient underlying communication primitives. In the work described here, Zipcode, a new, portable communication library, has been designed, developed, articulated, and evaluated. The primary goals were: high efficiency compared to lowest-level primitives, user-definable message receipt selectivity, and abstraction of collections of processes and message selectivity to allow multiple, independently conceived libraries to work together without conflict. Zipcode works atop the Caltech Reactive Kernel, a portable, minimalistic multicomputer node operating system. Presently, the Reactive Kernel is implemented for Intel iPSC/1, iPSC/2, and Symult s2010 multicomputers and emulated on shared-memory computers as well as networks of Sun workstations. Consequently, Zipcode addresses an equally wide audience, and can plausibly be run in other environments.
Title: Path Planning on a Distributed Memory Computer
Authors: S. Miguet, Y. Robert
Published: 1990-04-08, Proceedings of the Fifth Distributed Memory Computing Conference
DOI: 10.1109/DMCC.1990.555373
Abstract: In this paper, we discuss the implementation of Bitz and Kung's path planning algorithm on a ring of general-purpose processors. We show that Bitz and Kung's algorithm, originally designed for the Warp machine, is not efficient in this context, due to the intensive interprocessor communications that it requires. We design a modified version that performs much better. The new version updates a segment of k positions within a step and allocates blocks of r consecutive rows of the map to the processors in a wraparound fashion. Bitz and Kung's algorithm corresponds to the situation (k, r) = (1, 1). We analytically determine the optimal values of the parameters (k, r) which minimize the parallel execution time as a function of the problem size n and of the number of processors p. The theoretical results are nicely corroborated by numerical experiments on a ring of 32 Transputers.
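The wraparound row allocation described in the abstract above is the familiar block-cyclic distribution. A minimal sketch, assuming rows are numbered from 0 (the `owner` function and the concrete n, r, p values are illustrative, not taken from the paper):

```python
# Block-cyclic ("wraparound") allocation: blocks of r consecutive rows
# of the n-row map are dealt to the p processors in round-robin order.
# Tuning r (with the segment length k) trades interprocessor
# communication against per-processor multiplexing overhead.

def owner(row, r, p):
    """Processor owning `row` when blocks of r rows wrap around p procs."""
    return (row // r) % p

n, r, p = 12, 2, 3
for proc in range(p):
    rows = [i for i in range(n) if owner(i, r, p) == proc]
    print(proc, rows)
# processor 0 gets rows [0, 1, 6, 7], processor 1 gets [2, 3, 8, 9], ...
```

With r = 1 every row goes to a different processor than its neighbor (the Bitz-Kung extreme, maximizing communication); larger r keeps more neighboring rows local at the cost of coarser load balance, which is why an optimal (k, r) exists.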