Scalable parallel molecular dynamics on MIMD supercomputers
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232635
S. Plimpton, G. Heffelfinger
Presents two parallel algorithms suitable for molecular dynamics simulations over a wide range of sizes, from a few hundred to millions of atoms. One of the algorithms is optimally scalable, offering performance proportional to N/P, where N is the number of atoms (or molecules) and P is the number of processors. Their implementation on three MIMD parallel computers (nCUBE2, Intel Gamma, and Intel Delta) and their performance on a standard benchmark problem, compared with vector and SIMD implementations, are discussed. The authors also briefly describe the integration of one of the algorithms into a widely used code for modeling defect dynamics in metals via the embedded atom method.
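The N/P scaling claim is easy to picture in code. Below is a minimal Python sketch, not the paper's implementation (which targeted the nCUBE2 and Intel machines in message-passing style): under an atom-decomposition scheme, each of P processors owns N/P atoms and computes forces only for those, so per-step force work scales as N/P once all positions are available. The pair force here is a toy placeholder.

```python
import numpy as np

def my_atom_range(rank, nprocs, n_atoms):
    """Contiguous block of atoms owned by processor `rank`."""
    per = n_atoms // nprocs
    lo = rank * per
    hi = n_atoms if rank == nprocs - 1 else lo + per
    return lo, hi

def local_forces(pos, lo, hi, cutoff=2.5):
    """Force rows for owned atoms only: O(N/P) of the total work."""
    f = np.zeros((hi - lo, 3))
    for i in range(lo, hi):
        d = pos - pos[i]                       # vectors from atom i to all atoms
        r2 = (d * d).sum(axis=1)
        near = (r2 > 0.0) & (r2 < cutoff**2)   # neighbours within the cutoff
        # Toy inverse-square pair force; a real MD code would use e.g. Lennard-Jones.
        f[i - lo] = (d[near] / r2[near][:, None]**1.5).sum(axis=0)
    return f

# Usage: processor 0 of 4 owns atoms [0, 250) of a 1000-atom box.
pos = np.random.rand(1000, 3) * 10.0
lo, hi = my_atom_range(0, 4, 1000)
forces = local_forces(pos, lo, hi)
```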
{"title":"Scalable parallel molecular dynamics on MIMD supercomputers","authors":"S. Plimpton, G. Heffelfinger","doi":"10.1109/SHPCC.1992.232635","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232635","url":null,"abstract":"Presents two parallel algorithms suitable for molecular dynamics simulations over a wide range of sizes, from a few hundred to millions of atoms. One of the algorithms is optimally scalable, offering performance proportional to N/P where N is the number of atoms (or molecules) and P is the number of processors. Their implementation on three MIMD parallel computers (nCUBE2, Intel Gamma, and Intel Delta) and performance on a standard benchmark problem as compared to vector and SIMD implementations is discussed. The authors also briefly describe the integration of one of the algorithms into a widely-used code appropriate for modeling defect dynamics in metals via the embedded atom method.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"50 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114020918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Intercube communication for the iPSC/860
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232627
E. Barszcz
This paper introduces new functions that enable efficient intercube communication on the Intel iPSC/860. Communication between multiple cubes (power-of-two sets of processor nodes) within the iPSC/860 is a desirable feature that facilitates the implementation of interdisciplinary problems, such as the grand challenge problems of the High Performance Computing and Communications Project (HPCCP). Intercube communication allows the programs for each discipline to be developed independently on the hypercube and then integrated at the interface boundaries.
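The paper's iPSC/860 library functions predate MPI and are not reproduced here. As a hedged modern analogue, MPI intercommunicators express the same pattern: split the machine into two sub-machines ("cubes"), develop each discipline's program within its own communicator, and couple the two at the interface boundaries. A sketch using mpi4py (names illustrative; assumes an even number of processes):

```python
from mpi4py import MPI

world = MPI.COMM_WORLD
half = world.Get_size() // 2
color = 0 if world.Get_rank() < half else 1       # which "cube" we belong to
cube = world.Split(color, key=world.Get_rank())   # intracube communicator

# Local rank 0 of each cube acts as leader; bridge the cubes with an
# intercommunicator so the two disciplines can exchange interface data.
remote_leader = half if color == 0 else 0         # world rank of the peer leader
inter = cube.Create_intercomm(0, world, remote_leader, tag=99)

# Each discipline runs independently inside `cube`, then couples at the
# boundary: here, same-ranked processes in the two cubes swap a value.
boundary = {"rank": cube.Get_rank(), "values": [0.0] * 4}   # stand-in data
peer = inter.sendrecv(boundary, dest=cube.Get_rank())
```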
{"title":"Intercube communication for the iPSC/860","authors":"E. Barszcz","doi":"10.1109/SHPCC.1992.232627","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232627","url":null,"abstract":"In this paper, new functions that enable efficient intercube communication on the Intel iPSC/860 are introduced. Communication between multiple cubes (power-of-two number of processor nodes) within the Intel iPSC/860 is a desirable feature to facilitate the implementation of interdisciplinary problems such as the grand challenge problems of the High Performance Computing and Communications Project (HPCCP). Intercube communication allows programs for each discipline to be developed independently on the hypercube and then integrated at the interface boundaries using intercube communication.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"337 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122749943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Load information distribution via active interconnection networks
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232652
D. Grunwald
Existing multicomputers typically use passive, dedicated network interfaces. By comparison, an active interconnection network can manipulate the data in messages transiting through a node; such networks might use existing systolic processors as the network interface. Active interconnects will become increasingly common in distributed-memory multicomputers because they can be used to implement a variety of routing algorithms, reduction operators, and efficient memory-fetch operations. The author measures the effectiveness of distributing load information via active interconnection networks. He does not expect the benefits of distributing load information to justify active interconnection networks on their own; rather, he foresees that active networks will shortly be commonplace, and seeks to measure their benefit for tightly coupled distributed operating systems.
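One hedged way to picture the mechanism (purely illustrative, not the paper's design): an active network interface can stamp each transiting message with the local node's load, so load information spreads with ordinary traffic at no extra message cost.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    payload: bytes
    load_samples: dict = field(default_factory=dict)   # node id -> queue length

class ActiveInterface:
    """Hypothetical interface that processes messages as they transit a node."""
    def __init__(self, node_id, run_queue):
        self.node_id = node_id
        self.run_queue = run_queue    # local scheduler's ready queue
        self.load_table = {}          # freshest load observed for each node

    def forward(self, msg: Message) -> Message:
        # Executed in the network interface while the message is in transit:
        msg.load_samples[self.node_id] = len(self.run_queue)   # stamp our load
        self.load_table.update(msg.load_samples)               # learn from others
        return msg
```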
{"title":"Load information distribution via active interconnection networks","authors":"D. Grunwald","doi":"10.1109/SHPCC.1992.232652","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232652","url":null,"abstract":"Existing multicomputers typically use passive, dedicated network interfaces. By comparison, an active interconnect network can manipulate the data in messages transitting through a node; these might use existing systolic processors as the network interface. Active interconnects will become increasingly common in distributed memory multicomputers because they can be used to implement a variety of routing algorithms, reduction operators and efficient memory-fetch operations. The authors measures the effectiveness of distributing load information using active interconnection networks. He does not expect the benefits of distributing load information to solely justify the existence of active interconnection networks; rather, he foresees that active networks will shortly be commonplace, and seeks to measure their benefit for tightly coupled distributed operating systems.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134284705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Network-attached storage systems
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232686
R. Katz
With the dramatic shift towards distributed computing and its associated client-server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. The paper discusses the underlying technology trends that are leading to high-performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. It describes a research prototype, developed at Berkeley, that takes a new approach to high performance computing based on network-attached storage.
{"title":"Network-attached storage systems","authors":"R. Katz","doi":"10.1109/SHPCC.1992.232686","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232686","url":null,"abstract":"With the dramatic shift towards distributed computing, and its associated client-server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. The paper discusses the underlying technology trends that are leading to high performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. It describes a research prototype, developed at Berkeley, that takes a new approach to high performance computing based on network-attached storage.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134297772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

On the influence of programming models on shared memory computer performance
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232630
T. Ngo, L. Snyder
Experiments are presented indicating that on shared-memory machines, programs written in the nonshared-memory programming model generally offer better performance, in addition to being more portable and scalable. The authors study the LU decomposition problem and a molecular dynamics simulation on three shared-memory machines with widely differing architectures, and analyze the results from three perspectives: performance, speedup, and scaling.
{"title":"On the influence of programming models on shared memory computer performance","authors":"T. Ngo, L. Snyder","doi":"10.1109/SHPCC.1992.232630","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232630","url":null,"abstract":"Experiments are presented indicating that on shared-memory machines, programs written in the nonshared-memory programming model generally offer better performance, in addition to being more portable and scalable. The authors study the LU decomposition problem and a molecular dynamics simulation on three shared-memory machines with widely differing architectures, and analyze the results from three perspectives: performance, speedup, and scaling.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114434421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Complete exchange on a circuit switched mesh
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232628
S. Bokhari, H. Berryman
The complete exchange ('all-to-all personalized') communication pattern is at the heart of numerous important multicomputer algorithms. Recent research has shown how this pattern can be performed efficiently on circuit-switched hypercubes. On circuit-switched meshes, however, it is difficult to perform efficiently because the sparsity of the mesh interconnect leads to severe link contention. The authors develop a family of algorithms that proceed by recursively carrying out a series of contention-free exchanges on subdivisions of the mesh. Each member of this family is useful for some range of the parameters: mesh size, message size, startup time, and data transmission and permutation rates. The authors describe the performance of their algorithms on the Touchstone Delta mesh.
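The paper's contention-aware mesh schedules are not reproduced here; as background, the sketch below shows the classical pairwise (XOR) complete-exchange skeleton for a power-of-two number of nodes, in which every node sends and receives exactly one block per phase. The paper's algorithms refine this kind of schedule so that each phase is also contention-free on the mesh links.

```python
def complete_exchange_schedule(p):
    """Yield (phase, [(src, dst), ...]) pairings for p = power-of-two nodes.

    In phase k, node i exchanges exactly the block destined for node i ^ k,
    so after p - 1 phases every node has sent one block to every other node.
    """
    assert p & (p - 1) == 0 and p > 1, "p must be a power of two"
    for k in range(1, p):
        yield k, [(i, i ^ k) for i in range(p)]

# Usage: on 8 nodes there are 7 phases; node 3's partner in phase 5 is 3^5 = 6.
for phase, pairs in complete_exchange_schedule(8):
    print(phase, pairs[:2], "...")
```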
{"title":"Complete exchange on a circuit switched mesh","authors":"S. Bokhari, H. Berryman","doi":"10.1109/SHPCC.1992.232628","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232628","url":null,"abstract":"The complete exchange ('all-to-all personalized') communication pattern is at the heart of numerous important multicomputer algorithms. Recent research has shown how this pattern can efficiently be performed on circuit-switched hypercubes. However, on circuit-switched meshes, this pattern is difficult to perform efficiently because the sparsity of the mesh interconnect leads to severe link contention. The authors develop a family of algorithms that proceed by recursively carrying out a series of contention-free exchanges on subdivisions of the mesh. Each member of this family is useful for some range of the parameters: mesh size, message size, startup time, and data transmission and permutation rates. The authors describe the performance of their algorithms on the Touchstone Delta mesh.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116178507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A parallel implementation of the chemically reacting CFD code, SPARK
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232674
J. C. Otto
Describes a parallel version of SPARK, a two-dimensional, chemically reacting CFD code. The sequential code has been ported to run on Intel iPSC/860-based parallel computers. Routines have been added that partition the problem based on the global mesh and then assign the resulting subdomains across the processors; two subdomain mappings have been considered. The routines that compute spatial derivatives and the routine that adds artificial viscosity to the discretization were modified to handle subdomain boundaries interior to the global domain, and an effort has been made to overlap the required communication with computation. Performance measurements have been made for two test problems that exercise all of the options available in the parallel code thus far. While the parallel efficiency of the code is quite good, single-node performance has been much lower than expected for this architecture.
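A hedged sketch of the partitioning idea described above (function names are illustrative, not SPARK's): block-partition the global mesh into slabs, pad each slab with ghost cells received from neighbouring subdomains, and take spatial derivatives over the padded array so that points near interior boundaries see the data they need.

```python
import numpy as np

def partition_1d(n_global, nprocs, rank):
    """Column range of this rank's slab of the global mesh."""
    per = n_global // nprocs
    lo = rank * per
    hi = n_global if rank == nprocs - 1 else lo + per
    return lo, hi

def pad_with_ghosts(local, left_col, right_col):
    """Pad the slab with one ghost column from each neighbouring subdomain."""
    return np.hstack([left_col, local, right_col])

def ddx(padded, dx):
    """Central difference in x; ghost columns supply interior-boundary data."""
    return (padded[:, 2:] - padded[:, :-2]) / (2.0 * dx)

# Usage: a 32x100 slab whose neighbours have supplied one boundary column each.
slab = np.random.rand(32, 100)
left, right = np.random.rand(32, 1), np.random.rand(32, 1)
dudx = ddx(pad_with_ghosts(slab, left, right), dx=0.01)   # shape (32, 100)
```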
{"title":"A parallel implementation of the chemically reacting CFD code, SPARK","authors":"J. C. Otto","doi":"10.1109/SHPCC.1992.232674","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232674","url":null,"abstract":"Describes a parallel version of the two-dimensional, chemically reacting CFD code, SPARK. The sequential code has been ported to run on the Intel iPSC/860-based parallel computers. Routines have been added to the code which partition the problem based on the global mesh, and then assign the resulting subdomains across the processors. Two subdomain mappings have been considered. The routines which compute spatial derivatives and the routine which adds artificial viscosity to the discretization were modified to handle the subdomain boundaries interior to the global domain, and an effort has been made to overlap the required communication/computation. Measurements of the performance of the code have been made for two test problems exercising all of the available options of the parallel code thus far. While the parallel efficiency of the code is quite good, the single-node performance has been much lower than expected for this architecture.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124884633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Evaluation of distributed hierarchical scheduling with explicit grain size control
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232649
R. Hofman, W. Vree
Distributed control, in this case for scheduling, is a necessity for scalable multiprocessors. Distributed control suffers from incomplete knowledge about the system state: knowledge about remote nodes is outdated, and knowledge is often limited to a neighbourhood. Distributed hierarchical scheduling algorithms suffer less from this information bottleneck. The programming discipline of the authors' Parallel Reduction Machine allows the system to estimate a new task's execution time and inherent parallelism. The authors use these estimates to derive a consistent load metric and a sophisticated allocation criterion, and a natural mapping of new tasks onto scheduler levels follows. Simulation studies show that the performance of their algorithm depends strongly on the quality of the task time estimate. If this estimate is good, the algorithm yields higher speed-ups than the well-known distributed scheduling algorithms used as a reference, while exchanging far fewer messages.
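A hedged sketch of how such a scheme might fit together (the load metric and allocation rule here are illustrative stand-ins, not the paper's formulas): a leaf's load is the summed estimated time of its queued tasks, and a new task descends the scheduler tree to the least-loaded subtree whose span can still accommodate the task's estimated parallelism.

```python
from dataclasses import dataclass

@dataclass
class Task:
    est_time: float          # system's estimate of execution time
    est_parallelism: int     # system's estimate of inherent parallelism

class SchedulerNode:
    """One level of the scheduler hierarchy; leaves model single processors."""
    def __init__(self, children=()):
        self.children = list(children)
        self.queue = []                        # tasks accepted at this level

    def span(self):
        """Number of processors reachable below this node."""
        return sum(c.span() for c in self.children) if self.children else 1

    def load(self):
        """Estimated queued work, combined over the subtree."""
        own = sum(t.est_time for t in self.queue)
        if not self.children:
            return own
        return own + sum(c.load() for c in self.children) / len(self.children)

    def place(self, task):
        """Descend to the least-loaded child that can still hold the task."""
        fits = [c for c in self.children if c.span() >= task.est_parallelism]
        if fits:
            min(fits, key=lambda c: c.load()).place(task)
        else:
            self.queue.append(task)            # task spans this whole level

# Usage: two clusters of two processors; a 3-way-parallel task stays at the root.
root = SchedulerNode([SchedulerNode([SchedulerNode(), SchedulerNode()]),
                      SchedulerNode([SchedulerNode(), SchedulerNode()])])
root.place(Task(est_time=5.0, est_parallelism=3))
```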
{"title":"Evaluation of distributed hierarchical scheduling with explicit grain size control","authors":"R. Hofman, W. Vree","doi":"10.1109/SHPCC.1992.232649","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232649","url":null,"abstract":"Distributed control, in this case for scheduling, is a necessity for scalable multiprocessors. Distributed control suffers from incomplete knowledge about the system state: knowledge about remote nodes is outdated, and knowledge is often limited to a neighbourhood. Distributed hierarchical scheduling algorithms suffer less from this information bottleneck. The programming discipline of the authors' Parallel Reduction Machine allows the system to do an estimate of new tasks' execution time and inherent parallelism. The authors use these to derive a consistent load metric and a sophisticated allocation criterion. A natural mapping of new tasks on scheduler levels is found. From simulation studies, the authors find that the performance of their algorithm depends strongly on the quality of the task time estimate. If this estimate is good, their algorithm yields higher speed-ups than the well-known distributed scheduling algorithms that they use as a reference. The number of messages exchanged is much smaller for the authors' hierarchical algorithm.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121661882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Program analysis and transformations for message-passing programs
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232687
M. Gerndt
Discusses dependence analysis and transformations for SPMD programs with collective communication operations. In this context, transformations must not only take data dependences into account but must also be designed carefully so as not to introduce deadlocks. The work described has been done in the context of SUPERB, an interactive parallelization system.
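A small illustration (not from the paper) of the deadlock hazard: collective operations must be reached in the same order by all processes of an SPMD program, so a transformation that reorders them along some control path can deadlock the program even though no data dependence is violated.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Original SPMD code: every process reaches the collectives in the same order.
a = comm.bcast("A", root=0)
b = comm.allreduce(rank, op=MPI.SUM)

# An invalid transformation might specialize the order per process, e.g.:
#   if rank == 0: comm.allreduce(...); comm.bcast(...)
#   else:         comm.bcast(...);     comm.allreduce(...)
# Rank 0 then blocks in allreduce while all others block in bcast: deadlock,
# even though no data dependence between the two operations was violated.
```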
{"title":"Program analysis and transformations for message-passing programs","authors":"M. Gerndt","doi":"10.1109/SHPCC.1992.232687","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232687","url":null,"abstract":"Discusses dependence analysis and transformations for SPMD programs with collective communication operations. In this context transformations have not only to take into account data dependences but have to be designed carefully not to introduce deadlocks. The work described has been done in the context of SUPERB, an interactive parallelization system.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132210396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Application of Linda to molecular modeling
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232640
T. Mattson, M. Shifman
Presents a sampling of work applying parallel computation to computational chemistry using the Linda machine-independent parallel programming language. The authors focus on two projects in particular: the first parallelized the well-known distance geometry program DGEOM, while the second addressed a molecular dynamics code. In both cases, the Linda programs were relatively easy to develop and delivered good performance on a variety of MIMD architectures.
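Linda coordinates processes through out/in/eval operations on a shared tuple space. As a hedged illustration of the master/worker style such programs typically use, here is a Python sketch that mimics the tuple space with a thread-safe queue (structure illustrative only; real Linda programs embed the Linda operations in C or Fortran):

```python
import queue
import threading

tuple_space = queue.Queue()   # stands in for Linda's shared tuple space

def worker(results):
    while True:
        tag, data = tuple_space.get()            # like Linda's in("task", ?data)
        if tag == "stop":
            return
        results.put(("result", data * data))     # like out("result", value)

results = queue.Queue()
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(4)]
for t in threads:
    t.start()
for x in range(10):
    tuple_space.put(("task", x))                 # master drops tasks into the space
for _ in threads:
    tuple_space.put(("stop", None))              # poison pills terminate workers
for t in threads:
    t.join()
print(sorted(results.get() for _ in range(10)))
```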
{"title":"Application of Linda to molecular modeling","authors":"T. Mattson, M. Shifman","doi":"10.1109/SHPCC.1992.232640","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232640","url":null,"abstract":"Presents a sampling of work applying parallel computation to computational chemistry using the Linda machine-independent parallel programming language. The authors focus on two projects in particular. The first project parallelized the well-known distance geometry program, DGEOM, while the second project looked at a molecular dynamic code. In both cases, the Linda programs were relatively easy to develop and delivered good performance on a variety of MIMD architectures.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133191885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}