"Applications of FORALL-formed computations in large scale stochastic dynamic programming"
F. Hanson, D. Jarvis, H.H. Xu
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232650
Abstract: Data parallel broadcasting methods have been developed by taking advantage of the properties of stochastic, nonlinear, continuous-time dynamical systems. The stochastic components include both Gaussian and Poisson random white noise. An example of a grand-challenge-level application is the resource management problem. The purpose of this paper is to demonstrate that broadcasting can be performed efficiently if the computational functions are FORALL-formed, i.e. the arrays are formed using FORALL loops. It is also predicted that the parallel data vault mass storage method becomes efficient and flexible if the computational functions are FORALL-formed.
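The abstract's key property can be illustrated with a short sketch (the helper and the cost function below are illustrative assumptions, not the paper's code): a "FORALL-formed" array is one whose every element is defined purely by an index expression, so each element can be computed independently — which is what makes data-parallel broadcasting cheap.

```python
def forall(shape, expr):
    """Form an array elementwise from an index expression, analogous to
    Fortran's  FORALL (i=1:n, j=1:m) a(i,j) = expr(i,j)."""
    n, m = shape
    return [[expr(i, j) for j in range(m)] for i in range(n)]

# Hypothetical running-cost table from a dynamic programming problem:
# every (state, control) entry depends only on its own indices, never on
# neighbouring entries, so the whole array can be evaluated in parallel.
def running_cost(i, j):
    return i * i + 0.5 * j * j

costs = forall((3, 4), running_cost)
```

Because `expr` carries no cross-element dependence, a data-parallel compiler is free to broadcast the scalar inputs once and evaluate all elements simultaneously.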
"Automatic mapping and load balancing of pointer-based dynamic data structures on distributed memory machines"
R. P. Weaver, R. Schnabel
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232634
Abstract: Describes an algorithm for automatically mapping and load balancing unstructured, dynamic data structures on distributed memory machines. The algorithm is intended to be embedded in a compiler for a parallel language (DYNO) for programming unstructured numerical computations. The result is that the mapping and load balancing are transparent to the programmer. The algorithm iterates over two basic steps: (1) It identifies groups of nodes ('pieces') that disproportionately contribute to the number of off-processor edges of the data structure and moves them to processors to which they are better connected. (2) It balances the loads by identifying groups of nodes ('flows') that can be moved to adjacent processors without creating new pieces. The initial results are promising, giving good load balancing and a reasonably low number of inter-processor edges.
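Step (1) of the algorithm can be sketched in miniature (the graph representation and the single-node granularity are simplifying assumptions — the paper moves groups of nodes, not individuals): find nodes whose off-processor edges outnumber their local ones and move each to the processor holding most of its neighbours.

```python
def rebalance_step(edges, placement):
    """edges: {node: [neighbour, ...]}; placement: {node: processor}.
    Return the moves for nodes that are better connected elsewhere."""
    moves = {}
    for node, nbrs in edges.items():
        # Count this node's edges per processor.
        counts = {}
        for n in nbrs:
            p = placement[n]
            counts[p] = counts.get(p, 0) + 1
        here = counts.get(placement[node], 0)
        best = max(counts, key=counts.get)
        # Move only if some other processor holds strictly more neighbours.
        if best != placement[node] and counts[best] > here:
            moves[node] = best
    return moves

# Node 3 sits on processor 'B' but all three of its neighbours are on 'A'.
edges = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0, 1, 2]}
placement = {0: "A", 1: "A", 2: "A", 3: "B"}
moves = rebalance_step(edges, placement)
```

Step (2), balancing load without creating new pieces, would then constrain such moves so no processor exceeds its share.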
"Sparse data representation for data-parallel computation"
A. L. Cheung, A. Reeves
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232633
Abstract: Performance optimization has been achieved by a transparent parallel sparse data representation in a data-parallel programming environment. In a sparse data representation, only the non-zero data elements of an array are stored and processed. The parallel sparse data representation is designed to efficiently utilize system resources on multicomputer systems for a broad class of problems; the main focus of this work is on the sparse situations that arise in dense data-parallel algorithms rather than the more traditional sparse linear algebra applications. A number of sparse data formats have been considered; one of these formats has been implemented in a high-level data-parallel programming environment called Paragon. Experimental results have been obtained with a distributed-memory multicomputer system.
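The core idea — store only the non-zero elements and process just those — can be sketched as follows. The coordinate-keyed dictionary format here is an illustrative assumption, not one of the formats the paper evaluates.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a 2-D array, keyed by (row, col)."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0}

def apply_sparse(sparse, f):
    """Apply an elementwise function to stored entries only: the work is
    proportional to the number of non-zeros, not the full array size."""
    return {ij: f(v) for ij, v in sparse.items()}

s = to_sparse([[0, 5, 0],
               [0, 0, 0],
               [7, 0, 0]])
doubled = apply_sparse(s, lambda v: 2 * v)
```

Making such a representation *transparent*, as the abstract claims, means the programmer writes the dense elementwise expression and the environment substitutes the sparse storage and traversal.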
"SUPERB support for irregular scientific computations"
P. Brezany, M. Gerndt, V. Sipková, H. Zima
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232626
Abstract: Runtime support for the parallelization of scientific programs is needed when information important for decisions in this process cannot be derived accurately at compile time. This paper describes a project that integrates runtime parallelization with the advanced compile-time parallelization techniques of SUPERB. In addition to describing the implementation techniques, the paper proposes language constructs for specifying irregular computations. SUPERB is an interactive SIMD/MIMD parallelizing system for the Suprenum, iPSC/860, and Genesis-P machines. The implementation of the runtime parallelization is based on the Parti procedures developed at ICASE, NASA.
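The Parti procedures the abstract mentions follow an inspector/executor pattern. Below is a much-simplified single-process sketch of that idea, with invented function names: the "inspector" scans an indirection array once to build a communication schedule, and the "executor" reuses that schedule on every subsequent iteration.

```python
def inspector(index_array, my_lo, my_hi):
    """Return the off-processor indices this processor must fetch, given
    that it owns the contiguous block [my_lo, my_hi)."""
    return sorted({i for i in index_array if not my_lo <= i < my_hi})

def executor(owned, index_array, schedule, fetch):
    """Gather x[index_array] from locally owned values plus the values
    pre-fetched according to the inspector's schedule."""
    remote = {i: fetch(i) for i in schedule}
    return [owned[i] if i in owned else remote[i] for i in index_array]

# This processor owns indices 0..1; the loop accesses indices 0, 2, 1, 3,
# so indices 2 and 3 must come from other processors.
owned = {0: 10.0, 1: 11.0}
access = [0, 2, 1, 3]
schedule = inspector(access, 0, 2)
gathered = executor(owned, access, schedule, lambda i: 100.0 + i)
```

The payoff of splitting the two phases is that the (expensive) inspection runs once, while the executor runs every time step with the same schedule.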
"Toward a scalable concurrent architecture for real-time processing of stochastic control and optimization problems"
W. Lee
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232689
Abstract: Reports on the development of a scalable multiple-instruction multiple-data (MIMD) concurrent architecture intended to serve as an effective alternative for solving stochastic differential and optimization systems. This architecture has in turn motivated the application of group theory and invariance analysis to gain further insight into the original problem. The speed-up ratios attained by this architecture can realistically justify its potential deployment in certain real-time applications. A case study related to real-time stochastic control and optimization serves to illustrate this possibility.
"Massively parallel MIMD solution of the parabolized Navier-Stokes equations"
A. Stagg, G. Carey, D. Cline, J. Shadid
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232676
Abstract: Reaching new milestones in science and engineering will require the speed and scalability offered by massively parallel computers. The primary challenge to the users of this technology will be the development of scalable software. All the software's functionality, including the generation of grids, the algorithmic solvers, and the production of output for interpretation and visualization, must scale across multiple processors. As an example of the scalable application concept, the authors have developed a highly parallel, scalable version of a parabolized Navier-Stokes (PNS) code used to simulate steady three-dimensional flow past supersonic and hypersonic flight vehicles. The primary goal of this research has been to develop a fully scalable version of the PNS procedure and to demonstrate that it can achieve high performance on a massively parallel, multiple instruction multiple data (MIMD) computer.
"Parallelization of AMBER molecular dynamics program for the AP1000 highly parallel computer"
H. Sato, Y. Tanaka, H. Iwama, S. Kawakika, M. Saito, K. Morikami, T. Yao, S. Tsutsumi
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232680
Abstract: The authors have parallelized the AMBER molecular dynamics program for the AP1000 highly parallel computer. To obtain a high degree of parallelism and an even load balance between processors for model problems of protein and water molecules, protein amino acid residues and water molecules are distributed to processors randomly. The global interprocessor communication required by this data mapping is performed efficiently using the AP1000's broadcast network, which broadcasts atom coordinate data for other processors' reference, and its torus network, which carries the point-to-point communication that accumulates forces for atoms assigned to other processors. Experiments showed that a problem with 41095 atoms is processed 226 times faster on a 512-processor AP1000 than on a single processor.
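The reported figures imply a parallel efficiency of roughly 44% (speedup divided by processor count), which can be checked directly:

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors

# The paper's numbers: 226x speedup on 512 processors.
eff = parallel_efficiency(226, 512)
```

The random residue-to-processor distribution trades communication locality for load balance, which is consistent with an efficiency well below 100% on a communication-heavy N-body force calculation.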
"A runtime data mapping scheme for irregular problems"
R. Ponnusamy, J. Saltz, R. Das
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232642
Abstract: In scalable multiprocessor systems, high performance demands that computational load be balanced evenly among processors and that interprocessor communication be limited as much as possible. In this paper, the authors study the problem of automatically choosing data distributions for irregular problems. Irregular problems are programs where the data access pattern cannot be determined during compilation. The authors describe a method by which data arrays can be automatically mapped at runtime. The mapping is based on the computational patterns in one or more user-specified loops. A distributed memory compiler generates code that, at runtime, generates a distributed data structure to represent the computational pattern of the chosen loop. This computational pattern is used to determine how data arrays are to be partitioned. The compiler generates code to pass the distributed data structure to a partitioner. The work described is being pursued in the context of the CRPC Fortran D project.
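The scheme the abstract describes can be sketched in simplified form: record which iterations of the user-specified loop touch which array elements, then hand that access pattern to a partitioner. The majority-vote partitioner below is an illustrative stand-in for the real partitioners such a compiler would call.

```python
def record_pattern(iteration_owner, touches):
    """Build the computational pattern: map each array index to the list
    of processors whose iterations touch it.
    iteration_owner: {iter_id: processor}; touches: {iter_id: [index, ...]}."""
    pattern = {}
    for it, idxs in touches.items():
        for i in idxs:
            pattern.setdefault(i, []).append(iteration_owner[it])
    return pattern

def partition(pattern):
    """Assign each array index to the processor that touches it most often,
    minimizing off-processor accesses for this loop."""
    return {i: max(set(procs), key=procs.count)
            for i, procs in pattern.items()}

# Iterations 0 and 1 run on P0, iteration 2 on P1; the indices each
# iteration reads are known only at runtime.
owner = {0: "P0", 1: "P0", 2: "P1"}
touches = {0: [0, 1], 1: [1], 2: [2, 1]}
mapping = partition(record_pattern(owner, touches))
```

In the real system the pattern itself is a distributed data structure, but the flow — inspect the loop, then partition the arrays from what was observed — is the same.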
"Parallel solution of the generalized Helmholtz equation"
L. Freitag, J. Ortega
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232654
Abstract: Uses the reduced system conjugate gradient algorithm to find the solution of large, sparse, symmetric, positive definite systems of linear equations arising from finite difference discretization of the generalized Helmholtz equation. The authors examine in detail three spatial domain decompositions on distributed memory machines. They use a two-step damped Jacobi preconditioner for the Schur complement system and find that although the number of iterations required for convergence is nearly halved, overall solution time is slightly increased. The authors introduce a modification to the preconditioner in order to reduce overhead.
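For readers unfamiliar with the building blocks, here is a standard Jacobi-preconditioned conjugate gradient iteration on a small dense system — a sketch of the generic method only, not the paper's two-step damped Jacobi preconditioner applied to the Schur complement.

```python
def cg_jacobi(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A, preconditioning
    with M = diag(A) (Jacobi)."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                   # residual b - A x
    minv = [1.0 / A[i][i] for i in range(n)]   # M^{-1}
    z = [minv[i] * r[i] for i in range(n)]     # preconditioned residual
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

x = cg_jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

The paper's trade-off lives in the preconditioner application step: a stronger preconditioner cuts iterations roughly in half, but if each application costs extra communication, total time can still rise — hence the authors' modification to reduce that overhead.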
"An object oriented approach to boundary conditions in finite difference fluid dynamics codes"
I. Angus
Proceedings Scalable High Performance Computing Conference SHPCC-92, 1992-04-26. DOI: 10.1109/SHPCC.1992.232659
Abstract: Parallel computers have been used to solve computational fluid dynamics (CFD) problems for many years; however, while the hardware has greatly improved, the software methods for describing CFD algorithms have remained largely unchanged. From the physics and software engineering points of view, the boundary conditions consume most of the algorithmic development and programming time, but only a small part of the execution time. This paper describes a methodology that eliminates most of the coding work required to implement boundary conditions, thereby freeing the researcher to concentrate on the algorithms.
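The general shape of such an approach can be sketched as follows (the class names and the 1-D ghost-cell setting are illustrative assumptions, not the paper's design): each boundary condition is an object with a common interface, so solver code applies any mixture of conditions uniformly and adding a new condition means adding one class rather than touching the stencil loops.

```python
class BoundaryCondition:
    """Common interface: fill the ghost cell on one side of a 1-D grid."""
    def apply(self, grid):
        raise NotImplementedError

class Dirichlet(BoundaryCondition):
    """Fix the boundary value itself."""
    def __init__(self, side, value):
        self.side, self.value = side, value
    def apply(self, grid):
        grid[0 if self.side == "left" else -1] = self.value

class Reflective(BoundaryCondition):
    """Copy the adjacent interior value into the ghost cell."""
    def __init__(self, side):
        self.side = side
    def apply(self, grid):
        if self.side == "left":
            grid[0] = grid[1]
        else:
            grid[-1] = grid[-2]

# Interior values 1..3 with one ghost cell on each end; the solver loop
# needs no knowledge of which condition sits on which side.
grid = [0.0, 1.0, 2.0, 3.0, 0.0]
for bc in (Dirichlet("left", 5.0), Reflective("right")):
    bc.apply(grid)
```

This is exactly the division of labor the abstract argues for: boundary logic, which dominates development time, is isolated behind a small interface, while the interior update, which dominates execution time, stays untouched.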