Predicting the performance of large programs on scalable multicomputers
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232692
B. Stramm, F. Berman
The paper introduces the retargetable program-sensitive (RPS) model, which predicts the performance of static, data-independent parallel programs mapped to message-passing multicomputers. It shows that the model accurately predicts the performance of mapped programs by comparing RPS predictions to actual execution times in the Poker parallel programming environment. The paper also previews plans for further verification of the model on the NCube2 and other multicomputers.
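The abstract does not give the RPS model's equations, so the following is only a generic illustration of the form such predictors often take, not RPS itself: the completion time of a statically mapped, data-independent program is estimated as the busiest processor's compute time plus its message costs under a startup-plus-bandwidth communication model.

```latex
% Illustrative only -- a generic static-mapping predictor, not the RPS model.
% c_t is the cost of task t, alpha the message startup cost, beta the
% per-byte transfer cost, and |m| the size of message m.
\[
T_{\mathrm{pred}} \;=\; \max_{p}\;\Bigl(\;\sum_{t \in \mathrm{tasks}(p)} c_t
  \;+\; \sum_{m \in \mathrm{msgs}(p)} \bigl(\alpha + \beta\,\lvert m\rvert\bigr)\Bigr)
\]
```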
{"title":"Predicting the performance of large programs on scalable multicomputers","authors":"B. Stramm, F. Berman","doi":"10.1109/SHPCC.1992.232692","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232692","url":null,"abstract":"The paper introduces the retargetable program-sensitive (RPS) model which predicts the performance of static, data-independent parallel programs mapped to message-passing multicomputers. It shows that the model accurately predicts the performance of mapped programs by comparing RPS predictions to actual execution times in the Poker parallel programming environment. The paper also previews plans for further verification of the model on the NCube2 and other multicomputers.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"279 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123149137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An expressive annotation model for generating SPMD programs
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232644
E. Paalvast, L. Breebaart, H. Sips
This paper illustrates two major points. First, the authors discuss a general, conceptual model for SPMD program-generating systems, and demonstrate that this model allows one to capture a broad range of different program semantics. Second, they show that it is possible to fit the concepts of this model into an annotation language that allows an SPMD program-generating system to fully utilize all the possibilities present in the model.
{"title":"An expressive annotation model for generating SPMD programs","authors":"E. Paalvast, L. Breebaart, H. Sips","doi":"10.1109/SHPCC.1992.232644","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232644","url":null,"abstract":"This paper illustrates two major points. First, the authors discuss a general, conceptual model for SPMD program generating systems, and demonstrate that this model allows one to capture a broad range of different program semantics. Second, they show that it is possible to fit the concepts of this model into an annotation language that allows an SPMD program generating system to fully utilize all the possibilities present in the model.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117266247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel volume rendering for curvilinear volumes
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232693
J. Challinger
Presents results of investigations into techniques for volume rendering using parallel processing on a multiple-instruction, multiple-data (MIMD) architecture with non-uniform-access shared memory. In particular, two parallel algorithms are given for volume rendering of curvilinear volumes. These two algorithms have been implemented on a BBN TC2000, and their performance has been measured and analyzed.
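The abstract does not describe the two algorithms themselves; the sketch below shows only one common MIMD decomposition for ray-cast volume rendering (dynamically self-scheduled image tiles over shared memory), with sampleRay standing in for the actual curvilinear-volume traversal.

```cpp
// Hypothetical sketch: image-space parallel ray casting over shared memory.
// Dynamically self-scheduled tiles approximate load balancing on a NUMA
// machine such as the TC2000; sampleRay is a placeholder, not the paper's
// curvilinear-volume traversal.
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

constexpr int W = 512, H = 512, TILE = 32;

float sampleRay(int x, int y) {
    return float(x ^ y) / float(W);   // placeholder for ray integration
}

int main() {
    std::vector<float> image(W * H, 0.0f);
    std::atomic<int> nextTile{0};
    const int tilesX = W / TILE, nTiles = tilesX * (H / TILE);

    auto worker = [&] {
        for (int t = nextTile++; t < nTiles; t = nextTile++) {
            const int tx = (t % tilesX) * TILE, ty = (t / tilesX) * TILE;
            for (int y = ty; y < ty + TILE; ++y)      // pixels are independent,
                for (int x = tx; x < tx + TILE; ++x)  // so tiles need no locks
                    image[y * W + x] = sampleRay(x, y);
        }
    };

    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```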
{"title":"Parallel volume rendering for curvilinear volumes","authors":"J. Challinger","doi":"10.1109/SHPCC.1992.232693","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232693","url":null,"abstract":"Presents results of investigations into techniques for volume rendering using parallel processing on a multiple-instruction, multiple-data (MIMD) architecture that has a non-uniform access, shared memory. In particular, two parallel algorithms are given for volume rendering of curvilinear volumes. These two algorithms have been implemented on a BBN TC2000, and their performance has been measured and analyzed.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127455327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Portable parallel Level-3 BLAS in Linda
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232664
B. Ghosh, M. Schultz
Describes an approach to providing an efficient Level-3 BLAS library across a variety of parallel architectures using C-Linda. A blocked linear algebra program calling the sequential Level-3 BLAS can now run in both shared and distributed memory environments (which support Linda) simply by replacing each call with a call to the corresponding parallel Linda Level-3 BLAS. The authors summarise some of the implementation and algorithmic issues related to the matrix multiplication subroutine. Because the matrix algorithms are all block-structured, the authors are particularly interested in parallel computers with hierarchical memory systems. Experimental data for their implementations show substantial speedups on shared memory, disjoint memory and networked configurations of processors. The authors also demonstrate the use of their parallel subroutines in blocked dense LU decomposition and present some preliminary experimental data.
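A minimal sketch of the worker-farm structure behind such a parallel block routine, assuming a square matrix whose order is divisible by the block size. This is not the authors' C-Linda code: in C-Linda the pool of block tasks would live in tuple space, claimed with in() and posted with out(); here an atomic counter over shared memory plays that role.

```cpp
// Hypothetical sketch of the worker-farm behind a parallel Level-3 BLAS call
// (C += A*B for an n x n matrix in square nb x nb blocks). An atomic counter
// stands in for the Linda tuple space of block tasks.
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

void parallel_gemm(const double* A, const double* B, double* C,
                   int n, int nb) {           // assumes nb divides n
    const int blocks = n / nb;
    std::atomic<int> next{0};
    auto worker = [&] {
        // Each task computes one disjoint nb x nb block of C, so no locking
        // is needed on the output.
        for (int t = next++; t < blocks * blocks; t = next++) {
            const int bi = (t / blocks) * nb, bj = (t % blocks) * nb;
            for (int k = 0; k < n; ++k)
                for (int i = bi; i < bi + nb; ++i)
                    for (int j = bj; j < bj + nb; ++j)
                        C[i * n + j] += A[i * n + k] * B[k * n + j];
        }
    };
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned p = 0; p < nthreads; ++p) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

Swapping such a routine in for the sequential matrix-multiply call inside a blocked algorithm is exactly the drop-in substitution the abstract describes for blocked dense LU.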
{"title":"Portable parallel Level-3 BLAS in Linda","authors":"B. Ghosh, M. Schultz","doi":"10.1109/SHPCC.1992.232664","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232664","url":null,"abstract":"Describes an approach towards providing an efficient Level-3 BLAS library over a variety of parallel architectures using C-Linda. A blocked linear algebra program calling the sequential Level-3 BLAS can now run on both shared and distributed memory environments (which support Linda) by simply replacing each call by a call to the corresponding parallel Linda Level-3 BLAS. The authors summarise some of the implementation and algorithmic issues related to the matrix multiplication subroutine. All the various matrix algorithms being block-structured, they are particularly interested in parallel computers with hierarchical memory systems. Experimental data for their implementations show substantial speedups on shared memory, disjoint memory and networked configurations of processors. The authors also present the use of their parallel subroutines in blocked dense LU decomposition and present some preliminary experimental data.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116390614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selective monitoring using performance metric predicates
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232655
C. E. Fineman, P. Hontalas
The field of parallel processing is going through an important evolution in technology characterized by a significant increase in the number of processors within such systems. As the number of processors increases, the conventional techniques for monitoring the performance of parallel systems will produce large amounts of data in the form of event trace files. The authors propose one possible solution to this data size problem: performance metric predicates. These predicates permit the user to define performance parameters that control the output of event trace data during the application's execution time. The authors assert that the use of performance metric predicates provides a powerful and useful tool for the control of event trace data output from large, complex systems.
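A hedged sketch of the idea (the types and names are invented for illustration, not the authors' interface): the user supplies a boolean predicate over current performance metrics, and a trace record is emitted only while the predicate holds, cutting trace volume at the source.

```cpp
// Hypothetical sketch of a performance metric predicate. Metrics and
// trace_event are illustrative stand-ins for whatever the monitoring
// runtime actually samples and records.
#include <cstdio>
#include <functional>

struct Metrics {                 // metrics sampled by the runtime
    double msg_rate;             // messages per second on this node
    double idle_fraction;        // fraction of time spent idle
};

using Predicate = std::function<bool(const Metrics&)>;

void trace_event(const Metrics& m, const Predicate& pred,
                 const char* name, double timestamp) {
    if (!pred(m)) return;        // predicate false: suppress the record
    std::printf("%f %s msg_rate=%f idle=%f\n",
                timestamp, name, m.msg_rate, m.idle_fraction);
}

int main() {
    // Only trace while a node is more than 30% idle.
    Predicate p = [](const Metrics& m) { return m.idle_fraction > 0.3; };
    trace_event({1200.0, 0.45}, p, "recv_wait", 0.0017);  // emitted
    trace_event({5000.0, 0.05}, p, "send",      0.0021);  // suppressed
}
```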
{"title":"Selective monitoring using performance metric predicates","authors":"C. E. Fineman, P. Hontalas","doi":"10.1109/SHPCC.1992.232655","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232655","url":null,"abstract":"The field of parallel processing is going through an important evolution in technology characterized by a significant increase in the number of processors within such systems. As the number of processors increases, the conventional techniques for monitoring the performance of parallel systems will produce large amounts of data in the form of event trace files. The authors propose one possible solution to this data size problem: performance metric predicates. These predicates permit the user to define performance parameters that control the output of event trace data during the application's execution time. The authors assert that the use of performance metric predicates provides a powerful and useful tool for the control of event trace data output from large, complex systems.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126667794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Problem specific environments for parallel computing
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232658
L.S. Auvil, C. Ribbens, L. T. Watson
Considers general-purpose and problem-specific tools for parallel problem solving. A comparison is made between the two approaches, in terms of effort and usefulness, for two example problems. The advantages of special-purpose, problem-specific environments are described, and the effort required to construct such environments is seen to be reasonable.
{"title":"Problem specific environments for parallel computing","authors":"L.S. Auvil, C. Ribbens, L. T. Watson","doi":"10.1109/SHPCC.1992.232658","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232658","url":null,"abstract":"Considers general-purpose and problem-specific tools for parallel problem solving. A comparison is made between the two approaches, in terms of effort and usefulness, for two example problems. The advantages of special-purpose, problem-specific environments are described, and the effort required to construct such environments is seen to be reasonable.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124165667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programming distributed memory parallel computers without explicit message passing
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232683
F. André, T. Priol
The need to program distributed memory parallel computers (DMPCs) with explicit message passing discourages the use of this type of architecture. The objective is to provide a programming environment that hides the message-passing aspects of DMPCs and allows the use of traditional languages as input. The paper describes two different approaches that satisfy this goal: a compiler which translates sequential code into distributed parallel processes, and a shared virtual memory which offers the user a global address space. Examples and results for both mechanisms are given, and the promise and appeal of each approach are outlined.
{"title":"Programming distributed memory parallel computers without explicit message passing","authors":"F. André, T. Priol","doi":"10.1109/SHPCC.1992.232683","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232683","url":null,"abstract":"Programming distributed memory parallel computers with explicit message passing refrains the use of this type of architecture. The objective is to provide a programmed environment which will hide the message passing aspects of DMPCs, and will allow the use of traditional languages as input. The paper describes two different approaches which satisfy this goal: a compiler which translates sequential code into distributed parallel processes and a shared virtual memory which offers to the user a global address space. Examples and results for both mechanisms are given. The hope and the interest of each approach is outlined.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134078236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstractions for parallel N-body simulations
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232690
S. Bhatt, M. Chen, C.-Y. Lin, Peng Liu
Introduces C++ programming abstractions for maintaining load-balanced partitions of irregular and adaptive trees. Such abstractions are useful across a range of applications and MIMD architectures. The use of these abstractions is illustrated for gravitational N-body simulation. The strategy for parallel N-body simulation is based on a technique for implicitly representing a global tree across multiple processors. This substantially reduces the programming complexity and the overhead for distributed memory architectures. The overhead is further reduced by maintaining incremental data structures.
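The abstract names the abstractions but not their interface; the fragment below is a hypothetical illustration of the underlying idea, a tree whose nodes carry a work estimate so that whole subtrees can be handed to processors in roughly equal cost shares, with the global tree remaining implicit in which processor owns which subtrees. All names are invented for illustration.

```cpp
// Hypothetical illustration of a load-balanced tree abstraction (invented
// names, not the paper's interface).
#include <array>
#include <memory>
#include <vector>

struct Body { double x, y, mass; };

struct TreeNode {
    double cx = 0, cy = 0, mass = 0;            // centre of mass, total mass
    long   work = 0;                            // estimated force-evaluation cost
    std::array<std::unique_ptr<TreeNode>, 4> child;
    std::vector<const Body*> bodies;            // bodies stored at a leaf
};

// Peel off subtrees until roughly `target` work is gathered; calling this
// once per processor yields a balanced partition of the (implicit) global tree.
void take_partition(TreeNode* n, long target, long& taken,
                    std::vector<TreeNode*>& mine) {
    if (n == nullptr || taken >= target) return;
    if (n->work <= target - taken) {            // whole subtree fits the budget
        mine.push_back(n);
        taken += n->work;
        return;
    }
    for (auto& c : n->child)                    // otherwise split the subtree
        take_partition(c.get(), target, taken, mine);
}
```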
{"title":"Abstractions for parallel N-body simulations","authors":"S. Bhatt, M. Chen, C.-Y. Lin, Peng Liu","doi":"10.1109/SHPCC.1992.232690","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232690","url":null,"abstract":"Introduces C++ programming abstractions for maintaining load-balanced partitions of irregular and adaptive trees. Such abstractions are useful across a range of applications and MIMD architectures. The use of these abstractions is illustrated for gravitational N-body simulation. The strategy for parallel N-body simulation is based on a technique for implicitly representing a global tree across multiple processors. This substantially reduces the programming complexity and the overhead for distributed memory architectures. The overhead is further reduced by maintaining incremental data structures.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123601967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solving equality constrained least squares problems
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232669
U. B. Vemulapati
Constrained least squares problems occur often in practice, mostly as sub-problems in many optimization contexts. For solving large and sparse instances of these problems on parallel architectures with distributed memory, the use of static data structures to represent the sparse matrix is preferred during the factorization. However, accurate detection of the rank of the constraint matrix is also critical to the accuracy of the computed solution. The author examines the solution of the constrained problem using a weighting approach. All computations can be carried out using a static data structure that is generated from the symbolic structure of the input matrices, making use of a recently proposed rank detection procedure. The author shows good speed-ups in solving large and sparse equality-constrained least squares problems on hypercubes of up to 128 processors.
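The abstract does not state the formulation, but the standard weighting approach for an equality-constrained least squares problem replaces the constraint with a heavily weighted residual block, so a single sparse factorization with a fixed (static) structure suffices:

```latex
% Weighting method for equality-constrained least squares: as the weight
% omega grows, the solution of the stacked problem approaches the solution
% of the constrained one.
\[
\min_{x} \|Ax-b\|_2 \ \text{ s.t. } \ Bx = d
\qquad\longrightarrow\qquad
\min_{x} \left\| \begin{pmatrix} \omega B \\ A \end{pmatrix} x
 - \begin{pmatrix} \omega d \\ b \end{pmatrix} \right\|_2 ,
\quad \omega \gg 1 .
\]
```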
{"title":"Solving equality constrained least squares problems","authors":"U. B. Vemulapati","doi":"10.1109/SHPCC.1992.232669","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232669","url":null,"abstract":"Constrained least squares problems occur often in practice, mostly as sub-problems in many optimization contexts. For solving large and sparse instances of these problems on parallel architectures with distributed memory, the use of static data structures to represent the sparse matrix is preferred during the factorization. But the accurate detection of the rank of the constraint matrix is also very critical to the accuracy of the computed solution. The author examines the solution of the constrained problem using weighting approach. All computations can be carried out using a static data structure that is generated using the symbolic structure of the input matrices, making use of a recently proposed rank detection procedure. The author shows good speed-ups in solving large and sparse equality conditioned least squares problems on hypercubes of up to 128 processors.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126523961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parameterized memory/processor optimizing FORTRAN compiler for parallel computers
Pub Date: 1992-04-26 · DOI: 10.1109/SHPCC.1992.232645
D. Nosenchuck
A new approach to generating low-conflict parallel instructions for complex applications is introduced in this paper. This method is presented within the context of a FORTRAN compiler. An approximate simulator has been incorporated within a parallel-code/domain-decomposition loop within the compiler. The simulator estimates the performance of candidate instruction segments, and guides the selection of appropriate code transformations, heuristics, and data storage strategies. At present, many aspects of the target machine are parameterized, to permit investigations of a number of parallel-computer architectures. In this paper, the compiler is illustrated for a Navier-Stokes computer target node application.
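As a hedged sketch of the loop the abstract outlines (the names and the toy cost model are invented, not the compiler's actual interface): candidate code variants are scored by an approximate, parameterized machine simulator, and the cheapest variant is selected for code generation instead of executing anything on the target.

```cpp
// Hypothetical sketch of a simulate-and-select compiler loop. A toy cost
// model over counted operations stands in for the approximate simulator.
#include <limits>
#include <vector>

struct MachineParams {            // parameterized target machine
    double flop_ns, mem_ns, conflict_ns;
};

struct Variant {                  // one transformed loop nest + data layout
    long flops, loads, bank_conflicts;
};

// Approximate simulator: estimated run time of a variant on the target.
double estimate_ns(const Variant& v, const MachineParams& m) {
    return m.flop_ns * v.flops + m.mem_ns * v.loads
         + m.conflict_ns * v.bank_conflicts;
}

// Keep the candidate the simulator scores cheapest (assumes a non-empty set).
Variant pick_best(const std::vector<Variant>& cands, const MachineParams& m) {
    Variant best = cands.front();
    double best_t = std::numeric_limits<double>::infinity();
    for (const auto& v : cands) {
        double t = estimate_ns(v, m);
        if (t < best_t) { best_t = t; best = v; }
    }
    return best;
}
```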
{"title":"Parameterized memory/processor optimizing FORTRAN compiler for parallel computers","authors":"D. Nosenchuck","doi":"10.1109/SHPCC.1992.232645","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232645","url":null,"abstract":"A new approach to generating low-conflict parallel instructions for complex applications is introduced in this paper. This method is presented within the context of a FORTRAN compiler. An approximate simulator has been incorporated within a parallel-code/domain-decomposition loop within the compiler. The simulator estimates the performance of candidate instruction segments, and guides the selection of appropriate code transformations, heuristics, and data storage strategies. At present, many aspects of the target machine are parameterized, to permit investigations of a number of parallel-computer architectures. In this paper, the compiler is illustrated for a Navier-Stokes computer target node application.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122234433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}