Scalability issues for a class of CFD applications
V. Naik
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232632
Considers the performance scalability of a class of computational fluid dynamics applications. The results indicate that neither scalability in time nor scalability in problem size can be obtained by simply scaling up the processing power. Results are presented to show that latency, packet size, and transmission speed play an important role. However, improvements in the architectural parameters alone are not sufficient to realize full performance scalability; suitable partitioning and algorithmic parameters must be selected for each type of architecture.
HYPERGEN: a distributed genetic algorithm on a hypercube
L. Knight, R. L. Wainwright
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232638
The genetic algorithm is a robust search and optimization technique based on the principles of natural genetics and survival of the fittest. Genetic algorithms (GAs) are a promising approach to global optimization and are applicable to a wide variety of problems. HYPERGEN was developed as a research tool for investigating parallel genetic algorithms applied to combinatorial optimization problems. It provides the user with a wide variety of options to test the particular problem at hand, and it is modular enough for users to insert their own routines for special needs or for further research on parallel GAs. HYPERGEN was used successfully to find new 'best' tours on three standard TSP instances, and it outperformed a parallel simulated annealing algorithm on various package placement problems. The authors found it fairly easy to fine-tune the parameters that drive a parallel GA (population size, migration rate, and migration interval) for near-optimal performance.
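The island-model scheme the abstract alludes to — independent subpopulations that exchange a few migrants at a fixed interval — can be sketched compactly. This is not HYPERGEN's code: the truncation selection, one-point crossover, mutation rate, ring migration topology, and the one-max fitness below are all illustrative assumptions.

```python
import random

def island_ga(fitness, genome_len, n_islands=4, pop_size=20,
              generations=60, migration_interval=10, migration_rate=2,
              seed=0):
    """Island-model GA: each island evolves independently; every
    `migration_interval` generations its best individuals replace the
    worst individuals of the next island in a ring."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(pop_size)] for _ in range(n_islands)]

    def evolve(pop):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:               # occasional bit-flip mutation
                i = rng.randrange(genome_len)
                child[i] ^= 1
            children.append(child)
        return survivors + children

    for gen in range(1, generations + 1):
        islands = [evolve(pop) for pop in islands]
        if gen % migration_interval == 0:        # ring migration step
            for i, pop in enumerate(islands):
                dest = islands[(i + 1) % n_islands]
                best = sorted(pop, key=fitness, reverse=True)[:migration_rate]
                dest.sort(key=fitness)           # worst first
                dest[:migration_rate] = [g[:] for g in best]
    return max((g for pop in islands for g in pop), key=fitness)

# One-max toy problem: fitness is simply the number of 1 bits.
best = island_ga(sum, genome_len=16)
```

On a multicomputer each island would live on its own node and migration would be a small message to a neighbour, which is what makes the model attractive on a hypercube.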
A matrix product algorithm and its comparative performance on hypercubes
C. Lin, L. Snyder
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232648
A matrix product algorithm is studied in which one matrix operand is transposed prior to the computation. This algorithm is compared with the Fox-Hey-Otto algorithm on hypercube architectures. The Transpose algorithm simplifies communication for nonsquare matrices and for computations where the number of processors is not a perfect square. The results indicate superior performance for the Transpose algorithm.
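The core idea — transpose one operand so that every element of the product becomes a dot product of two *rows* — can be shown in a serial sketch. The paper's contribution is the distributed hypercube version; this only illustrates the data layout that the up-front transposition buys.

```python
def transpose(mat):
    # Columns of `mat` become rows, so in row-major storage each future
    # dot-product operand is contiguous (one block/message per row on a
    # distributed machine, rather than a scattered column).
    return [list(col) for col in zip(*mat)]

def matmul_transpose(a, b):
    """C = A x B, computed after transposing B: C[i][j] is the dot
    product of row i of A with row j of B-transpose."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in bt]
            for ra in a]

# Nonsquare example, the case the abstract highlights: (2x3) * (3x2).
a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
c = matmul_transpose(a, b)
```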
Improving the performance of message-passing applications by multithreading
E. Felten, D. McNamee
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232684
Achieving maximum performance in message-passing programs requires that calculation and communication be overlapped. However, the program transformations required to achieve this overlap are error-prone and add significant complexity to the application program. The authors argue that calculation/communication overlap can be achieved easily and consistently by executing multiple threads of control on each processor, and that this approach is practical on message-passing architectures without any special hardware support. They present timing data for a typical message-passing application to demonstrate the advantages of the scheme.
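The overlap the authors advocate can be mimicked with ordinary threads: a receiver thread drains incoming "messages" while the main thread computes on whatever has already arrived, so neither waits for the other to finish. The queue-based structure below is a generic sketch, not the paper's runtime system.

```python
import queue
import threading
import time

def fetch(n_msgs, inbox):
    """Stand-in for a message-receive loop running in its own thread."""
    for i in range(n_msgs):
        time.sleep(0.01)      # pretend network latency per message
        inbox.put(i)

def compute(x):
    return x * x              # stand-in for local computation

inbox = queue.Queue()
receiver = threading.Thread(target=fetch, args=(5, inbox))
receiver.start()

# Main thread computes as data arrives; blocking `get` hands control to
# the receiver thread, so communication overlaps the computation.
results = [compute(inbox.get()) for _ in range(5)]
receiver.join()
```

The application-level code stays a simple loop; the error-prone restructuring (splitting sends/receives and interleaving them with computation by hand) is what the threaded structure avoids.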
Image algebra: an object oriented approach to transparently concurrent image processing
I. Angus
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232694
The image algebra formalism provides a succinct, high-level algebraic method of describing many image processing algorithms. By exploiting this formalism it is possible to map the mathematical algorithms, for which pixel-level parallelism is transparent, into C++ code that is portable across MIMD, SIMD, and sequential architectures. The advantage of this method is that complex image processing algorithms can be prototyped and tested on any machine and then migrated directly to parallel machines without burdening the user with issues such as parallelism and data decomposition.
The Multicomputer Toolbox approach to concurrent BLAS and LACS
R. Falgout, A. Skjellum, S.G. Smith, C. Still
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232679
The authors describe many of the issues involved in general-purpose concurrent basic linear algebra subprograms (concurrent BLAS, or CBLAS) and discuss data-distribution independence, while further generalizing data distributions. They comment on the utility of linear algebra communication subprograms (LACS), describe an algorithm for dense matrix-matrix multiplication, and discuss matrix-vector multiplication issues. With regard to communication, they conclude that there is limited leverage in LACS as a stand-alone message-passing standard, and they propose that the needed capabilities instead be integrated into a general, application-level message-passing standard, focusing attention on CBLAS and large-scale application needs; most of the proposed LACS features resemble existing or needed general-purpose primitives in any case. All of the ideas discussed have been implemented, or are under development, within the Multicomputer Toolbox open software system.
Monte Carlo particle simulation of low-density fluid flow on MIMD supercomputers
S. Plimpton, T. Bartel
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232643
Direct simulation Monte Carlo is a well-established technique for modeling low-density fluid flows. The parallel implementation of a general simulation which allows for body-fitted grids, particle weighting, and a variety of surface and flow chemistry models is described. The authors compare its performance on a 1024-node nCUBE 2 to a serial version for the Cray Y-MP. Experiences with load-balancing the computation via graph-based heuristics and newer spectral techniques are also discussed. This is a critical issue, since density fluctuations can create orders-of-magnitude differences in computational loads as the simulation progresses.
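Why load balancing matters here is easy to see with a toy partitioner. The sketch below greedily assigns contiguous runs of cells to processors based on per-cell work estimates; it is a deliberately simple stand-in for the graph-based and spectral balancers the paper actually uses, just to show how skewed particle densities force uneven cell counts per processor.

```python
def balance(costs, nproc):
    """Greedy contiguous partition of per-cell work estimates so that
    each processor receives roughly total/nproc work. A dense cell may
    claim a whole processor while sparse cells are grouped together."""
    target = sum(costs) / nproc
    parts, cur, acc = [], [], 0.0
    for c in costs:
        cur.append(c)
        acc += c
        if acc >= target and len(parts) < nproc - 1:
            parts.append(cur)          # this processor is "full"
            cur, acc = [], 0.0
    parts.append(cur)                  # remainder goes to the last one
    return parts

# Uniform density: every processor gets the same number of cells.
even = balance([1] * 8, 4)
# A density spike: one hot cell occupies a processor by itself.
skewed = balance([10, 1, 1, 1, 1, 1, 1, 1], 2)
```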
Programming an astrophysics application in an object-oriented parallel language
S. X. Yang, Jenq-Kuen Lee, S. Narayana, Dennis Gannon
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232637
A three-dimensional hydrodynamics code is used to test a newly developed parallel C++ (pC++) language and compiler. The original code is written in FORTRAN77 and is designed to model self-gravitating compressible gas flows. The code is rewritten in pC++ and tested on a BBN GP1000 and an Alliant FX/2800. Nearly linear speed-up is achieved on both machines. On the Alliant, a comparison between the pC++ code and the original FORTRAN77 code is conducted: for 6 or more processors, the pC++ code outperforms the FORTRAN77 code that is automatically vectorized and parallelized by the Alliant FORTRAN compiler.
Parallel molecular dynamics on a torus network
K. Esselink, P. Hilbers
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232681
The paper presents some theoretical results concerning molecular dynamics simulations on parallel networks. Specifically, it gives rules which, depending on the system to be simulated and on the processor network, yield the optimal mapping for a class of algorithms. It also shows that multi-particle potentials can be implemented efficiently when geometric parallelism is used. The paper demonstrates the approach with results of simulations of water/oil/surfactant and polymer systems on a toroidal network of transputers, and it compares timing results of some simulations performed on this network with those performed on a single-processor Cray machine.
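Geometric parallelism on a torus amounts to giving each process a box of the simulation space and four wraparound neighbours (in 2-D), so the rank arithmetic is the essential mechanism. A minimal sketch, where the P x Q grid shape and row-major rank ordering are assumptions rather than the paper's notation:

```python
def torus_neighbors(rank, P, Q):
    """Ranks of the four torus neighbours of `rank` on a P x Q process
    grid with wraparound, using row-major rank numbering. Each process
    exchanges boundary particles only with these neighbours."""
    r, c = divmod(rank, Q)
    return {
        "north": ((r - 1) % P) * Q + c,
        "south": ((r + 1) % P) * Q + c,
        "west":  r * Q + (c - 1) % Q,
        "east":  r * Q + (c + 1) % Q,
    }

# Centre of a 3x3 grid: all four neighbours are distinct, no wraparound.
centre = torus_neighbors(4, 3, 3)
# Corner of a 3x3 grid: north and west wrap around the torus.
corner = torus_neighbors(0, 3, 3)
```

Because communication is only ever with these fixed neighbours, the message volume per step scales with the surface of each process's box rather than the whole system, which is what makes the torus mapping attractive for short-range molecular dynamics.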
Visual-aural representations of performance for a scalable application program
J. M. Francioni, D. Rover
Pub Date: 1992-04-26 | DOI: 10.1109/SHPCC.1992.232662
Visual and aural portrayals of parallel program execution are used to gain insight into how a program is working. The combination of portrayals in a coordinated performance environment provides the user with multiple perspectives and stimuli to comprehend complex, multidimensional run-time information. An open question for either medium is how well it scales: how effectively can it be used to represent program performance on a large parallel computer system? This paper investigates using sound in conjunction with graphics to represent the performance of a scalable application program, the SLALOM benchmark, executed on the nCUBE 2 distributed-memory parallel computer. Custom auralization software is coupled with the PICL and ParaGraph tools. The techniques and results of visually and aurally monitoring program execution on increasing numbers of processors are presented.