{"title":"Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey","authors":"J. Annis, Yong Zhao, Jens-S. Vöckler, M. Wilde, S. Kent, Ian T Foster","doi":"10.1109/SC.2002.10021","DOIUrl":"https://doi.org/10.1109/SC.2002.10021","url":null,"abstract":"In many scientific disciplines — especially long running, data-intensive collaborations — it is important to track all aspects of data capture, production, transformation, and analysis. In principle, one can then audit, validate, reproduce, and/or re-run with corrections various data transformations. We have recently proposed and prototyped the Chimera virtual data system, a new database-driven approach to this problem. We present here a major application study in which we apply Chimera to a challenging data analysis problem: the identification of galaxy clusters within the Sloan Digital Sky Survey. We describe the problem, its computational procedures, and the use of Chimera to plan and orchestrate the workflow of thousands of tasks on a data grid comprising hundreds of computers. This experience suggests that a general set of tools can indeed enhance the accuracy and productivity of scientific data reduction and that further development and application of this paradigm will offer great value.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120852199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STORM: Lightning-Fast Resource Management","authors":"E. Frachtenberg, F. Petrini, Juan Fernández Peinador, S. Pakin, S. Coll","doi":"10.1109/SC.2002.10057","DOIUrl":"https://doi.org/10.1109/SC.2002.10057","url":null,"abstract":"Although workstation clusters are a common platform for high-performance computing (HPC), they remain more difficult to manage than sequential systems or even symmetric multiprocessors. Furthermore, as cluster sizes increase, the quality of the resource-management subsystem — essentially, all of the code that runs on a cluster other than the applications — increasingly impacts application efficiency. In this paper, we present STORM, a resource-management framework designed for scalability and performance. The key innovation behind STORM is a software architecture that enables resource management to exploit low-level network features. As a result of this HPC-application-like design, STORM is orders of magnitude faster than the best reported results in the literature on two sample resource-management functions: job launching and process scheduling.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131940157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QMView and GAMESS: Integration into the World Wide Computational Grid","authors":"K. Baldridge, J. Greenberg, S. Elbert, S. Mock, P. Papadopoulos","doi":"10.1109/SC.2002.10014","DOIUrl":"https://doi.org/10.1109/SC.2002.10014","url":null,"abstract":"High performance computing, storage, visualization, and database infrastructures are increasing geometrically in complexity as scientists move towards grid-based computing. While this is natural, it has the effect of pushing computational capabilities beyond the reach of scientists because of the time needed to harness the infrastructure. Hiding the complexity of networked resources becomes essential if scientists are to utilize them effectively. In this work, we describe our efforts to integrate various computational chemistry components into a scientific computing environment. We briefly describe improvements we have made to individual components of the chemistry environment as well as future directions, followed by a more in-depth discussion of our strategy for integration into a grid workflow environment based on web services, which enables access to remote resources while shielding users from the complexities of the grid infrastructures. A preliminary schema for storing data obtained from computational chemistry calculations is also described.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128168198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advanced Visualization Technology for Terascale Particle Accelerator Simulations","authors":"K. Ma, G. Schussman, Brett Wilson, K. Ko, J. Qiang, R. Ryne","doi":"10.1109/SC.2002.10007","DOIUrl":"https://doi.org/10.1109/SC.2002.10007","url":null,"abstract":"This paper presents two new hardware-assisted rendering techniques developed for interactive visualization of the terascale data generated from numerical modeling of next-generation accelerator designs. The first technique, based on a hybrid rendering approach, makes possible interactive exploration of large-scale particle data from particle beam dynamics modeling. The second technique, based on a compact texture-enhanced representation, exploits the advanced features of commodity graphics cards to achieve perceptually effective visualization of the very dense and complex electromagnetic fields produced from the modeling of reflection and transmission properties of open structures in an accelerator design. Because of the collaborative nature of the overall accelerator modeling project, the visualization technology developed is for both desktop and remote visualization settings. We have tested the techniques using both time-varying particle data sets containing up to one billion particles per time step and electromagnetic field data sets with millions of mesh elements.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114563286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Library Support for Hierarchical Multi-Processor Tasks","authors":"T. Rauber, G. Rünger","doi":"10.1109/SC.2002.10064","DOIUrl":"https://doi.org/10.1109/SC.2002.10064","url":null,"abstract":"The paper considers modular programming with hierarchically structured multi-processor tasks on top of SPMD tasks for distributed memory machines. The parallel execution requires a corresponding decomposition of the set of processors into a hierarchical group structure onto which the tasks are mapped. This results in a multi-level group SPMD computation model with varying processor group structures. The advantage of this kind of mixed task and data parallelism is a potential to reduce the communication overhead and to increase scalability. We present a runtime library to support the coordination of hierarchically structured multi-processor tasks. The library exploits an extended parallel group SPMD programming model and manages the entire task execution including the dynamic hierarchy of processor groups. The library is built on top of MPI, has an easy-to-use interface, and leads to only a marginal overhead while allowing static planning and dynamic restructuring.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114591122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance Computing Meets Experimental Mathematics","authors":"D. Bailey, D. Broadhurst, Yozo Hida, X. Li, Brandon Thompson","doi":"10.1109/SC.2002.10060","DOIUrl":"https://doi.org/10.1109/SC.2002.10060","url":null,"abstract":"In this paper we describe some novel applications of high performance computing in a discipline now known as \"experimental mathematics.\" The paper reviews some recent published work, and then presents some new results that have not yet appeared in the literature. A key technique involved in this research is the PSLQ integer relation algorithm (recently named one of ten \"algorithms of the century\" by Computing in Science and Engineering). This algorithm permits one to recognize a numeric constant in terms of the formula that it satisfies. We present a variant of PSLQ that is well-suited for parallel computation, and give several examples of new mathematical results that we have found using it. Two of these computations were performed on highly parallel computers, since they are not feasible on conventional systems. We also describe a new software package for performing arbitrary precision arithmetic, which is required in this research.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128913177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Framework for Performance Modeling and Prediction","authors":"A. Snavely, L. Carrington, N. Wolter, Jesús Labarta, Rosa M. Badia, A. Purkayastha","doi":"10.1109/SC.2002.10004","DOIUrl":"https://doi.org/10.1109/SC.2002.10004","url":null,"abstract":"Cycle-accurate simulation is far too slow for modeling the expected performance of full parallel applications on large HPC systems. And just running an application on a system and observing wallclock time tells you nothing about why the application performs as it does (and is anyway impossible on yet-to-be-built systems). Here we present a framework for performance modeling and prediction that is faster than cycle-accurate simulation, more informative than simple benchmarking, and is shown useful for performance investigations in several dimensions.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127877141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Better Tiling and Array Contraction for Compiling Scientific Programs","authors":"Geoff Pike, P. Hilfinger","doi":"10.1109/SC.2002.10040","DOIUrl":"https://doi.org/10.1109/SC.2002.10040","url":null,"abstract":"Scientific programs often include multiple loops over the same data; interleaving parts of different loops may greatly improve performance. We exploit this in a compiler for Titanium, a dialect of Java. Our compiler combines reordering optimizations such as loop fusion and tiling with storage optimizations such as array contraction (eliminating or reducing the size of temporary arrays). The programmers we have in mind are willing to spend some time tuning their code and their compiler parameters. Given that, and the difficulty in statically selecting parameters such as tile sizes, it makes sense to provide automatic parameter searching alongside the compiler. Our strategy is to optimize aggressively but to expose the compiler’s decisions to external control. We double or triple the performance of Gauss-Seidel relaxation and multi-grid (versus an optimizing compiler without tiling and array contraction), and we argue that ours is the best compiler for that kind of program.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124557505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes","authors":"G. Bosilca, Aurélien Bouteiller, F. Cappello, Samir Djilali, G. Fedak, C. Germain, T. Hérault, Pierre Lemarinier, O. Lodygensky, F. Magniette, V. Néri, A. Selikhov","doi":"10.1109/SC.2002.10048","DOIUrl":"https://doi.org/10.1109/SC.2002.10048","url":null,"abstract":"Global Computing platforms, large scale clusters and future TeraGRID systems gather thousands of nodes for computing parallel scientific applications. At this scale, node failures or disconnections are frequent events. This Volatility reduces the MTBF of the whole system to the range of hours or minutes. We present MPICH-V, an automatic Volatility tolerant MPI environment based on uncoordinated checkpoint/roll-back and distributed message logging. MPICH-V architecture relies on Channel Memories, Checkpoint servers and theoretically proven protocols to execute existing or new, SPMD and Master-Worker MPI applications on volatile nodes. To evaluate its capabilities, we run MPICH-V within a framework for which the number of nodes, Channel Memories and Checkpoint Servers can be completely configured as well as the node Volatility. We present a detailed performance evaluation of every component of MPICH-V and its global performance for non-trivial parallel applications. Experimental results demonstrate good scalability and high tolerance to node volatility.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130487962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multivariate Resource Performance Forecasting in the Network Weather Service","authors":"M. Swany, R. Wolski","doi":"10.1109/SC.2002.10039","DOIUrl":"https://doi.org/10.1109/SC.2002.10039","url":null,"abstract":"This paper describes a new technique in the Network Weather Service for producing multi-variate forecasts. The new technique uses the NWS’s univariate forecasters and empirically gathered Cumulative Distribution Functions (CDFs) to make predictions from correlated measurement streams. Experimental results are shown in which throughput is predicted for long TCP/IP transfers from short NWS network probes.","PeriodicalId":302800,"journal":{"name":"ACM/IEEE SC 2002 Conference (SC'02)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122203659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}