Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392625
C. Morin, Renaud Lottiaux, Geoffroy R. Vallée, Pascal Gallard, D. Margery, J. Berthou, I. Scherson
A working single system image distributed operating system is presented. Dubbed Kerrighed, it provides a unified approach and support for both the MPI and the shared memory programming models. The system is operational on a 16-processor cluster at the Institut de Recherche en Informatique et Systèmes Aléatoires in Rennes, France. In this paper, the system is described with emphasis on its main contributing and distinguishing factors, namely its DSM based on memory containers, its flexible handling of scheduling and checkpointing strategies, and its efficient and unified communications layer. Because of the importance and popularity of data parallel applications on these systems, we present a brief discussion of the mapping of two well known and established data parallel algorithms. It is shown that ShearSort is remarkably well suited to the architecture/system pair, as is the widely used two-dimensional fast Fourier transform (2D FFT).
Title: Kerrighed and data parallelism: cluster computing on single system image operating systems
Venue: 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)
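The paper's ShearSort result is easy to appreciate in miniature: the algorithm alternately sorts the rows of an n × n grid in snake order and the columns top to bottom, and ⌈log₂ n⌉ + 1 such phases suffice to sort the grid. The sequential sketch below is illustrative only and makes no claim about the Kerrighed implementation:

```python
import math

def shearsort(grid):
    """Snake-sort an n x n grid by alternately sorting rows
    (alternating direction) and columns (top to bottom)."""
    n = len(grid)
    phases = int(math.ceil(math.log2(n))) + 1 if n > 1 else 1
    for _ in range(phases):
        # Row phase: even rows ascending, odd rows descending.
        for i in range(n):
            grid[i].sort(reverse=(i % 2 == 1))
        # Column phase: every column ascending.
        for j in range(n):
            col = sorted(grid[i][j] for i in range(n))
            for i in range(n):
                grid[i][j] = col[i]
    return grid
```

On a cluster mapped as a logical grid, each row and column sort becomes a local sort plus neighbor exchanges, which is what makes the algorithm a natural fit for the architecture/system pair.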
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392648
A. Andrzejak, Mehmet Ceyran
Summary form only given. Scientific computing clusters, enterprise data centers and grid and utility environments utilize the majority of the world's computing resources. Most of these resources are lightly utilized and offer a vast potential for resource sharing, an economically attractive and increasingly indispensable management option. A prerequisite for automating resource consolidation is modeling and prediction of demand characteristics. We present an approach for long-term demand characteristics prediction based on mining periodicities in historical demand data. In addition to characterizing the regularity of the past demand behavior (and so providing a measure of predictability) we propose a method for predicting probabilistic profiles which describe likely future behavior. The presented algorithms are change-adaptive in the sense that they automatically adjust to new regularities in demand patterns. A case study using data from an enterprise data center evaluates the effectiveness of the technique.
Title: Predicting resource demand profiles by periodicity mining
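The core idea, mining periodicities from historical demand and folding them into a profile of likely future behavior, can be sketched with a deliberately simple stand-in: pick the lag with the highest autocorrelation as the dominant period, then summarize each phase offset. The function names and the (mean, max) profile shape are our assumptions, not the paper's algorithm:

```python
def dominant_period(series, min_lag=2, max_lag=None):
    """Return the lag with the highest autocorrelation --
    a toy stand-in for periodicity mining."""
    n = len(series)
    max_lag = max_lag or n // 2
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) or 1.0

    def acf(lag):
        return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / var

    return max(range(min_lag, max_lag + 1), key=acf)

def profile(series, period):
    """Fold history at the period into a per-phase (mean, max) profile
    describing likely future demand at each offset."""
    buckets = [[] for _ in range(period)]
    for t, x in enumerate(series):
        buckets[t % period].append(x)
    return [(sum(b) / len(b), max(b)) for b in buckets]
```

For a daily-cyclic load sampled hourly, `dominant_period` recovers the 24-hour cycle and `profile` yields one expected-demand entry per hour of the day.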
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392637
Ryan W. Mooney, Ken P. Schmidt, R. S. Studham
We present NWPerf, a new system for analyzing fine-granularity performance metric data on large-scale supercomputing clusters. This tool is able to measure application efficiency on a system-wide basis, providing both a global system perspective and a detailed view of individual applications. NWPerf provides this service while minimizing the impact on the performance of user applications. We describe the type of information that can be derived from the system, and demonstrate how the system was used to detect and eliminate a performance problem in an application, improving its performance by up to several thousand percent. The NWPerf architecture has proven to be a stable and scalable platform for gathering performance data on a large 1954-CPU production Linux cluster at PNNL.
Title: NWPerf: a system wide performance monitoring tool for large Linux clusters
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392623
G. Amerson, A. Apon
The buffered message interface (BMI) of PVFSv2 is a low-level network abstraction that allows PVFSv2 to operate on any protocol that has BMI support. This work presents a BMI module that supports VIA over an early release version of InfiniBand and also over Myrinet. The baseline bandwidth and latency of the implementation were compared to those of the existing BMI modules: it achieves significantly higher performance than the TCP module, but slightly lower than the CM module. Experimental results comparing a completion queue version with a notify version, and immediate versus rendezvous messages, are useful to system implementors of network messaging modules.
Title: Implementation and design analysis of a network messaging module using virtual interface architecture
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392602
Troy Baer, P. Wyckoff
Access to shared data is critical to the long-term success of grids of distributed systems. As more parallel applications are being used on these grids, the need for some kind of parallel I/O facility across distributed systems increases. However, grid middleware has thus far had only limited support for distributed parallel I/O. In this paper, we present an implementation of the MPI-2 I/O interface using the Globus GridFTP client API. MPI is widely used for parallel computing, and its I/O interface maps onto a large variety of storage systems. The limitations of using GridFTP as an MPI-I/O transport mechanism are described, as well as support for parallel access to scientific data formats such as HDF and NetCDF. We compare the performance of GridFTP to that of NFS on the same network using several parallel I/O benchmarks. Our tests indicate that GridFTP can be a workable transport for parallel I/O, particularly for distributed read-only access to shared data sets.
Title: A parallel I/O mechanism for distributed systems
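A small piece of any MPI-IO-over-GridFTP transport is deciding which byte extent each MPI rank reads, so that every process can issue an independent partial-file get. The helper below shows the arithmetic; `partition_extent` is a hypothetical name for illustration, not part of MPI-IO or the GridFTP client API:

```python
def partition_extent(file_size, nranks):
    """Split a file into contiguous (offset, length) extents, one per
    MPI rank, as a parallel-I/O layer might before issuing one
    partial-file transfer per process."""
    base, extra = divmod(file_size, nranks)
    extents, offset = [], 0
    for rank in range(nranks):
        # The first `extra` ranks absorb the remainder bytes.
        length = base + (1 if rank < extra else 0)
        extents.append((offset, length))
        offset += length
    return extents
```

For read-only access to a shared data set, each rank can then fetch exactly its extent with no coordination beyond knowing the file size and the number of ranks.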
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392624
S. Langella, S. Hastings, S. Oster, T. Kurç, Ümit V. Çatalyürek, J. Saltz
A key challenge in supporting data-driven scientific applications is the storage and management of input and output data in a distributed environment. We describe a distributed storage middleware, based on a data and metadata management framework, to address this problem. In this middleware system, applications define the structure of their input and output data using XML schemas. The system provides support for 1) registration, versioning, management of schemas, and 2) management of storage, querying, and retrieval of instance data corresponding to the schemas in distributed databases. We carry out an experimental evaluation of the system on a set of PC clusters connected over wide- (WANs) and local-area networks (LANs).
Title: A distributed data management middleware for data-driven application systems
Pub Date: 2004-09-20 | DOI: 10.1109/GRID.2004.11
A. Chakravarti, Gerald Baumgartner, Mario Lauria
Summary form only given. We propose a biologically inspired and fully decentralized approach to the organization of computation that is based on the autonomous scheduling of strongly mobile agents on a peer-to-peer network. Our approach achieves the following design objectives: near-zero knowledge of network topology, zero knowledge of system status, autonomous scheduling, distributed computation, lack of specialized nodes. Every node is equally responsible for scheduling and computation, both of which are performed with practically no information about the system. We believe that this model is ideally suited for large-scale unstructured grids such as desktop grids. This model avoids the extensive system knowledge requirements of traditional grid scheduling approaches. Contrary to the popular master/worker organization of current desktop grids, our approach does not rely on specialized super-servers or on application-specific clients. By encapsulating computation and scheduling behavior into mobile agents, we decouple both application code and scheduling functionality from the underlying infrastructure. The resulting system is one where every node can start a large grid job, and where the computation naturally organizes itself around available resources. Through the careful design of agent behavior, the resulting global organization of the computation can be customized for different classes of applications. In a previous paper, we described a proof-of-concept prototype for an independent task application. We generalize the scheduling framework and demonstrate that our approach is applicable to a computation with a highly synchronous communication pattern, namely Cannon's matrix multiplication.
Title: Application-specific scheduling for the organic grid
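Cannon's matrix multiplication, the synchronous benchmark named above, first skews the A blocks left by their row index and the B blocks up by their column index, then performs p rounds of multiply-accumulate followed by a one-step rotation on a p × p process grid. The sequential simulation below illustrates that data movement; it is not the authors' agent-based code:

```python
def cannon_multiply(A, B, p):
    """Multiply two n x n matrices with Cannon's algorithm,
    simulating a p x p process grid sequentially (p must divide n)."""
    n = len(A)
    b = n // p  # block size per simulated process

    def block(M, i, j):
        return [row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]

    def madd_mul(C, X, Y):
        # C += X @ Y for b x b blocks.
        for i in range(b):
            for k in range(b):
                x = X[i][k]
                for j in range(b):
                    C[i][j] += x * Y[k][j]

    # Initial skew: process (i, j) gets A[i, (i+j) mod p], B[(i+j) mod p, j].
    Ab = [[block(A, i, (i + j) % p) for j in range(p)] for i in range(p)]
    Bb = [[block(B, (i + j) % p, j) for j in range(p)] for i in range(p)]
    Cb = [[[[0] * b for _ in range(b)] for _ in range(p)] for _ in range(p)]

    for _ in range(p):
        for i in range(p):
            for j in range(p):
                madd_mul(Cb[i][j], Ab[i][j], Bb[i][j])
        # Rotate: A blocks shift left, B blocks shift up.
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]

    # Reassemble the result matrix from its blocks.
    C = [[0] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            for r in range(b):
                for c in range(b):
                    C[i * b + r][j * b + c] = Cb[i][j][r][c]
    return C
```

Every round involves only nearest-neighbor block exchanges, which is exactly the highly synchronous communication pattern the scheduling framework has to support.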
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392595
R. Asbury, M. Wrinn
Intel® Cluster Tools help developers of distributed parallel software analyze and optimize applications on clusters. This tutorial uses a combination of lecture, demonstration, and (primarily) lab exercises with these tools to introduce event-based tracing techniques for MPI applications. The tools used in this tutorial were formerly marketed as Vampir and Vampirtrace.
Title: MPI tuning with Intel® Trace Analyzer and Intel® Trace Collector
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392593
C. Leangsuksun, I. Haddad
Summary form only given. This tutorial addresses in detail the design and implementation issues involved in building HA Linux Beowulf clusters using Linux and open source software as the base technology. Its particular focus is HA-OSCAR. We present the architecture of HA-OSCAR, review the new features of the current release, explain how we implemented the HA features, and discuss our experiments covering performance and availability, as well as our test results.
Title: Building highly available HPC clusters with HA-OSCAR
Pub Date: 2004-09-20 | DOI: 10.1109/CLUSTR.2004.1392620
B. Cheung, Cho-Li Wang, F. Lau
Software DSM provides good programmability for cluster computing, but its performance and its limited shared memory space for large applications hinder its popularity. This paper introduces LOTS, a C++ runtime library supporting a large shared object space. With its dynamic memory mapping mechanism, LOTS can map objects lazily from the local disk into virtual memory at access time, leaving only a small record of control information for each object in the local process space. To our knowledge, LOTS is the first pure runtime software DSM supporting a shared object space larger than the local process space. Our testing shows that LOTS can use all the free hard disk space available to support hundreds of gigabytes of shared objects with small overhead. The scope consistency memory model and a mixed coherence protocol allow LOTS to achieve better scalability with respect to problem size and cluster size.
Title: LOTS: a software DSM supporting large object space
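The lazy disk-to-memory mapping LOTS performs can be mimicked in a few lines: keep only a per-object control record (here, a file path) in the process, and load the object body from disk on first access. This is an illustrative Python analogy with invented names, not the LOTS C++ API:

```python
import os
import pickle

class LazyObjectSpace:
    """Hold only a small control record per object in memory; the
    object body stays on disk until first access."""

    def __init__(self, directory):
        self.directory = directory
        self.index = {}  # object id -> file path (the in-core record)
        self.cache = {}  # objects mapped into memory so far

    def put(self, oid, obj):
        path = os.path.join(self.directory, f"{oid}.pkl")
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        self.index[oid] = path
        self.cache.pop(oid, None)  # invalidate any mapped-in copy

    def get(self, oid):
        if oid not in self.cache:  # map in lazily on first access
            with open(self.index[oid], "rb") as f:
                self.cache[oid] = pickle.load(f)
        return self.cache[oid]
```

As in LOTS, the shared space is bounded by free disk capacity rather than by the process's address space; a real DSM would additionally handle page protection, coherence, and eviction of mapped-in objects.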