Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392637
Ryan W. Mooney, Ken P. Schmidt, R. S. Studham
We present NWPerf, a new system for analyzing fine-granularity performance metric data on large-scale supercomputing clusters. The tool measures application efficiency system-wide, offering both a global system perspective and a detailed view of individual applications. NWPerf provides this service while minimizing the impact on the performance of user applications. We describe the type of information that can be derived from the system, and demonstrate how the system was used to detect and eliminate a performance problem in an application, improving its performance by up to several thousand percent. The NWPerf architecture has proven to be a stable and scalable platform for gathering performance data on a large 1954-CPU production Linux cluster at PNNL.
{"title":"NWPerf: a system wide performance monitoring tool for large Linux clusters","authors":"Ryan W. Mooney, Ken P. Schmidt, R. S. Studham","doi":"10.1109/CLUSTR.2004.1392637","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392637","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392623
G. Amerson, A. Apon
The buffered message interface (BMI) of PVFSv2 is a low-level network abstraction that allows PVFSv2 to operate on any protocol that has BMI support. This work presents a BMI module that supports the Virtual Interface Architecture (VIA) over an early release version of InfiniBand and also over Myrinet. The baseline bandwidth and latency of the implementation were compared with those of the other BMI modules; the implementation achieves significantly higher performance than the TCP module, but slightly less than the CM module. Experimental results comparing a completion queue version with a notify version, and immediate versus rendezvous messages, are useful to system implementors of network messaging modules.
{"title":"Implementation and design analysis of a network messaging module using virtual interface architecture","authors":"G. Amerson, A. Apon","doi":"10.1109/CLUSTR.2004.1392623","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392623","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392625
C. Morin, Renaud Lottiaux, Geoffroy R. Vallée, Pascal Gallard, D. Margery, J. Berthou, I. Scherson
A working single system image distributed operating system is presented. Dubbed Kerrighed, it provides a unified approach and support to both the MPI and the shared memory programming models. The system is operational in a 16-processor cluster at the Institut de Recherche en Informatique et Systemes Aleatoires in Rennes, France. In this paper, the system is described with emphasis on its main contributing and distinguishing factors, namely its DSM based on memory containers, its flexible handling of scheduling and checkpointing strategies, and its efficient and unified communications layer. Because of the importance and popularity of data parallel applications in these systems, we present a brief discussion of the mapping of two well-known and established data parallel algorithms. It is shown that ShearSort is remarkably well suited for the architecture/system pair, as is the ever so popular and important two-dimensional fast Fourier transform (2D FFT).
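For readers unfamiliar with ShearSort, the following is a minimal sequential sketch of the textbook algorithm, not the Kerrighed mapping discussed in the paper (the abstract does not describe it). The function name `shear_sort` and the list-of-lists grid are illustrative assumptions; on a real cluster each row or column sort would run in parallel.

```python
import math


def shear_sort(grid):
    """Sort a rectangular grid into snake-like (boustrophedon) order.

    Textbook ShearSort: alternate snake-wise row sorts with column sorts.
    With m rows, ceil(log2(m)) + 1 row phases interleaved with
    ceil(log2(m)) column phases are sufficient.
    """
    rows = len(grid)
    cols = len(grid[0])
    grid = [row[:] for row in grid]  # work on a copy
    row_phases = math.ceil(math.log2(rows)) + 1 if rows > 1 else 1
    for phase in range(row_phases):
        # Row phase: even rows ascending, odd rows descending.
        for r in range(rows):
            grid[r].sort(reverse=(r % 2 == 1))
        # Column phase (skipped after the final row phase).
        if phase < row_phases - 1:
            for c in range(cols):
                column = sorted(grid[r][c] for r in range(rows))
                for r in range(rows):
                    grid[r][c] = column[r]
    return grid


def snake_flatten(grid):
    """Read the grid in snake order: left-to-right, then right-to-left."""
    out = []
    for r, row in enumerate(grid):
        out.extend(row if r % 2 == 0 else reversed(row))
    return out
```

Reading the result with `snake_flatten` should yield the values in ascending order; the row and column sorts are independent of one another within a phase, which is what makes the algorithm data parallel.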
{"title":"Kerrighed and data parallelism: cluster computing on single system image operating systems","authors":"C. Morin, Renaud Lottiaux, Geoffroy R. Vallée, Pascal Gallard, D. Margery, J. Berthou, I. Scherson","doi":"10.1109/CLUSTR.2004.1392625","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392625","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392624
S. Langella, S. Hastings, S. Oster, T. Kurç, Ümit V. Çatalyürek, J. Saltz
A key challenge in supporting data-driven scientific applications is the storage and management of input and output data in a distributed environment. We describe a distributed storage middleware, based on a data and metadata management framework, to address this problem. In this middleware system, applications define the structure of their input and output data using XML schemas. The system provides support for 1) registration, versioning, and management of schemas, and 2) storage, querying, and retrieval of instance data corresponding to the schemas in distributed databases. We carry out an experimental evaluation of the system on a set of PC clusters connected over wide-area networks (WANs) and local-area networks (LANs).
{"title":"A distributed data management middleware for data-driven application systems","authors":"S. Langella, S. Hastings, S. Oster, T. Kurç, Ümit V. Çatalyürek, J. Saltz","doi":"10.1109/CLUSTR.2004.1392624","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392624","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392648
A. Andrzejak, Mehmet Ceyran
Summary form only given. Scientific computing clusters, enterprise data centers and grid and utility environments utilize the majority of the world's computing resources. Most of these resources are lightly utilized and offer a vast potential for resource sharing, an economically attractive and increasingly indispensable management option. A prerequisite for automating resource consolidation is modeling and prediction of demand characteristics. We present an approach for long-term demand characteristics prediction based on mining periodicities in historical demand data. In addition to characterizing the regularity of the past demand behavior (and so providing a measure of predictability) we propose a method for predicting probabilistic profiles which describe likely future behavior. The presented algorithms are change-adaptive in the sense that they automatically adjust to new regularities in demand patterns. A case study using data from an enterprise data center evaluates the effectiveness of the technique.
{"title":"Predicting resource demand profiles by periodicity mining","authors":"A. Andrzejak, Mehmet Ceyran","doi":"10.1109/CLUSTR.2004.1392648","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392648","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Summary form only given. We propose a biologically inspired and fully-decentralized approach to the organization of computation that is based on the autonomous scheduling of strongly mobile agents on a peer-to-peer network. Our approach achieves the following design objectives: near-zero knowledge of network topology, zero knowledge of system status, autonomous scheduling, distributed computation, lack of specialized nodes. Every node is equally responsible for scheduling and computation, both of which are performed with practically no information about the system. We believe that this model is ideally suited for large-scale unstructured grids such as desktop grids. This model avoids the extensive system knowledge requirements of traditional grid scheduling approaches. Contrary to the popular master/worker organization of current desktop grids, our approach does not rely on specialized super-servers or on application-specific clients. By encapsulating computation and scheduling behavior into mobile agents, we decouple both application code and scheduling functionality from the underlying infrastructure. The resulting system is one where every node can start a large grid job, and where the computation naturally organizes itself around available resources. Through the careful design of agent behavior, the resulting global organization of the computation can be customized for different classes of applications. In a previous paper, we described a proof-of-concept prototype for an independent task application. We generalize the scheduling framework and demonstrate that our approach is applicable to a computation with a highly synchronous communication pattern, namely Cannon's matrix multiplication.
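The synchronous communication pattern of Cannon's matrix multiplication can be illustrated with a small sequential simulation. This is a sketch of the standard algorithm only, not of the agent-based scheduling described above; it assumes an n x n grid of hypothetical "nodes" that each hold a single matrix element, and the name `cannon_matmul` is illustrative.

```python
def cannon_matmul(a, b):
    """Multiply two n x n matrices using Cannon's shift pattern.

    Node (i, j) holds one element of A and one of B. After an initial
    skew, every step is: multiply-accumulate locally, then shift A one
    position left along its row and B one position up along its column.
    """
    n = len(a)
    # Initial skew: row i of A is rotated left by i, column j of B up by j,
    # so node (i, j) starts with A[i][(i+j) % n] and B[(i+j) % n][j].
    a_loc = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b_loc = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Local multiply-accumulate on every node.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_loc[i][j] * b_loc[i][j]
        # Lock-step communication: A moves left, B moves up (with wrap-around).
        a_loc = [[a_loc[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b_loc = [[b_loc[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c
```

Every node must exchange data with its neighbours at every step, which is why the algorithm is a demanding test case for a scheduler that has almost no knowledge of the network.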
{"title":"Application-specific scheduling for the organic grid","authors":"A. Chakravarti, Gerald Baumgartner, Mario Lauria","doi":"10.1109/GRID.2004.11","DOIUrl":"https://doi.org/10.1109/GRID.2004.11","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392602
Troy Baer, P. Wyckoff
Access to shared data is critical to the long term success of grids of distributed systems. As more parallel applications are being used on these grids, the need for some kind of parallel I/O facility across distributed systems increases. However, grid middleware has thus far had only limited support for distributed parallel I/O. In this paper, we present an implementation of the MPI-2 I/O interface using the Globus GridFTP client API. MPI is widely used for parallel computing, and its I/O interface maps onto a large variety of storage systems. The limitations of using GridFTP as an MPI-I/O transport mechanism are described, as well as support for parallel access to scientific data formats such as HDF and NetCDF. We compare the performance of GridFTP to that of NFS on the same network using several parallel I/O benchmarks. Our tests indicate that GridFTP can be a workable transport for parallel I/O, particularly for distributed read-only access to shared data sets.
{"title":"A parallel I/O mechanism for distributed systems","authors":"Troy Baer, P. Wyckoff","doi":"10.1109/CLUSTR.2004.1392602","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392602","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392593
C. Leangsuksun, I. Haddad
Summary form only given. This tutorial addresses in detail the design and implementation issues related to building HA Linux Beowulf clusters using Linux and open source software as the base technology. The focus of the tutorial is HA-OSCAR. We present the architecture of HA-OSCAR, review the new features of the current release, explain how we implemented the HA features, and discuss our experiments covering performance and availability, as well as our test results.
{"title":"Building highly available HPC clusters with HA-OSCAR","authors":"C. Leangsuksun, I. Haddad","doi":"10.1109/CLUSTR.2004.1392593","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392593","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392654
Arnaud Legrand, Olivier Beaumont, L. Marchal, Y. Robert
Summary form only given. In this work, we consider the problem of allocating and scheduling a collection of independent, equal-sized tasks on heterogeneous star-shaped platforms. We also address the same problem for divisible tasks. For both cases, we take memory constraints into account. We prove strong NP-completeness results for different objective functions, namely makespan minimization and throughput maximization, on simple star-shaped platforms. We propose an approximation algorithm based on the unconstrained version (with unlimited memory) of the problem. We introduce several heuristics, which are evaluated and compared through extensive simulations. An unexpected conclusion drawn from these experiments is that classical scheduling heuristics that try to greedily minimize the completion time of each task are outperformed by the simple heuristic that consists in assigning the task to the available processor that has the smallest communication time, regardless of computation power (hence a "bandwidth-centric" distribution).
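The contrast between the two selection rules mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the `Worker` fields, the modelling of "available" as "has a free task buffer", and both function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    name: str
    comm_time: float  # time for the master to send one task to this worker
    comp_time: float  # time for this worker to compute one task
    free_slots: int   # free task buffers (assumed model of the memory limit)


def pick_bandwidth_centric(workers: List[Worker]) -> Optional[Worker]:
    """Choose the available worker with the smallest communication time,
    ignoring computation power."""
    available = [w for w in workers if w.free_slots > 0]
    if not available:
        return None
    return min(available, key=lambda w: w.comm_time)


def pick_min_completion(workers: List[Worker]) -> Optional[Worker]:
    """Greedy alternative: minimise send time plus compute time of the task."""
    available = [w for w in workers if w.free_slots > 0]
    if not available:
        return None
    return min(available, key=lambda w: w.comm_time + w.comp_time)
```

With a worker that has a fast link but a slow CPU and another with a slow link but a fast CPU, the two rules can disagree: the bandwidth-centric rule keeps the master's outgoing link busy for the shortest time per task, while the greedy rule looks only at when the current task would finish.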
{"title":"Master slave scheduling on heterogeneous star-shaped platforms with limited memory","authors":"Arnaud Legrand, Olivier Beaumont, L. Marchal, Y. Robert","doi":"10.1109/CLUSTR.2004.1392654","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392654","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}
Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392641
Greg Bruno, M. Katz, Federico D. Sacerdoti, P. Papadopoulos
The Rocks toolkit uses a graph-based framework to describe the configuration of all node types (termed appliances) that make up a complete cluster. With hundreds of deployed clusters, our turnkey systems approach has proven to be easily adapted to different hardware and logical node configurations. However, the Rocks architecture and implementation contain a significant asymmetry: the graph definition of all appliance types except the initial frontend can be modified and extended by the end-user before installation, whereas frontends can be modified only afterward by hands-on system administration. To address this administrative discontinuity between nodes and frontends, we describe the design and implementation of Rolls. First and foremost, Rolls provide both the architecture and mechanisms that enable the end-user to incrementally and programmatically modify the graph description for all appliance types. New functionality can be added and any Rocks-supplied software component can be overwritten or removed simply by inserting the desired Roll CD(s) at installation time. This symmetric approach to cluster construction has allowed us to shrink the core of the Rocks implementation while increasing flexibility for the end-user. Rolls are optional, automatically configured, cluster-aware software systems. Current add-ons include: scheduling systems (SGE, PBS), grid support (based on NSF Middleware Initiative), database support (DB2), Condor, integrity checking (Tripwire) and the Intel compiler. Community-specific Rolls can be and are developed by groups outside of the Rocks core development group.
{"title":"Rolls: modifying a standard system installer to support user-customizable cluster frontend appliances","authors":"Greg Bruno, M. Katz, Federico D. Sacerdoti, P. Papadopoulos","doi":"10.1109/CLUSTR.2004.1392641","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392641","journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)"},"publicationDate":"2004-09-20"}