Green computing is a new paradigm for designing computer systems that considers not only processing performance but also energy efficiency. Power management is one approach in green computing to reduce power consumption in distributed computing systems. In this paper, we first propose an optimal power management (OPM) scheme used by a batch scheduler in a server farm. The OPM observes the state of the server farm and decides when to switch the operation mode (i.e., active or sleep) of a server to minimize power consumption while meeting the performance requirements. An optimization problem based on a constrained Markov decision process (CMDP) is formulated and solved to obtain the optimal OPM decision. Given that OPM is used in the server farms, we then consider the assignment of users to server farms by a job broker. This assignment ensures that the cost due to power consumption and network transportation is minimized. The performance of the system is extensively evaluated. The results show that with OPM the job waiting time can be kept below the maximum threshold while the power consumption is much lower than without OPM.
{"title":"Optimal Power Management for Server Farm to Support Green Computing","authors":"D. Niyato, Sivadon Chaisiri, Bu-Sung Lee","doi":"10.1109/CCGRID.2009.89","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.89","url":null,"abstract":"Green computing is a new paradigm of designing the computer system which considers not only the processing performance but also the energy efficiency. Power management is one of the approaches in green computing to reduce the power consumption in distributed computing system. In this paper, we first propose an optimal power management (OPM) used by a batch scheduler in a server farm. This OPM observes the state of a server farm and makes the decision to switch the operation mode (i.e., active or sleep) of the server to minimize the power consumption while the performance requirements are met. An optimization problem based on constrained Markov decision process (CMDP) is formulated and solved to obtain an optimal decision of OPM. Given that OPM is used in the server farm, then an assignment of users to the server farms by a job broker is considered. This assignment is to ensure that the cost due to power consumption and network transportation is minimized. The performance of the system is extensively evaluated. The result shows that with OPM the job waiting time can be maintained below the maximum threshold while the power consumption is much smaller than that without OPM.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132046064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
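The active/sleep trade-off described in the power management abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions: it uses a fixed queue-length threshold rather than the CMDP-optimal policy of the paper, and the power figures, arrival and service probabilities, and the names `simulate`, `P_ACTIVE` and `P_SLEEP` are hypothetical. Mean queue length stands in for waiting time, since for a fixed arrival rate the two are proportional by Little's law.

```python
import random

# Hypothetical power draw in watts for the two operation modes.
P_ACTIVE = 200.0
P_SLEEP = 20.0


def simulate(threshold_on, slots=10_000, arrival_prob=0.3,
             service_prob=0.6, seed=0):
    """Simulate one server with a job queue in discrete time slots.

    The server wakes up once the queue holds at least `threshold_on` jobs
    and goes back to sleep when the queue is empty. A threshold of 0 keeps
    the server active in every slot (no power management).
    Returns (mean power in watts, mean queue length).
    """
    rng = random.Random(seed)
    queue = 0
    active = False
    energy = 0.0
    queue_sum = 0
    for _ in range(slots):
        # At most one job arrives per slot.
        if rng.random() < arrival_prob:
            queue += 1
        # Mode switch decision based on the observed queue length.
        if queue >= threshold_on:
            active = True
        elif queue == 0:
            active = False
        # An active server finishes at most one job per slot.
        if active and queue > 0 and rng.random() < service_prob:
            queue -= 1
        energy += P_ACTIVE if active else P_SLEEP
        queue_sum += queue
    return energy / slots, queue_sum / slots


always_on_power, always_on_queue = simulate(threshold_on=0)
managed_power, managed_queue = simulate(threshold_on=3)
print(f"always on: {always_on_power:.1f} W, mean queue {always_on_queue:.2f}")
print(f"threshold: {managed_power:.1f} W, mean queue {managed_queue:.2f}")
```

Raising the threshold saves power at the price of a longer queue. A CMDP formulation would instead search for the policy that minimizes power subject to an explicit bound on waiting time.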
Complex scientific workflows are now increasingly executed on computational grids. In addition to the challenges of managing and scheduling these workflows, reliability challenges arise because of the unreliable nature of large-scale grid infrastructure. Fault tolerance mechanisms like over-provisioning and checkpoint-recovery are used in current grid application management systems to address these reliability challenges. In this work, we propose new approaches that combine these fault tolerance techniques with existing workflow scheduling algorithms. We present a study on the effectiveness of the combined approaches by analyzing their impact on the reliability of workflow execution, workflow performance and resource usage under different reliability models, failure prediction accuracies and workflow application types.
{"title":"Combined Fault Tolerance and Scheduling Techniques for Workflow Applications on Computational Grids","authors":"Yang Zhang, A. Mandal, C. Koelbel, K. Cooper","doi":"10.1109/CCGRID.2009.59","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.59","url":null,"abstract":"Complex scientific workflows are now Increasingly executed on computational grids. In addition to the challenges of managing and scheduling these workflows, reliability challenges arise because of the unreliable nature of large-scale grid infrastructure. Fault tolerance mechanisms like over-provisioning and checkpoint-recovery are used in current grid application management systems to address these reliability challenges. In this work, we propose new approaches that combine these fault tolerance techniques with existing workflow scheduling algorithms. We present a study on the effectiveness of the combined approaches by analyzing their impact on the reliability of workflow execution, workflow performance and resource usage under different reliability models, failure prediction accuracies and workflow application types.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"245 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131201308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
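The effect of over-provisioning on reliability, mentioned in the fault tolerance abstract above, can be made concrete with a short calculation. This is a sketch, not the authors' model: it assumes that replicas of a task fail independently and that a workflow succeeds only if every task succeeds. The function names are hypothetical.

```python
import math


def replicated_success(p, replicas):
    """Probability that at least one of `replicas` independent copies of a
    task succeeds, when each copy succeeds with probability `p`."""
    return 1.0 - (1.0 - p) ** replicas


def workflow_success(task_probs):
    """Success probability of a workflow in which every task must succeed
    (independence between tasks is assumed)."""
    return math.prod(task_probs)


def replicas_needed(p, target):
    """Smallest number of replicas that lifts a task's success probability
    to at least `target`."""
    if not 0.0 < p <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p <= 1 and 0 <= target < 1")
    k = 1
    while replicated_success(p, k) < target:
        k += 1
    return k


# A 10-task workflow on resources that each succeed 90% of the time.
single = workflow_success([0.9] * 10)
doubled = workflow_success([replicated_success(0.9, 2)] * 10)
print(f"no replication: {single:.3f}, two replicas per task: {doubled:.3f}")
```

The gain comes at the cost of extra resource usage, which is the trade-off the study in the abstract evaluates.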
P2P networks enable members of a community to share resources of interest. However, discovering resources in a large-scale P2P network poses a number of challenges. Although Distributed Hash Table (DHT) structured P2P networks have shown enhanced scalability in routing messages, they only support key-based exact matches. This paper presents DIndex, a distributed indexing component that can be used in P2P networks to support range queries. DIndex introduces the concept of search dimensions for partitioning a search space, and it organizes peer nodes in a three-layered structure. Experimental results show that, for a P2P network with N peers, the average number of hops per message is less than log(N).
{"title":"Distributed Indexing for Resource Discovery in P2P Networks","authors":"M. Hentschel, Maozhen Li, M. Ponraj, M. Qi","doi":"10.1109/CCGRID.2009.57","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.57","url":null,"abstract":"P2P networks facilitate people belonging to a community to share resources of interest. However, discovering resources in a large scale P2P network poses a number of challenges. Although Distributed Hash Table (DHT) structured P2P networks have shown enhanced scalability in routing messages, they only support key based exact matches. This paper presents DIndex, a distributed indexing component that can be used in P2P networks in support of range queries. DIndex introduces the concept of search dimensions for partitioning a search space, and it organizes peer nodes in a three-layered structure. Experimental results show that, for aP2P network with N number of peers, the average number of hops per message is less than log(N).","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115172019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
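The limitation noted in the DIndex abstract above, that hashing supports only exact matches, arises because a hash function scatters neighbouring keys across unrelated peers. The sketch below shows the general alternative of an order-preserving partition, where each peer owns a contiguous key interval and a range query only visits peers whose intervals overlap the range. It is not the DIndex design: the three-layered structure and search dimensions are not modelled, and the class name `RangePartition` is hypothetical.

```python
from bisect import bisect_right


class RangePartition:
    """Toy order-preserving index: peer i owns keys in
    [bounds[i], bounds[i + 1]); the last peer owns everything above."""

    def __init__(self, lower_bounds):
        self.bounds = sorted(lower_bounds)
        self.store = [dict() for _ in self.bounds]

    def owner(self, key):
        """Index of the peer responsible for `key`."""
        i = bisect_right(self.bounds, key) - 1
        if i < 0:
            raise KeyError(f"{key} is below the indexed key space")
        return i

    def put(self, key, value):
        self.store[self.owner(key)][key] = value

    def range_query(self, lo, hi):
        """Return (sorted matches with lo <= key <= hi, peers contacted)."""
        first, last = self.owner(lo), self.owner(hi)
        hits = []
        for peer in range(first, last + 1):
            hits.extend((k, v) for k, v in self.store[peer].items()
                        if lo <= k <= hi)
        return sorted(hits), last - first + 1


index = RangePartition([0, 25, 50, 75])
for key, name in [(10, "a"), (30, "b"), (40, "c"), (60, "d"), (90, "e")]:
    index.put(key, name)
matches, peers_contacted = index.range_query(20, 65)
print(matches, peers_contacted)
```

With hashed placement the same query would have to probe every possible key in the range or flood all peers.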
BPEL is the de facto standard for business process modeling in today's enterprises and is a promising candidate for the integration of business and Grid applications. Current BPEL implementations do not provide mechanisms to schedule service calls with respect to the load of the target hosts. In this paper, a solution that automatically schedules workflow steps to underutilized hosts and provides new hosts using Cloud computing infrastructures in peak-load situations is presented. The proposed approach does not require any changes to the BPEL standard. An implementation based on the ActiveBPEL engine and Amazon's Elastic Compute Cloud is presented.
{"title":"On-Demand Resource Provisioning for BPEL Workflows Using Amazon's Elastic Compute Cloud","authors":"Tim Dörnemann, Ernst Juhnke, Bernd Freisleben","doi":"10.1109/CCGRID.2009.30","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.30","url":null,"abstract":"BPEL is the de facto standard for business process modeling in today's enterprises and is a promising candidate for the integration of business and Grid applications. Current BPEL implementations do not provide mechanisms to schedule service calls with respect to the load of the target hosts. In this paper, a solution that automatically schedules workflow steps to underutilized hosts and provides new hosts using Cloud computing infrastructures in peak-load situations is presented. The proposed approach does not require any changes to the BPEL standard. An implementation based on the ActiveBPEL engine and Amazon's Elastic Compute Cloud is presented.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124004388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents “Self-Chord”, a bio-inspired P2P algorithm that can be profitably adopted to build the information service of distributed systems, in particular Computational Grids and Clouds. Self-Chord inherits the ability of Chord-like structured systems to construct and maintain an overlay of peers, but adds functionality deriving from the activity of ant-inspired mobile agents, such as autonomous behavior, self-organization and the capacity to adapt to a changing environment. Self-Chord offers three main benefits over classical structured P2P systems: (i) keys can be given a semantic meaning, which enables the execution of "class" queries, often issued in Grids and Clouds; (ii) the keys are fairly distributed over the peers, which improves the balancing of storage responsibilities; (iii) maintenance load is reduced because, as new peers join the ring, the mobile agents spontaneously reorganize the keys in logarithmic time.
{"title":"Self-Chord: A Bio-inspired Algorithm for Structured P2P Systems","authors":"Agostino Forestiero, C. Mastroianni, M. Meo","doi":"10.1109/CCGRID.2009.39","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.39","url":null,"abstract":"This paper presents “Self-Chord”, a bio-inspired P2P algorithm that can be profitably adopted to build the information service of distributed systems, in particular Computational Grids and Clouds. Self-Chord inherits the ability of Chord-like structured systems for the construction and maintenance of an overlay of peers, but features enhanced functionalities deriving from the activity of ant-inspired mobile agents, such as autonomy behavior, self-organization and capacity to adapt to a changing environment. Self-Chord features three main benefits with respect to classical P2P structured systems: (i) it is possible to give a semantic meaning to keys, which enables the execution of \"class\" queries, often issued in Grids and Clouds; (ii) the keys are fairly distributed over the peers, thus improving the balancing of storage responsibilities; (iii) maintenance load is reduced because, as new peers join the ring, the mobile agents will spontaneously reorganize the keys in logarithmic time.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126038196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computer clusters are today the reference architecture for high-performance computing. The large number of nodes in these systems induces a high failure rate. This makes fault tolerance mechanisms, e.g. process checkpoint/restart, a required technology to effectively exploit clusters. Most of the process checkpoint/restart implementations only handle volatile states and do not take into account persistent states of applications, which can lead to incoherent application restarts. In this paper, we introduce an efficient persistent state checkpoint/restoration approach that can be interconnected with a large number of file systems. To avoid the performance issues of a stable support relying on synchronous replication mechanisms, we present a failure resilience scheme optimized for such persistent state checkpointing techniques in a distributed environment. First evaluations of our implementation in the kDFS distributed file system show the negligible performance impact of our proposal.
{"title":"Handling Persistent States in Process Checkpoint/Restart Mechanisms for HPC Systems","authors":"Pierre Riteau, A. Lèbre, C. Morin","doi":"10.1109/CCGRID.2009.29","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.29","url":null,"abstract":"Computer clusters are today the reference architecture for high-performance computing. The large number of nodes in these systems induces a high failure rate. This makes fault tolerance mechanisms, e.g. process checkpoint/restart, a required technology to effectively exploit clusters. Most of the process checkpoint/restart implementations only handle volatile states and do not take into account persistent states of applications, which can lead to incoherent application restarts. In this paper, we introduce an efficient persistent state checkpoint/restoration approach that can be interconnected with a large number of file systems. To avoid the performance issues of a stable support relying on synchronous replication mechanisms, we present a failure resilience scheme optimized for such persistent state checkpointing techniques in a distributed environment. First evaluations of our implementation in the kDFS distributed file system show the negligible performance impact of our proposal.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128483647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
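The incoherent restart described in the checkpoint/restart abstract above occurs when a process is rolled back to an earlier in-memory state while its files keep the later writes. The sketch below shows the idea of checkpointing both together. It is deliberately naive: it copies the whole data file, whereas the paper proposes an efficient scheme inside the kDFS file system, and the function names and file layout here are hypothetical.

```python
import json
import os
import shutil
import tempfile


def checkpoint(volatile_state, data_path, ckpt_dir):
    """Save in-memory state together with a snapshot of the data file."""
    with open(os.path.join(ckpt_dir, "state.json"), "w") as f:
        json.dump(volatile_state, f)
    shutil.copyfile(data_path, os.path.join(ckpt_dir, "data.snapshot"))


def restart(data_path, ckpt_dir):
    """Roll the data file back to the snapshot and reload the state."""
    shutil.copyfile(os.path.join(ckpt_dir, "data.snapshot"), data_path)
    with open(os.path.join(ckpt_dir, "state.json")) as f:
        return json.load(f)


def run_step(state, data_path):
    """One unit of application work: bump a counter, append to a file."""
    state["step"] += 1
    with open(data_path, "a") as f:
        f.write(f"step {state['step']}\n")


def demo():
    with tempfile.TemporaryDirectory() as work:
        data_path = os.path.join(work, "results.log")
        ckpt_dir = os.path.join(work, "ckpt")
        os.mkdir(ckpt_dir)
        open(data_path, "w").close()

        state = {"step": 0}
        run_step(state, data_path)
        run_step(state, data_path)
        checkpoint(state, data_path, ckpt_dir)
        run_step(state, data_path)  # written after the checkpoint

        # Simulated failure: restore volatile and persistent state together.
        state = restart(data_path, ckpt_dir)
        with open(data_path) as f:
            return state, f.read().splitlines()


restored_state, restored_lines = demo()
print(restored_state, restored_lines)
```

Restoring only `state.json` would leave three lines in the file while the counter says two, so the re-executed third step would be appended twice.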
From personal software to advanced systems, caching mechanisms have long been a ubiquitous means of reducing workloads. It is no surprise, then, that under the grid and cluster paradigms, middleware and other large-scale applications often seek caching solutions. Among these distributed applications, scientific workflow management systems have gained ground in mitigating the often painstaking process of composing sequences of scientific data sets and services to derive virtual data. In the past, workflow managers have relied on low-level system caches for reuse support. But in distributed, query-intensive environments, where high volumes of intermediate virtual data can potentially be stored anywhere on the grid, a novel cache structure is needed to efficiently facilitate workflow planning. In this paper, we describe an approach to the challenges of maintaining large, fast virtual data caches for workflow composition. A hierarchical structure is proposed for indexing scientific data with spatiotemporal annotations across grid nodes. Our experimental results show that our hierarchical index is scalable and outperforms a centralized indexing scheme by an exponential factor in query-intensive environments.
{"title":"Hierarchical Caches for Grid Workflows","authors":"David Chiu, G. Agrawal","doi":"10.1109/CCGRID.2009.10","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.10","url":null,"abstract":"From personal software to advanced systems, caching mechanisms have steadfastly been a ubiquitous means for reducing workloads. It is no surprise, then, that under the grid and cluster paradigms, middlewares and other large-scale applications often seek caching solutions. Among these distributed applications, scientific workflow management systems have gained ground towards mitigating the often painstaking process of composing sequences of scientific data sets and services to derive virtual data. In the past, workflow managers have relied on low-level system cache for reuse support. But in distributed query intensive environments, where high volumes of intermediate virtual data can potentially be stored anywhere on the grid, a novel cache structure is needed to efficiently facilitate workflow planning. In this paper, we describe an approach to combat the challenges of maintaining large, fast virtual data caches for workflow composition. A hierarchical structure is proposed for indexing scientific data with spatiotemporal annotations across grid nodes. Our experimental results show that our hierarchical index is scalable and outperforms a centralized indexing scheme by an exponential factor in query intensive environments.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126804306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unpredictable access to batch-mode HPC resources is a significant problem for emerging dynamic data-driven applications. Although efforts such as reservation or queue-time prediction have attempted to partially address this problem, approaches based strictly on space-sharing impose fundamental limits on real-time predictability. In contrast, our earlier work investigated the use of feedback-controlled virtual machines (VMs), a time-sharing approach, to deliver predictable execution. However, our earlier work did not fully address usability and implementation efficiency. This paper presents an online, software-only version of the feedback-controlled VM, called the self-tuning VM, which we argue is a practical approach for predictable HPC infrastructure. Our evaluation using five widely used applications shows that our approach is both predictable and practical: by simply running time-dependent jobs with our tool, we meet a job’s deadline typically within 3% error, and within 8% error for the more challenging applications.
{"title":"Self-Tuning Virtual Machines for Predictable eScience","authors":"Sang-Min Park, M. Humphrey","doi":"10.1109/CCGRID.2009.84","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.84","url":null,"abstract":"Unpredictable access to batch-mode HPC resources is a significant problem for emerging dynamic data-driven applications. Although efforts such as reservation or queue-time prediction have attempted to partially address this problem, the approaches strictly based on space-sharing impose fundamental limits on real-time predictability. In contrast, our earlier work investigated the use of feedback-controlled virtual machines (VMs), a time-sharing approach, to deliver predictable execution. However, our earlier work did not fully address usability and implementation efficiency. This paper presents an online, software-only version of feedback controlled VM, called self-tuning VM, which we argue is a practical approach for predictable HPC infrastructure. Our evaluation using five widely-used applications show our approach is both predictable and practical: by simply running time-dependent jobs with our tool, we meet a job’s deadline typically within 3% errors, and within 8% errors for the more challenging applications.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129225704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software license management allows independent software vendors (ISVs) to control access to their products. It is a fundamental part of the ISVs' business strategy. A wide range of products has been developed to address license management. There are, however, only a few ongoing efforts on license management in grid and cloud computing environments. This paper presents our work on GenLM, a license management solution suitable for these environments. It has been built to provide a secure and robust solution for ISVs that want to extend their software usage to these systems. We provide ISVs with a toolchain to implement arbitrary software licensing models. At the same time, we ensure that licenses are mobile, i.e. they can be used on any resource the user has access to.
{"title":"GenLM: License Management for Grid and Cloud Computing Environments","authors":"M. Dalheimer, F. Pfreundt","doi":"10.1109/CCGRID.2009.31","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.31","url":null,"abstract":"Software license management allows independent software vendors (ISVs) to control the access of their products. It is a fundamental part of the ISVs' business strategy. A wide range of products has been developed in order to address license management. There are, however, only few ongoing works with regard to license management in grid and cloud computing environments. This paper presents our work on GenLM, a license management solution suitable for these environments. It has been built in order to provide a secure and robust solution for ISVs that want to extend their software usage to these systems. We provide ISVs a toolchain to implement arbitrary software licensing models. At the same time we ensure that licenses are mobile, i.e. they can be used on any resource the user has access to.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"843 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117332197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Replication in grid file systems can significantly improve the I/O performance of data-intensive applications. However, most existing replication techniques apply to individual files, which may introduce inefficient replication overheads for a large number of files. We propose a file-clustering-based replication algorithm for grid file systems. Our algorithm groups files according to how often they are accessed simultaneously and stores replicas of the clustered files on storage nodes, so as to minimize both the expected future read access times to the clustered files and the replication times for individual files under the given storage capacity limit. Our experiments on a grid environment of 20 nodes across 5 sites suggest that the proposed algorithm achieves accurate file clustering and efficient replica management; our clustering policy, with a file cluster size limit of 5120 MB and a replica storage capacity limit of 10240 MB, is 1.58 times as efficient as the policy that never groups related files. The results also indicate that the overheads required for introducing our algorithm significantly affect I/O performance of running applications.
{"title":"File Clustering Based Replication Algorithm in a Grid Environment","authors":"Hitoshi Sato, S. Matsuoka, Toshio Endo","doi":"10.1109/CCGRID.2009.73","DOIUrl":"https://doi.org/10.1109/CCGRID.2009.73","url":null,"abstract":"Replication in grid file systems can significantly improve I/O performance of data-intensive applications. However, most of existing replication techniques apply to individual files, which may introduce inefficient replication overheads for a large number of files. We propose a file clustering based replication algorithm for grid file systems. Our algorithm groups files according to a relationship of simultaneous accesses between files and stores replicas of the clustered files into storage nodes, to satisfy expected most of future read access times to the clustered files and replication times for individual files being minimized under the given storage capacity limitation. Our experiments on a given grid environment, 20 nodes of 5 sites, suggest that the proposed algorithm achieves accurate file clustering and efficient replica management; our clustering policy with the file cluster size limit of 5120 MB and the storage capacity limit for replicas of 10240 MB exhibits 1.58 times efficiency than the policy that never groups related files. The results also indicate that the overheads required for introducing our algorithm significantly affect I/O performance of running applications.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126730950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
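Grouping files by simultaneous access, as described in the file clustering abstract above, can be sketched as a greedy merge over co-access counts. This is a hypothetical heuristic for illustration, not the authors' algorithm: two files are merged into one cluster when they appear together in at least `min_coaccess` access sessions and the merged cluster stays within a size limit. The 5120 MB default mirrors the cluster size limit quoted in the abstract, while the function name, session format and threshold are assumptions.

```python
from collections import Counter
from itertools import combinations


def cluster_files(sessions, sizes_mb, min_coaccess=2, max_cluster_mb=5120):
    """Group files that are frequently accessed in the same session.

    sessions: list of lists of file names accessed together.
    sizes_mb: dict mapping every file name to its size in MB.
    Returns a sorted list of clusters, each a sorted list of file names.
    """
    # Count how often each pair of files shares a session.
    pair_counts = Counter()
    for files in sessions:
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over files, tracking the total size of each cluster.
    parent = {f: f for f in sizes_mb}
    cluster_mb = dict(sizes_mb)

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    # Merge the most strongly related pairs first.
    ranked = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), count in ranked:
        if count < min_coaccess:
            break
        ra, rb = find(a), find(b)
        if ra != rb and cluster_mb[ra] + cluster_mb[rb] <= max_cluster_mb:
            parent[rb] = ra
            cluster_mb[ra] += cluster_mb[rb]

    clusters = {}
    for f in sizes_mb:
        clusters.setdefault(find(f), []).append(f)
    return sorted(sorted(c) for c in clusters.values())


sessions = [["a", "b"], ["a", "b", "c"], ["c", "d"], ["c", "d"]]
sizes_mb = {name: 1000 for name in "abcde"}
clusters = cluster_files(sessions, sizes_mb)
print(clusters)
```

Each resulting cluster would then be replicated as a unit, so that a job reading one file of the cluster finds the related files on the same storage node.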