{"title":"Recursive Evaluation of Fault Tolerance Mechanisms for SLA Management","authors":"K. Voß","doi":"10.1109/ICNS.2008.22","DOIUrl":null,"url":null,"abstract":"Service level agreements (SLAs) have been introduced into the grid in order to build a basis for its commercial uptake. The challenge for Grid providers in agreeing and operating SLA-bound jobs is to ensure their fulfillment even in the case of failures. Hence, fault-tolerance mechanisms are an essential means of the provider's SLA management. The high utilization of commercial operated clusters leads to scenarios in which typically a job migration effects other jobs scheduled. The effects result from the unavailability of enough free resources which would be needed to catch all resource outages. Consequently before initiating a migration, its effects for other jobs have to be compared and the initiation of fault- tolerance (FT-) mechanisms have to be evaluated recursively. This paper presents a measurement for the benefit of initiating a FT-mechanism, the recursive evaluation, and termination condition. Performing such an impact evaluation of an initiated chain of FT-mechanisms is often more profitable than performing a single FT-mechanism and accordingly this is important for the Grid commercialization.","PeriodicalId":180899,"journal":{"name":"Fourth International Conference on Networking and Services (icns 2008)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Fourth International Conference on Networking and Services (icns 2008)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICNS.2008.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Service level agreements (SLAs) have been introduced into the grid in order to build a basis for its commercial uptake. The challenge for Grid providers in agreeing and operating SLA-bound jobs is to ensure their fulfillment even in the case of failures. Hence, fault-tolerance mechanisms are an essential means of the provider's SLA management. The high utilization of commercial operated clusters leads to scenarios in which typically a job migration effects other jobs scheduled. The effects result from the unavailability of enough free resources which would be needed to catch all resource outages. Consequently before initiating a migration, its effects for other jobs have to be compared and the initiation of fault- tolerance (FT-) mechanisms have to be evaluated recursively. This paper presents a measurement for the benefit of initiating a FT-mechanism, the recursive evaluation, and termination condition. Performing such an impact evaluation of an initiated chain of FT-mechanisms is often more profitable than performing a single FT-mechanism and accordingly this is important for the Grid commercialization.