{"title":"Implementation and Performance Evaluation of an Adaptable Failure Detector for Distributed System","authors":"Jing-li Zhou, Guang Yang, Lijun Dong, Gang Liu","doi":"10.1109/CIS.2007.61","DOIUrl":null,"url":null,"abstract":"Unreliable failure detectors have been an important abstraction to build dependable distributed applications over asynchronous distributed systems subject to faults. Their implementations are commonly based on timeouts to ensure algorithm termination. However, for systems built on the Internet, it is hard to estimate this time value due to traffic variations. In order to increase the performance, self-tuned failure detectors dynamically adapt their timeouts to the communication delay behavior added of a safety margin. In this paper, we propose a new implementation of a failure detector. This implementation is a variant of the heartbeat failure detector which is adaptable and can support scalable applications. In this implementation we dissociate two aspects: a basic estimation of the expected arrival date to provide a short detection time, and an adaptation of the quality of service. The latter is based on two principles: an adaptation layer and a heuristic to adapt the sending period of \"I am alive\" messages.","PeriodicalId":127238,"journal":{"name":"2007 International Conference on Computational Intelligence and Security (CIS 2007)","volume":"34 10","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 International Conference on Computational Intelligence and Security (CIS 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIS.2007.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Unreliable failure detectors have been an important abstraction to build dependable distributed applications over asynchronous distributed systems subject to faults. Their implementations are commonly based on timeouts to ensure algorithm termination. However, for systems built on the Internet, it is hard to estimate this time value due to traffic variations. In order to increase the performance, self-tuned failure detectors dynamically adapt their timeouts to the communication delay behavior added of a safety margin. In this paper, we propose a new implementation of a failure detector. This implementation is a variant of the heartbeat failure detector which is adaptable and can support scalable applications. In this implementation we dissociate two aspects: a basic estimation of the expected arrival date to provide a short detection time, and an adaptation of the quality of service. The latter is based on two principles: an adaptation layer and a heuristic to adapt the sending period of "I am alive" messages.