Raimundo José de Araújo Macêdo (Campus de Ondina). Anais do II Workshop de Testes e Tolerância a Falhas (WTF 2000), 15 July 2000. DOI: 10.5753/wtf.2000.23478
Failure Detection in Asynchronous Distributed Systems
The ability to detect failures is an important issue in designing fault-tolerant distributed systems. However, the actual behaviour of a system limits the ability to provide such a mechanism. At one extreme of the spectrum, synchronous systems (i.e., systems with bounded message transmission delays and processing times) allow perfect failure detectors to be built simply from local timeouts. At the other extreme, accurate failure detection cannot be implemented in asynchronous systems (i.e., systems with no bounds on message transmission delays and processing times) unless some extra properties can be guaranteed, such as those specified in the seminal article by Chandra and Toueg [1]. The present paper discusses the requirements and describes the implementations of failure detectors for two important fault-tolerance mechanisms intended for asynchronous environments: process group membership and distributed consensus based on the ◇S failure detector [1]. These implementations are based on a mechanism called the Time Connectivity Indicator, introduced in this paper.
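The contrast between the two extremes can be illustrated with a conventional heartbeat detector. The following is a minimal Python sketch under stated assumptions. It is not the paper's Time Connectivity Indicator, which the abstract does not describe. The class name `HeartbeatFailureDetector`, its parameters, and the fixed `increment` rule are hypothetical.

With `adaptive=False`, each process is assumed to send a heartbeat every period. If the timeout is set to at least that period plus a known delay bound, the detector never suspects a live process, which is roughly the synchronous case. In an asynchronous system no such bound exists, so any fixed timeout can produce false suspicions. The adaptive variant therefore enlarges a process's timeout each time a suspicion proves wrong. This is a common way to approximate the eventual accuracy of Chandra and Toueg's eventual detectors, and it succeeds only if delays eventually stop growing. In both modes a crashed process stops sending heartbeats and is eventually suspected.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatFailureDetector:
    """Timeout-based failure detector (hypothetical sketch, not the paper's TCI).

    A process is suspected once no heartbeat from it has arrived within its
    current timeout. If `adaptive` is True, a heartbeat from a suspected
    process counts as a false suspicion: the process is un-suspected and its
    timeout grows by `increment`.
    """
    processes: list[str]
    timeout: float
    adaptive: bool = False
    increment: float = 1.0
    last_heard: dict[str, float] = field(default_factory=dict)
    timeouts: dict[str, float] = field(default_factory=dict)
    suspected: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Assumption: monitoring starts at local time 0 for every process.
        for p in self.processes:
            self.last_heard[p] = 0.0
            self.timeouts[p] = self.timeout

    def on_heartbeat(self, pid: str, now: float) -> None:
        """Record a heartbeat from `pid` received at local time `now`."""
        self.last_heard[pid] = now
        if pid in self.suspected:
            # The suspicion was wrong: the process was only slow.
            self.suspected.discard(pid)
            if self.adaptive:
                self.timeouts[pid] += self.increment

    def check(self, now: float) -> set[str]:
        """Suspect every process whose heartbeat is overdue at time `now`."""
        for p in self.processes:
            if now - self.last_heard[p] > self.timeouts[p]:
                self.suspected.add(p)
        return set(self.suspected)


if __name__ == "__main__":
    fd = HeartbeatFailureDetector(["p", "q"], timeout=2.0, adaptive=True)
    fd.on_heartbeat("p", 1.0)
    fd.on_heartbeat("q", 1.0)
    print(fd.check(now=2.5))   # set()
    fd.on_heartbeat("p", 3.0)
    print(fd.check(now=3.5))   # {'q'}: q silent for 2.5 > 2.0
    fd.on_heartbeat("q", 4.0)  # q was merely slow: un-suspect it, q's timeout becomes 3.0
    fd.on_heartbeat("p", 5.0)
    print(fd.check(now=6.5))   # set(): q silent for 2.5, now within its 3.0 timeout
```

Passing `now` explicitly keeps the sketch deterministic. In a real deployment it would more likely come from a local monotonic clock such as `time.monotonic()`.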