Failure Detection in Asynchronous Distributed Systems

Anais do II Workshop de Testes e Tolerância a Falhas (WTF 2000). Published: 2000-07-15. DOI: 10.5753/wtf.2000.23478

Raimundo José de Araújo Macêdo, Campus de Ondina

Citations: 13

Abstract

Being able to detect failures is an important issue in designing fault-tolerant distributed systems. However, the actual behaviour of a system limits the ability to provide such a mechanism. At one extreme of the spectrum, synchronous systems (i.e., systems with bounded message transmission delays and processing times) allow perfect failure detection to be built simply on local timeouts. At the other extreme, accurate failure detection cannot be implemented in asynchronous systems (i.e., systems with no bounds on message transmission delays and processing times) unless some extra properties can be guaranteed, such as those specified by Chandra and Toueg in a seminal article [1]. The present paper discusses the requirements and describes the implementations of failure detectors for two important fault-tolerant mechanisms designed for asynchronous environments: process group membership and distributed consensus based on the ◇S failure detector [1]. These implementations are based on a mechanism called the Time Connectivity Indicator, introduced in this paper.
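The abstract contrasts fixed local timeouts, which give perfect detection only when delays are bounded, with asynchronous settings where a timeout can wrongly suspect a process that is merely slow. The sketch below illustrates that contrast with a generic heartbeat/timeout detector driven by a simulated clock. It is a hedged illustration, not the paper's Time Connectivity Indicator, which the abstract does not describe. All names (`TimeoutFailureDetector`, `on_heartbeat`, `check`, `count_false_suspicions`) and the tick-based scenario are hypothetical. With `adaptive=True` it applies the well-known idea of enlarging a process's timeout after each false suspicion; this yields eventual accuracy only under an added partial-synchrony assumption (delays are eventually bounded by some unknown value), not in a purely asynchronous system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TimeoutFailureDetector:
    """Heartbeat/timeout failure detector on a simulated clock (hypothetical sketch)."""

    processes: list[str]
    timeout: float            # initial timeout, in the same time units as `now`
    adaptive: bool = False    # grow a process's timeout after each false suspicion
    increment: float = 1.0    # amount added to the timeout on each detected mistake
    last_heard: dict[str, float] = field(default_factory=dict)
    timeouts: dict[str, float] = field(default_factory=dict)
    suspected: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Assumption: every process counts as heard from at time 0.
        for p in self.processes:
            self.last_heard[p] = 0.0
            self.timeouts[p] = self.timeout

    def on_heartbeat(self, p: str, now: float) -> None:
        """Record a heartbeat from p received at time `now`."""
        self.last_heard[p] = now
        if p in self.suspected:
            # p is alive, so suspecting it was a mistake: retract the suspicion.
            self.suspected.discard(p)
            if self.adaptive:
                # Be more patient with p from now on.
                self.timeouts[p] += self.increment

    def check(self, now: float) -> set[str]:
        """Suspect every process silent for longer than its current timeout."""
        for p in self.processes:
            if now - self.last_heard[p] > self.timeouts[p]:
                self.suspected.add(p)
        return set(self.suspected)


def count_false_suspicions(adaptive: bool, ticks: int = 30) -> int:
    """Process q never crashes but, being slow, sends a heartbeat only every 4 ticks.

    With a fixed 2-tick timeout every heartbeat gap causes a false suspicion;
    the adaptive detector raises q's timeout after the first mistake.
    """
    fd = TimeoutFailureDetector(["q"], timeout=2.0, adaptive=adaptive)
    mistakes = 0
    for t in range(1, ticks + 1):
        now = float(t)
        if t % 4 == 0:
            fd.on_heartbeat("q", now)
        was_suspected = "q" in fd.suspected
        if "q" in fd.check(now) and not was_suspected:
            mistakes += 1
    return mistakes


if __name__ == "__main__":
    print(count_false_suspicions(adaptive=False))  # 7
    print(count_false_suspicions(adaptive=True))   # 1
```

In this scenario the fixed-timeout detector suspects the slow but correct process q on every heartbeat gap, while the adaptive one stops after a single mistake. If delays were truly unbounded, q could always outlast any finite timeout, which is why the abstract says accurate detection needs extra guarantees. Completeness holds in both modes: a crashed process stops sending heartbeats and therefore stays suspected.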