Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353010
P. Melliar-Smith, L. Moser
This paper discusses progress in the field of real-time fault tolerance. In particular, it considers synchronous vs. asynchronous fault tolerance designs, maintaining replica consistency, alternative fault tolerance strategies, including checkpoint restoration, transactions, and consistent replay, and custom vs. generic fault tolerance.
Title: Progress in real-time fault tolerance. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353012
Hiroshi Nakamura, T. Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, T. Nanya
Large cluster systems have become widely utilized because they achieve a good performance/cost ratio especially in high performance computing. Although these cluster systems are distributed memory systems, coordinated checkpointing is a promising way to maintain high availability because the computing nodes are tightly connected to one another. However, as the number of computing nodes gets larger, the probability of multi-node failures increases. To tolerate multi-node failures, a large degree of redundancy is required in checkpointing, but this leads to performance degradation. Thus, we propose a new coordinated checkpointing called skewed checkpointing. In this method, checkpointing is skewed every time. Although each checkpointing itself contains only one degree of redundancy, this skewed checkpointing ensures $\lfloor \log_2 N \rfloor$ degrees of redundancy when the number of nodes is $N$. In this paper, we present the proposed method and an analysis of the performance overhead. Then, this method is applied to a cluster system and compared with other conventional checkpointing schemes. The results reveal the superiority of our method, especially for large cluster systems.
Title: Skewed checkpointing for tolerating multi-node failures. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
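As a small worked illustration of the redundancy figure quoted in the skewed checkpointing abstract above, the sketch below evaluates floor(log2 N) for a few cluster sizes. It only reproduces the arithmetic of the stated bound; the function name `redundancy_degrees` and the sample node counts are assumptions made for illustration, and nothing here models the checkpointing scheme itself.

```python
def redundancy_degrees(num_nodes: int) -> int:
    """Return floor(log2(N)) for N >= 1 using exact integer arithmetic."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # For a positive integer, bit_length() - 1 equals floor(log2(N)).
    return num_nodes.bit_length() - 1


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show how slowly the bound grows.
    for n in (2, 16, 100, 1024):
        print(f"N = {n:5d} -> {redundancy_degrees(n)} degrees of redundancy")
```

Because the bound grows logarithmically, doubling the number of nodes adds only one degree of redundancy.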
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353017
R. Scandariato, J. Knight
Many areas of society have become heavily dependent on services such as transportation facilities, utilities and so on that are implemented in part by large numbers of computers and communications links. Both past incidents and research studies show that a well-engineered Internet worm can disable such systems in a fairly simple way and, most notably, in a matter of a few minutes. This indicates the need for defenses against worms but their speed rules out the possibility of manually countering worm outbreaks. We present a platform that emulates the epidemic behavior of Internet active worms in very large networks. A reactive control system operates on top of the platform and provides a monitor/analyze/respond approach to deal with infections automatically. Details of our highly configurable platform and various experimental performance results are presented.
Title: The design and evaluation of a defense system for Internet worms. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
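To give a feel for the "epidemic behavior" mentioned in the worm-defense abstract above, here is a minimal sketch of a textbook discrete-time susceptible-infected (SI) model. This is an assumed, simplified model and not the emulation platform the authors describe; the function `simulate_si`, its parameters, and the sample values are all hypothetical.

```python
def simulate_si(population: int, initially_infected: int,
                contact_rate: float, steps: int) -> list[float]:
    """Deterministic SI model: every step, each infected host infects
    susceptible hosts in proportion to the susceptible fraction."""
    infected = float(initially_infected)
    history = [infected]
    for _ in range(steps):
        susceptible = population - infected
        new_infections = contact_rate * infected * susceptible / population
        # Clamp so the infected count never exceeds the population.
        infected = min(float(population), infected + new_infections)
        history.append(infected)
    return history


if __name__ == "__main__":
    # Hypothetical parameters: 100,000 hosts, 10 infected at the start.
    curve = simulate_si(population=100_000, initially_infected=10,
                        contact_rate=0.8, steps=30)
    for step in range(0, 31, 5):
        print(f"step {step:2d}: {curve[step]:10.1f} infected hosts")
```

The early growth in such a model is close to exponential, which is consistent with the abstract's point that manual countermeasures are too slow.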
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353016
H. Kopetz
Summary form only given. A federated architecture is characterized in that every major function of an embedded system is allocated to a dedicated hardware unit. In a distributed control system this implies that adding a new function is tantamount to adding a new node. This has led to a order to achieve some functional coordination. Adding fault-tolerance to a federated architecture, e.g., by the provision of triple modular redundancy (TMR) leads to a further significant increase in the number of nodes and networks. The major advantages of a dedicated architecture are the physical encapsulation of the nearly autonomous subsystems, their outstanding fault containment and their clear-cut complexity management (independent development) in case the subsystems are nearly autonomous. An integrated distributed architecture for mixed-criticality applications must be based on a core design that supports the safety requirements of the highest considered criticality class. This is of particular importance in safety-critical applications, where the physical structure of the integrated system is determined to a significant extent by the independence requirement of fault-containment regions. The central part of an integrated distributed architecture for time-critical systems must provide the following core services: deterministic and timely transport of messages; fault tolerant clock synchronization; strong fault isolation with respect to arbitrary node failures; and consistent diagnosis of failing nodes. Any architecture that provides these core services can be used as a base architecture for an integrated distributed embedded system architecture. An example of such an integrated architecture is the time-triggered architecture (TTA). In this contribution we describe the structure and the services of the TTA that has been developed during the last twenty years and is deployed in a number of safety-critical applications in the transport sector.
Title: An integrated architecture for dependable embedded systems. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
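The triple modular redundancy (TMR) mentioned in the abstract above can be illustrated with a minimal majority voter. This is a generic sketch under the assumption that the three replica outputs can be compared for exact equality; the function `tmr_vote` is hypothetical and is not taken from the TTA.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by at least two of three replicas,
    or None when all three replicas disagree."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    print(tmr_vote([42, 42, 42]))  # all replicas agree
    print(tmr_vote([42, 17, 42]))  # one faulty replica is masked
    print(tmr_vote([1, 2, 3]))     # no majority, so the voter reports None
```

A voter of this kind masks one faulty replica, which is why TMR triples the number of nodes for each replicated function.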
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353013
Islene C. Garcia, L. E. Buzato
A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of nontrackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of nontrackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate $O(n^2)$ control information, where $n$ is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only $O(n)$ control information.
Title: An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353026
Joseph G. Slember, P. Narasimhan
Fault-tolerant replicated applications are typically assumed to be deterministic, in order to ensure reproducible, consistent behavior and state across a distributed system. Real applications often contain nondeterministic features that cannot be eliminated. Through the novel application of program analysis to distributed CORBA applications, we decompose an application into its constituent structures, and discover the kinds of nondeterminism present within the application. We target the instances of nondeterminism that can be compensated for automatically, and highlight to the application programmer those instances of nondeterminism that need to be manually rectified. We demonstrate our approach by compensating for specific forms of nondeterminism and by quantifying the associated performance overheads. The resulting code growth is typically limited to one extra line for every instance of nondeterminism, and the runtime overhead is minimal, compared to a fault-tolerant application with no compensation for nondeterminism.
Title: Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353030
J. Pereira, L. Rodrigues, A. Pinto, R. Oliveira
In this paper we propose a novel probabilistic broadcast protocol that reduces the average end-to-end latency by dynamically adapting to network topology and traffic conditions. It does so by using a unique strategy that consists of adjusting the fanout and preferred targets for different gossip rounds as a function of the properties of each node. Node classification is lightweight and integrated in the protocol membership management. Furthermore, each node is not required to have full knowledge of the group membership or of the network topology. The paper shows how the protocol can be configured and evaluates its performance with a detailed simulation model.
Title: Low latency probabilistic broadcast in wide area networks. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353019
F. Stevens, T. Courtney, Sankalp Singh, A. Agbaria, J. F. Meyer, W. Sanders, P. Pal
An increasing number of computer systems are designed to be distributed across both local and wide-area networks, performing a multitude of critical information-sharing and computational tasks. Malicious attacks on such systems are a growing concern, where attackers typically seek to degrade quality of service by intrusions that exploit vulnerabilities in networks, operating systems, and application software. Accordingly, designers are seeking improved techniques for validating such systems with respect to specified survivability requirements. In this regard, we describe a model-based validation effort that was undertaken as part of a unified approach to validating a networked intrusion-tolerant information system. Model-based results were used to guide the system's design as well as to determine whether a given survivability requirement was satisfied.
Title: Model-based validation of an intrusion-tolerant information system. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
Pub Date: 2004-10-18 DOI: 10.1109/RELDIS.2004.1353005
Charles P. Fry, M. Reiter
Modern distributed, object-based systems support nested method invocations, whereby one object can invoke methods on another. In this paper we present a framework that supports nested method invocations among Byzantine fault-tolerant, replicated objects that are accessed via quorum systems. A challenge in this context is that client object replicas can induce unwanted method invocations on server object replicas, due either to redundant invocations by client replicas or Byzantine failures within the client replicas. At the core of our framework are a new quorum-based authorization technique and a novel method invocation protocol that ensure the linearizability and failure atomicity of nested method invocations despite Byzantine client and server replica failures. We detail the implementation of these techniques in a system called Fleet, and give preliminary performance results for them.
Title: Nested objects in a Byzantine quorum-replicated system. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
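The abstract above does not say which quorum construction is used, so the following is only a background sketch. It assumes the threshold masking quorum construction from the Byzantine quorum literature, in which n servers tolerate b Byzantine servers when n >= 4b + 1 and each quorum has ceil((n + 2b + 1) / 2) servers. The function names are hypothetical.

```python
def masking_quorum_size(n: int, b: int) -> int:
    """Quorum size ceil((n + 2b + 1) / 2) for a threshold masking quorum
    system (assumed construction, not stated in the abstract)."""
    if b < 0 or n < 4 * b + 1:
        raise ValueError("threshold masking quorums need n >= 4b + 1")
    return (n + 2 * b + 2) // 2  # integer form of ceil((n + 2b + 1) / 2)


def min_quorum_intersection(n: int, quorum_size: int) -> int:
    """Smallest possible overlap between two quorums of the given size."""
    return max(0, 2 * quorum_size - n)


if __name__ == "__main__":
    for n, b in ((5, 1), (9, 2), (13, 3)):
        q = masking_quorum_size(n, b)
        overlap = min_quorum_intersection(n, q)
        print(f"n={n:2d} b={b}: quorum size {q}, overlap >= {overlap}")
```

Under this assumed construction, any two quorums share at least 2b + 1 servers, so correct servers outnumber faulty ones in every intersection.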
Pub Date: 2004-03-30 DOI: 10.1109/RELDIS.2004.1352999
P. Urbán, Naohiro Hayashibara, A. Schiper, T. Katayama
Protocols that solve agreement problems are essential building blocks for fault-tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we compare two well-known asynchronous consensus algorithms. In both algorithms, a leader process tries to impose a decision, and another leader retries if the first leader fails to do so. The algorithms elect leaders differently: the Chandra-Toueg algorithm has a rotating leader, whereas processes in the Paxos algorithm elect leaders directly. We investigate the performance implications of this difference. In the system under study, processes send atomic broadcasts to each other. Consensus is used to decide the delivery order of messages. We evaluate the steady-state latency in (1) runs with neither crashes nor suspicions, (2) runs with crashes and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as the transient latency after (4) one crash and (5) multiple correlated crashes. The results show that the Paxos algorithm tolerates frequent wrong suspicions (3) and correlated crashes (5) better, while the performance is comparable in all other scenarios.
Title: Performance comparison of a rotating coordinator and a leader based consensus algorithm. Venue: Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.
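The difference between the two leader-selection styles contrasted in the abstract above can be sketched in a few lines. This is a deliberately simplified illustration and not the authors' experimental setup: it ignores failure detectors, messages, and timing. The rule that the coordinator of round r is process r mod n follows the usual rotating-coordinator convention with processes numbered from 0; the "lowest-numbered live process" rule for direct election is an assumption made purely for the example.

```python
def rotating_coordinator(round_number: int, num_processes: int) -> int:
    """Coordinator of a round under the rotating scheme (0-based ids)."""
    return round_number % num_processes


def rounds_skipped(start_round: int, num_processes: int,
                   crashed: set[int]) -> int:
    """Count consecutive rounds, from start_round on, whose rotating
    coordinator has crashed."""
    if all(p in crashed for p in range(num_processes)):
        raise ValueError("at least one process must be alive")
    skipped = 0
    while rotating_coordinator(start_round + skipped, num_processes) in crashed:
        skipped += 1
    return skipped


def direct_leader(num_processes: int, crashed: set[int]) -> int:
    """Hypothetical direct election rule: lowest-numbered live process."""
    return min(p for p in range(num_processes) if p not in crashed)


if __name__ == "__main__":
    crashed = {0, 1, 2}  # hypothetical correlated crash of three processes
    print("rounds with a crashed coordinator:", rounds_skipped(0, 5, crashed))
    print("directly elected leader:", direct_leader(5, crashed))
```

In this toy model, a correlated crash of consecutive processes makes the rotating scheme pass through several crashed coordinators, while a direct election can jump straight to a live process. Whether this accounts for the measured latencies is a question for the paper itself.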