
Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004: latest papers

Progress in real-time fault tolerance
P. Melliar-Smith, L. Moser
This paper discusses progress in the field of real-time fault tolerance. In particular, it considers synchronous vs. asynchronous fault tolerance designs, maintaining replica consistency, alternative fault tolerance strategies, including checkpoint restoration, transactions, and consistent replay, and custom vs. generic fault tolerance.
DOI: https://doi.org/10.1109/RELDIS.2004.1353010
Published: 2004-10-18
Citations: 12
Skewed checkpointing for tolerating multi-node failures
Hiroshi Nakamura, T. Hayashida, Masaaki Kondo, Yuya Tajima, Masashi Imai, T. Nanya
Large cluster systems have become widely utilized because they achieve a good performance/cost ratio, especially in high performance computing. Although these cluster systems are distributed memory systems, coordinated checkpointing is a promising way to maintain high availability because the computing nodes are tightly connected to one another. However, as the number of computing nodes gets larger, the probability of multi-node failures increases. To tolerate multi-node failures, a large degree of redundancy is required in checkpointing, but this leads to performance degradation. Thus, we propose a new coordinated checkpointing called skewed checkpointing. In this method, checkpointing is skewed every time. Although each checkpointing itself contains only one degree of redundancy, this skewed checkpointing ensures $\lfloor \log_2 N \rfloor$ degrees of redundancy when the number of nodes is $N$. In this paper, we present the proposed method and an analysis of the performance overhead. Then, this method is applied to a cluster system and compared with other conventional checkpointing schemes. The results reveal the superiority of our method, especially for large cluster systems.
DOI: https://doi.org/10.1109/RELDIS.2004.1353012
Published: 2004-10-18
Citations: 5
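The redundancy bound quoted in the abstract above can be tabulated for a few cluster sizes. This is a minimal sketch that only evaluates $\lfloor \log_2 N \rfloor$; the function name `redundancy_degrees` is an assumption made for illustration, and the code does not reproduce the authors' checkpoint placement scheme.

```python
def redundancy_degrees(num_nodes: int) -> int:
    """Return floor(log2(N)), the redundancy degree stated in the abstract."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # For a positive integer N, N.bit_length() - 1 equals floor(log2(N)) exactly,
    # which avoids floating-point rounding near powers of two.
    return num_nodes.bit_length() - 1


if __name__ == "__main__":
    for n in (8, 64, 1000, 1024):
        print(n, redundancy_degrees(n))
```

Integer arithmetic is used instead of `math.log2` so that the result stays exact for very large node counts.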
The design and evaluation of a defense system for Internet worms
R. Scandariato, J. Knight
Many areas of society have become heavily dependent on services such as transportation facilities, utilities and so on that are implemented in part by large numbers of computers and communications links. Both past incidents and research studies show that a well-engineered Internet worm can disable such systems in a fairly simple way and, most notably, in a matter of a few minutes. This indicates the need for defenses against worms but their speed rules out the possibility of manually countering worm outbreaks. We present a platform that emulates the epidemic behavior of Internet active worms in very large networks. A reactive control system operates on top of the platform and provides a monitor/analyze/respond approach to deal with infections automatically. Details of our highly configurable platform and various experimental performance results are presented.
DOI: https://doi.org/10.1109/RELDIS.2004.1353017
Published: 2004-10-18
Citations: 8
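To give a feel for why worm outbreaks are too fast for manual response, the epidemic behavior mentioned in the abstract above can be approximated with a textbook discrete-time susceptible-infected model. This is a hedged sketch, not the authors' emulation platform: `simulate_worm`, `contact_rate`, and the parameter values are all assumptions chosen for illustration.

```python
from typing import List


def simulate_worm(total_hosts: int, initially_infected: int,
                  contact_rate: float, steps: int) -> List[float]:
    """Discrete-time susceptible-infected model of a random-scanning worm.

    contact_rate is the hypothetical number of successful infections one
    infected host would cause per step if every other host were vulnerable.
    """
    infected = float(initially_infected)
    history = [infected]
    for _ in range(steps):
        # Only scans that hit a still-susceptible host produce new infections.
        susceptible_fraction = 1.0 - infected / total_hosts
        new_infections = contact_rate * infected * susceptible_fraction
        infected = min(float(total_hosts), infected + new_infections)
        history.append(infected)
    return history


if __name__ == "__main__":
    curve = simulate_worm(total_hosts=100_000, initially_infected=10,
                          contact_rate=0.8, steps=60)
    print([round(x) for x in curve[::10]])
```

The curve grows roughly exponentially at first and then saturates, which is the shape an automated monitor/analyze/respond loop would have to interrupt early.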
An integrated architecture for dependable embedded systems
H. Kopetz
Summary form only given. A federated architecture is characterized in that every major function of an embedded system is allocated to a dedicated hardware unit. In a distributed control system this implies that adding a new function is tantamount to adding a new node. This has led to a [...] order to achieve some functional coordination. Adding fault tolerance to a federated architecture, e.g., by the provision of triple modular redundancy (TMR), leads to a further significant increase in the number of nodes and networks. The major advantages of a dedicated architecture are the physical encapsulation of the nearly autonomous subsystems, their outstanding fault containment and their clear-cut complexity management (independent development) in case the subsystems are nearly autonomous. An integrated distributed architecture for mixed-criticality applications must be based on a core design that supports the safety requirements of the highest considered criticality class. This is of particular importance in safety-critical applications, where the physical structure of the integrated system is determined to a significant extent by the independence requirement of fault-containment regions. The central part of an integrated distributed architecture for time-critical systems must provide the following core services: deterministic and timely transport of messages; fault-tolerant clock synchronization; strong fault isolation with respect to arbitrary node failures; and consistent diagnosis of failing nodes. Any architecture that provides these core services can be used as a base architecture for an integrated distributed embedded system architecture. An example of such an integrated architecture is the time-triggered architecture (TTA). In this contribution we describe the structure and the services of the TTA that has been developed during the last twenty years and is deployed in a number of safety-critical applications in the transport sector.
DOI: https://doi.org/10.1109/RELDIS.2004.1353016
Published: 2004-10-18
Citations: 16
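The abstract above mentions triple modular redundancy (TMR) as one way of adding fault tolerance. A minimal sketch of a TMR majority voter follows; the function name `tmr_vote` and the choice to return `None` when no majority exists are assumptions for illustration and are not taken from the TTA.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by at least two of three replicas, else None."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    # most_common(1) yields the single most frequent value and its count.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    print(tmr_vote([42, 42, 17]))   # one faulty replica is outvoted
    print(tmr_vote([1, 2, 3]))      # no majority
```

A voter like this masks one arbitrary value failure among three replicas, which is why TMR triples the number of nodes a federated design needs for each function.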
An efficient checkpointing protocol for the minimal characterization of operational rollback-dependency trackability
Islene C. Garcia, L. E. Buzato
A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of nontrackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of nontrackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate $O(n^2)$ control information, where $n$ is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only $O(n)$ control information.
DOI: https://doi.org/10.1109/RELDIS.2004.1353013
Published: 2004-10-18
Citations: 15
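To make the $O(n)$ figure in the abstract above concrete, the sketch below shows a generic dependency vector: one integer per process, piggybacked on every message and merged by element-wise maximum. This is an illustrative assumption about what linear-size control information can look like, not the authors' protocol; in particular, the rule that decides when to take a forced checkpoint is not shown, and the class and method names are hypothetical.

```python
from typing import List


class Process:
    """Tracks checkpoint dependencies with one vector entry per process."""

    def __init__(self, pid: int, num_processes: int) -> None:
        self.pid = pid
        # dv[j] = number of checkpoints of process j that this process
        # causally depends on; n integers, hence O(n) control information.
        self.dv: List[int] = [0] * num_processes

    def take_checkpoint(self) -> None:
        # A new checkpoint starts a new interval for this process.
        self.dv[self.pid] += 1

    def send(self) -> List[int]:
        # The whole vector is piggybacked on the outgoing message.
        return list(self.dv)

    def receive(self, piggybacked: List[int]) -> None:
        # Merge the sender's knowledge by taking the element-wise maximum.
        self.dv = [max(a, b) for a, b in zip(self.dv, piggybacked)]


if __name__ == "__main__":
    p0, p1, p2 = (Process(i, 3) for i in range(3))
    p0.take_checkpoint()
    p1.receive(p0.send())
    p1.take_checkpoint()
    p2.receive(p1.send())
    print(p2.dv)
```

An $O(n^2)$ scheme would instead piggyback a matrix with one such vector per process.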
Using program analysis to identify and compensate for nondeterminism in fault-tolerant, replicated systems
Joseph G. Slember, P. Narasimhan
Fault-tolerant replicated applications are typically assumed to be deterministic, in order to ensure reproducible, consistent behavior and state across a distributed system. Real applications often contain nondeterministic features that cannot be eliminated. Through the novel application of program analysis to distributed CORBA applications, we decompose an application into its constituent structures, and discover the kinds of nondeterminism present within the application. We target the instances of nondeterminism that can be compensated for automatically, and highlight to the application programmer those instances of nondeterminism that need to be manually rectified. We demonstrate our approach by compensating for specific forms of nondeterminism and by quantifying the associated performance overheads. The resulting code growth is typically limited to one extra line for every instance of nondeterminism, and the runtime overhead is minimal, compared to a fault-tolerant application with no compensation for nondeterminism.
DOI: https://doi.org/10.1109/RELDIS.2004.1353026
Published: 2004-10-18
Citations: 11
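One common source of nondeterminism in replicated code is reading the local clock. The sketch below illustrates a generic compensation pattern, assuming one replica's clock reading is forwarded to the others so that all replicas record the same value. It is not the compensation mechanism of the paper above, which is not described in the abstract; `Replica`, `handle`, and `forwarded_timestamp` are hypothetical names.

```python
import time
from typing import Callable, List, Optional, Tuple


class Replica:
    """A replica whose state includes a timestamp taken while handling a request."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.state: List[Tuple[str, float]] = []

    def handle(self, request: str,
               forwarded_timestamp: Optional[float] = None) -> float:
        # The single compensating step: prefer the forwarded value over the
        # local clock, so every replica stores an identical timestamp.
        if forwarded_timestamp is None:
            timestamp = self.clock()
        else:
            timestamp = forwarded_timestamp
        self.state.append((request, timestamp))
        return timestamp


if __name__ == "__main__":
    primary = Replica(clock=lambda: 100.0)
    backup = Replica(clock=lambda: 250.0)   # a deliberately different clock
    ts = primary.handle("deposit")
    backup.handle("deposit", forwarded_timestamp=ts)
    print(primary.state == backup.state)
```

Without the forwarded value the two replicas would diverge on their first request, even though both executed the same code.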
Low latency probabilistic broadcast in wide area networks
J. Pereira, L. Rodrigues, A. Pinto, R. Oliveira
In this paper we propose a novel probabilistic broadcast protocol that reduces the average end-to-end latency by dynamically adapting to network topology and traffic conditions. It does so by using a unique strategy that consists of adjusting the fanout and preferred targets for different gossip rounds as a function of the properties of each node. Node classification is lightweight and integrated into the protocol membership management. Furthermore, each node is not required to have full knowledge of the group membership or of the network topology. The paper shows how the protocol can be configured and evaluates its performance with a detailed simulation model.
DOI: https://doi.org/10.1109/RELDIS.2004.1353030
Published: 2004-10-18
Citations: 18
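The idea of choosing the fanout from a node's properties, as described in the abstract above, can be sketched as follows. The node classes, the fanout values in `FANOUT_BY_CLASS`, and the function `pick_gossip_targets` are assumptions for illustration only; the paper's actual classification and target-selection rules are not given in the abstract.

```python
import random
from typing import Dict, List

# Hypothetical classes and fanouts: well-connected nodes relay to more peers.
FANOUT_BY_CLASS: Dict[str, int] = {"hub": 6, "regular": 2}


def pick_gossip_targets(node_class: str, known_peers: List[str],
                        rng: random.Random) -> List[str]:
    """Choose distinct gossip targets from a node's partial membership view."""
    # The node only needs the peers it happens to know, not the full group.
    fanout = min(FANOUT_BY_CLASS[node_class], len(known_peers))
    return rng.sample(known_peers, fanout)


if __name__ == "__main__":
    rng = random.Random(0)
    view = [f"n{i}" for i in range(10)]
    print(pick_gossip_targets("hub", view, rng))
    print(pick_gossip_targets("regular", view, rng))
```

Passing in a seeded `random.Random` keeps the selection reproducible in simulations.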
Model-based validation of an intrusion-tolerant information system
F. Stevens, T. Courtney, Sankalp Singh, A. Agbaria, J. F. Meyer, W. Sanders, P. Pal
An increasing number of computer systems are designed to be distributed across both local and wide-area networks, performing a multitude of critical information-sharing and computational tasks. Malicious attacks on such systems are a growing concern, where attackers typically seek to degrade quality of service by intrusions that exploit vulnerabilities in networks, operating systems, and application software. Accordingly, designers are seeking improved techniques for validating such systems with respect to specified survivability requirements. In this regard, we describe a model-based validation effort that was undertaken as part of a unified approach to validating a networked intrusion-tolerant information system. Model-based results were used to guide the system's design as well as to determine whether a given survivability requirement was satisfied.
DOI: https://doi.org/10.1109/RELDIS.2004.1353019
Published: 2004-10-18
Citations: 79
Nested objects in a Byzantine quorum-replicated system
Charles P. Fry, M. Reiter
Modern distributed, object-based systems support nested method invocations, whereby one object can invoke methods on another. In this paper we present a framework that supports nested method invocations among Byzantine fault-tolerant, replicated objects that are accessed via quorum systems. A challenge in this context is that client object replicas can induce unwanted method invocations on server object replicas, due either to redundant invocations by client replicas or Byzantine failures within the client replicas. At the core of our framework are a new quorum-based authorization technique and a novel method invocation protocol that ensure the linearizability and failure atomicity of nested method invocations despite Byzantine client and server replica failures. We detail the implementation of these techniques in a system called Fleet, and give preliminary performance results for them.
DOI: https://doi.org/10.1109/RELDIS.2004.1353005
Published: 2004-10-18
Citations: 14
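The problem of redundant or Byzantine-induced nested invocations described in the abstract above can be illustrated with a generic threshold filter: if at most b client replicas are faulty, an invocation requested by b + 1 distinct replicas was requested by at least one correct replica. The sketch below executes each such invocation exactly once. It is a simplified stand-in, not Fleet's quorum-based authorization technique, and `NestedInvocationFilter` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class NestedInvocationFilter:
    """Server-side filter for invocations arriving from replicated clients."""

    def __init__(self, max_faulty: int) -> None:
        # b + 1 distinct requesters guarantee at least one correct requester.
        self.threshold = max_faulty + 1
        self.requesters: DefaultDict[str, Set[str]] = defaultdict(set)
        self.executed: Set[str] = set()

    def on_request(self, replica_id: str, invocation_id: str) -> bool:
        """Return True exactly once, when the invocation first reaches the threshold."""
        if invocation_id in self.executed:
            return False  # redundant copy from another client replica
        self.requesters[invocation_id].add(replica_id)
        if len(self.requesters[invocation_id]) >= self.threshold:
            self.executed.add(invocation_id)
            return True
        return False


if __name__ == "__main__":
    f = NestedInvocationFilter(max_faulty=1)
    print(f.on_request("c1", "inc#7"))  # one requester is not enough
    print(f.on_request("c2", "inc#7"))  # second distinct requester authorizes it
    print(f.on_request("c3", "inc#7"))  # later copies are suppressed
```

Tracking requesters in a set means a single faulty replica cannot reach the threshold by repeating its own request.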
Performance comparison of a rotating coordinator and a leader based consensus algorithm
P. Urbán, Naohiro Hayashibara, A. Schiper, T. Katayama
Protocols that solve agreement problems are essential building blocks for fault-tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we compare two well-known asynchronous consensus algorithms. In both algorithms, a leader process tries to impose a decision, and another leader retries if the leader fails to do so. The algorithms elect leaders differently: the Chandra-Toueg algorithm has a rotating leader, whereas processes in the Paxos algorithm elect leaders directly. We investigate the performance implications of this difference. In the system under study, processes send atomic broadcasts to each other. Consensus is used to decide the delivery order of messages. We evaluate the steady-state latency in (1) runs with neither crashes nor suspicions, (2) runs with crashes and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as the transient latency after (4) one crash and (5) multiple correlated crashes. The results show that the Paxos algorithm tolerates frequent wrong suspicions (3) and correlated crashes (5) better, while the performance is comparable in all other scenarios.
DOI: https://doi.org/10.1109/RELDIS.2004.1352999
Published: 2004-03-30
Citations: 31
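The difference between the two leader-selection styles compared in the abstract above can be sketched in a few lines. The rotating rule below (round number modulo the number of processes, with zero-based ids) and the "lowest unsuspected id" election rule are common textbook formulations assumed here for illustration; the abstract does not specify the exact rules used in the study, and both function names are hypothetical.

```python
from typing import Iterable, Optional, Set


def rotating_coordinator(round_number: int, num_processes: int) -> int:
    """Rotating style: the coordinator is fixed by the round number alone."""
    return round_number % num_processes


def elected_leader(process_ids: Iterable[int],
                   suspected: Set[int]) -> Optional[int]:
    """Direct election style: pick the lowest id not currently suspected."""
    candidates = [p for p in process_ids if p not in suspected]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    # With rotation, a crashed process 0 still gets its turn every n rounds.
    print([rotating_coordinator(r, 3) for r in range(6)])
    # With direct election, the crashed process is skipped once suspected.
    print(elected_leader(range(3), suspected={0}))
```

Under rotation, a suspicion only moves the protocol to the next round's coordinator, whereas direct election can skip every suspected process in one step.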