Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969773
J. Voas
When software is built from components, nonfunctional properties such as security, reliability, fault tolerance, performance, availability, and safety do not necessarily compose. The problem stems from our inability to know a priori, for example, whether the security of a system composed of two components can be determined from knowledge about the security of each. The security of the composite depends on more than the security of the individual components, and there are numerous reasons for this. The article considers only the factors of component performance and calendar time. It concludes that no properties are easy to compose and that some are much harder than others.
Title: Why is it so hard to predict software system trustworthiness from software component trustworthiness?
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969775
P. Kuznetsov, R. Guerraoui, S. Handurukande, Anne-Marie Kermarrec
We present a general garbage collection scheme that reduces the "noise" in gossip-based broadcast algorithms. In short, our garbage collection scheme uses a simple heuristic to trade "useless" messages for "useful" ones. Used with a given gossip-based broadcast algorithm, a given buffer size, and a given number of disseminated messages (e.g., per gossip round), our garbage collection scheme provides higher overall reliability than more conventional schemes. We illustrate our approach through two algorithms: bimodal multicast (pbcast) and lightweight probabilistic broadcast (lpbcast). Our scheme is based on the intuitive idea of discarding messages according to their "age", where the age of a message is the number of times the message has been retransmitted.
Title: Reducing noise in gossip-based reliable broadcast
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
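The age-based idea in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: the class and field names are hypothetical, and the sketch assumes that the message with the highest age (the most retransmissions) is discarded first whenever the buffer overflows, which the abstract does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class BufferedMessage:
    msg_id: str
    payload: str
    age: int = 0  # number of times this message has been retransmitted


class AgeBasedBuffer:
    """Bounded gossip buffer that evicts the highest-age message on overflow (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = {}  # msg_id -> BufferedMessage

    def add(self, msg):
        known = self.messages.get(msg.msg_id)
        if known is None:
            self.messages[msg.msg_id] = msg
        else:
            # Same message received again: remember the larger age.
            known.age = max(known.age, msg.age)
        self._purge()

    def _purge(self):
        # Evict the most retransmitted messages until the buffer fits.
        while len(self.messages) > self.capacity:
            oldest = max(self.messages.values(), key=lambda m: m.age)
            del self.messages[oldest.msg_id]

    def gossip_round(self):
        """Age every buffered message by one and return copies to send."""
        outgoing = []
        for m in self.messages.values():
            m.age += 1
            outgoing.append(BufferedMessage(m.msg_id, m.payload, m.age))
        return outgoing
```

With a capacity of 2, adding messages of age 5, 1, and 0 would evict the age-5 message, on the assumption that it has already been disseminated widely and is therefore the least useful one to keep.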
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969760
S. Upadhyaya, R. Chinchani, K. Kwiat
Local and wide area network information assurance analysts need current and precise knowledge about their system activities in order to address the challenges of critical infrastructure protection. In particular, the analyst needs to know in real time that an intrusion has occurred so that an active response and recovery thread can be created rapidly. Existing intrusion detection solutions are basically after-the-fact, thereby offering very little in terms of damage confinement and restoration of service. Quick recovery is possible only if the assessment scheme has low latency and operates in real time. The objective of the paper is to develop a reasoning framework to aid in the real-time detection and assessment task, based on a novel idea of encapsulating the owner's intent. The theoretical framework developed here will help resolve dubious circumstances that may arise while inferring the premises of operations (encapsulated from the owner's intent) by examining the observed conclusions resulting from the actual operations of the owner. This reasoning is significant because intrusion signaling is not a binary decision, unlike error detection in traditional fault tolerance. Our reasoning framework has been developed by leveraging the concepts of cost analysis and pricing under uncertainty found in economics and finance. Our main result is the modeling of user activity on a computing system as a martingale and the subsequent quantification of the cost of performing a job to enable decision making.
Title: An analytical framework for reasoning about intrusions
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
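For readers unfamiliar with the term used in the abstract above, the standard textbook definition of a discrete-time martingale is given here. The symbol X_t is a generic placeholder; how the paper maps user activity or job cost onto such a process is not stated in the abstract and is not assumed here.

```latex
% A sequence X_1, X_2, \dots is a martingale if, for every t,
\mathbb{E}\bigl[\,|X_t|\,\bigr] < \infty
\qquad\text{and}\qquad
\mathbb{E}\bigl[\,X_{t+1} \mid X_1, \dots, X_t\,\bigr] = X_t .
```

Informally, given everything observed so far, the expected next value equals the current value, which is the "fair game" property that pricing under uncertainty in finance relies on.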
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.970774
S. Schemmer, E. Nett, M. Mock
Autonomous systems are expected to provide increasingly complex and safety-critical services that will, sooner or later, require the cooperation of several autonomous systems for their fulfillment. In particular, coordinating the access to shared physical and information technological resources will become a general problem. Scheduling these resources is subject to strong real-time and reliability requirements. In this paper, we present an architecture that allows autonomous mobile systems to schedule shared resources in real-time using their own wireless distributed infrastructure. In our architecture, there is a clear separation between the application-specific scheduling part that is modeled as a function of the global state and the communication part that is used to provide the global state. By isolating the more error-prone communication part within a communication hardcore, the reliability of the overall system is increased and the locally executed scheduling function can be designed with primary focus on the application-specific real-time requirements.
Title: Reliable real-time cooperation of mobile autonomous systems
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969752
K. Kim
The volume and size of real-time (RT) distributed computing (DC) applications are now growing faster than in the last century. The mixture of application tasks running on such systems is growing, as is the shared use of computing and communication resources by multiple applications, including RT and non-RT applications. The increased use of shared resources brings with it the need for effective security enforcement. More specifically, the needs are to prevent unauthorized users: (1) from accessing protected information; and (2) from disturbing bona fide users in getting services from server components. Such disturbances are also called denial-of-service attacks.
Title: Incorporation of security and fault tolerance mechanisms into real-time component-based distributed computing systems
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969730
R. Oliveira, J. Pereira, A. Schiper
Fault-tolerant control systems can be built by replicating critical components. However, replication raises the issue of inconsistency. Multiple protocols for ensuring consistency have been described in the literature. PADRE (Protocol for Asymmetric Duplex REdundancy) is such a protocol, and an interesting case study of a complex and sensitive problem: the management of replicated traffic controllers in a railway system. However, the low level at which the protocol has been developed embodies system details, namely timeliness assumptions, that make it difficult to understand and may narrow its applicability. We argue that, when designing a protocol, it is preferable to consider first a general solution that does not include any timeliness assumptions; then, by taking into account an additional hypothesis, one can easily design a time-based solution tailored to a specific environment. This paper illustrates the benefit of a top-down protocol design approach and shows that PADRE can be seen as an instance of a standard primary-backup replication protocol based on view-synchronous communication (VSC).
Title: Primary-backup replication: from a time-free protocol to a time-based implementation
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969737
A. Agbaria, H. Attiya, R. Friedman, R. Vitenberg
This paper proposes a new classification of executions with checkpoints that is based on the notion of k-rollback, indicating the maximal number of checkpoints that may need to be rolled back during recovery. The relation between known execution classes is explored, and it is shown that coordinated checkpointing, SZPF (strictly Z-path free) and ZPF (Z-path free) are 1-rollback mechanisms, while ZCF (Z-cycle free) is (n-1)-rollback, where n is the number of participants in an execution. A new class of executions, called d-BC (d-bounded cycles), is introduced and is shown to be an [(n-1)·d]-rollback mechanism (ZCF is a special case of d-BC for d=1). Finally, a d-BC protocol is presented. This protocol has the nice property that it does not impose any control information overhead on an application's messages, yet it sends only a few control messages of its own. Moreover, the protocol maintains information about recovery lines, which enables very efficient discovery of the most recent recovery line that existed a short time before the failure.
Title: Quantifying rollback propagation in distributed checkpointing
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
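The rollback bounds quoted in the abstract above can be written out as a short worked example. The bound itself comes from the abstract; the concrete values n = 5 and d = 3 are chosen here purely for illustration and do not appear in the paper's abstract.

```latex
% k-rollback bound for the d-BC class, with n participants:
k_{d\text{-BC}} = (n-1)\cdot d
% Special case d = 1 recovers the ZCF bound:
k_{\mathrm{ZCF}} = (n-1)\cdot 1 = n-1
% Illustrative values (assumed): n = 5, d = 3
k_{3\text{-BC}} = (5-1)\cdot 3 = 12
% For comparison, coordinated checkpointing, SZPF and ZPF have k = 1.
```

Under these illustrative values, a 3-BC execution may need to roll back at most 12 checkpoints during recovery, compared with at most 4 for ZCF and 1 for the Z-path-free classes.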
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.970769
Vilgot Claesson, Henrik Lönn, N. Suri
A desired attribute in safety critical embedded real-time systems is a system time/event synchronization capability on which predictable communication can be established. Focusing on bus-based communication protocols in TDMA environments, we present a novel, efficient, and low-cost synchronization approach with bounded start-up time. This approach utilizes information about each node's unique message lengths to achieve synchronization. The protocol avoids start-up collisions by postponing retries after a collision. We also present a re-synchronization strategy that incorporates recovering nodes into synchronization.
Title: Efficient TDMA synchronization for distributed embedded systems
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
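One way to picture synchronizing on unique message lengths, as mentioned in the abstract above, is the following minimal sketch. It is an illustration under stated assumptions, not the protocol from the paper: the node names, frame lengths, slot order, and function names are all hypothetical, and the collision handling and re-synchronization steps described in the abstract are not modeled.

```python
# Hypothetical static TDMA schedule: slot order and each node's unique frame length in bits.
SLOT_ORDER = ["A", "B", "C", "D"]
FRAME_LENGTH_BITS = {"A": 64, "B": 96, "C": 128, "D": 160}

# Because lengths are unique, an observed length identifies the sender.
LENGTH_TO_NODE = {length: node for node, length in FRAME_LENGTH_BITS.items()}


def identify_sender(observed_length_bits):
    """Return the node whose frame length matches, or None if the length is unknown."""
    return LENGTH_TO_NODE.get(observed_length_bits)


def next_slot_owner(observed_length_bits):
    """Infer which node owns the slot following the observed frame."""
    sender = identify_sender(observed_length_bits)
    if sender is None:
        return None  # unrecognized frame: the listener stays unsynchronized
    idx = SLOT_ORDER.index(sender)
    return SLOT_ORDER[(idx + 1) % len(SLOT_ORDER)]
```

In this sketch, a node that starts listening mid-round and observes a 96-bit frame concludes that node B just transmitted and that the next slot belongs to node C, so it can align its own slot counter without an explicit synchronization message.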
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969750
K. Kwiat
The combined topics of reliability and security are briefly traced in relation to the past and present endeavors of the Air Force Research Laboratory's Information Directorate. It is concluded that in the realm of information assurance, system features created to tolerate benign failures and to respond to attack must be stressed and tested beforehand and their effectiveness predicted, otherwise they might inadvertently magnify the attacker's power. With the explosive growth of distributed and mobile systems and the need for information assurance to address the accompanying vulnerabilities, one history lesson comes to mind: although ancient Rome was not built in a day, it did not take very long for it to fall once the barbarians took hold.
Title: Can reliability and security be joined reliably and securely?
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems
Pub Date: 2001-10-28 | DOI: 10.1109/RELDIS.2001.969766
M. Hurfin, A. Mostéfaoui, M. Raynal, R. Macêdo
The paper revisits the "sliding window" notion commonly encountered in communication protocols and applies it to the round numbers of round-based asynchronous protocols. This approach is novel. To illustrate its benefits, the paper presents an original weak failure detector-based consensus protocol that allows each process to be simultaneously involved in several rounds. The rounds in which a process is simultaneously involved define its "sliding round window". The proposed approach has several advantages. It is better suited to the uncertainty created by asynchrony and failures, and consequently permits one to design efficient round-based asynchronous protocols. Perhaps more importantly, it also provides a better understanding of the global synchronization that manages the protocol's progress from round to round. This appears clearly in the proposed failure detector-based consensus protocol, where the "sliding round window" allows one to dynamically define the message exchange pattern for each round separately.
Title: A consensus protocol based on a weak failure detector and a sliding round window
Venue: Proceedings 20th IEEE Symposium on Reliable Distributed Systems