When deployed in geo-distributed environments, existing state-machine replication protocols require at least one wide-area communication step to establish a total order on client requests. For use cases in which clients are not interested in the actual result of a request but only need a guarantee that the request will eventually be processed, this requirement usually incurs unnecessarily high response times. To address this problem, we present Weave, a cloud-based geo-replication protocol that relies on replica groups in multiple geographic regions to efficiently assign stable sequence numbers to incoming requests. This approach enables Weave to offer guaranteed writes, which, in the absence of faults, only wait for communication within a client's local replica group to produce an execution guarantee for a particular sequence number. Our experiments with a distributed queue and a replicated log show that guaranteed writes can significantly improve the response times of geo-replicated applications.
Low-latency geo-replicated state machines with guaranteed writes. M. Eischer, Benedikt Straßner, T. Distler. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393686
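The Weave abstract above does not spell out how regions share the sequence-number space. As a minimal sketch, assuming that each of R regions owns an interleaved slice of it (region r hands out r, r+R, r+2R, ...) and that a write is guaranteed once a simple majority of the local replica group has stored it, the local fast path and in-order execution might look like this. `LocalGroup`, `guaranteed_write`, and `executable_prefix` are hypothetical names, not Weave's API.

```python
from dataclasses import dataclass, field


@dataclass
class LocalGroup:
    """Hypothetical replica group in one region (an assumption, not Weave's design)."""
    region: int          # index of this region, 0 <= region < num_regions
    num_regions: int     # regions sharing the interleaved sequence space
    group_size: int      # replicas in this region's group
    next_round: int = 0  # sequence numbers this region has handed out so far
    stored: dict = field(default_factory=dict)  # seq -> request

    def guaranteed_write(self, request, replicas_up):
        """Assign a stable sequence number using only local communication.

        `replicas_up` is how many local replicas acknowledged the request;
        we assume a simple majority quorum is enough for the guarantee.
        """
        quorum = self.group_size // 2 + 1
        if replicas_up < quorum:
            return None  # no execution guarantee without a local quorum
        seq = self.region + self.next_round * self.num_regions
        self.next_round += 1
        self.stored[seq] = request
        return seq  # the client can return as soon as it holds this number


def executable_prefix(groups):
    """Requests that can run: the gap-free prefix of the merged sequence."""
    merged = {}
    for g in groups:
        merged.update(g.stored)
    order, seq = [], 0
    while seq in merged:  # stop at the first slot no region has filled yet
        order.append(merged[seq])
        seq += 1
    return order
```

The client waits only for its local quorum; execution, by contrast, still has to wait for every lower slot, which is where cross-region communication (or some gap-filling mechanism not modelled here) comes back in.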
Jack Waudby, P. Ezhilchelvan, J. Webber, I. Mitrani
Our earlier work identifies reciprocal consistency as an important property that must be preserved in distributed graph databases, and demonstrates that failing to preserve it seriously undermines the integrity of the database in the long term. Reciprocal consistency can be maintained as part of enforcing any known isolation guarantee, but such enforcement is also known to reduce performance. Therefore, in practice, distributed graph databases are often built atop BASE databases with no isolation guarantees, which gives them good performance but leaves them susceptible to corruption caused by violations of reciprocal consistency. This paper designs and presents a lightweight, locking-free protocol and evaluates its ability to preserve reciprocal consistency while offering good throughput. Our evaluations establish that the protocol can offer both integrity guarantees and sound performance when the value of its parameter is chosen appropriately.
Preserving reciprocal consistency in distributed graph databases. Jack Waudby, P. Ezhilchelvan, J. Webber, I. Mitrani. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393675
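The abstract does not define reciprocal consistency. Assuming its usual meaning for graph stores that record every edge at both endpoints (if vertex u's record lists an edge to v, then v's record must list the matching edge back to u), the sketch below checks that invariant and shows how two non-atomic half-writes can break it. It illustrates the property only, not the paper's protocol; the storage layout and function names are hypothetical.

```python
def reciprocal_violations(adjacency):
    """Return (u, v) pairs where u records an edge to v but v has no edge back.

    `adjacency` maps a vertex id to the neighbour ids stored in that vertex's
    own record (a hypothetical layout in which each edge is stored twice).
    """
    violations = set()
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if u not in adjacency.get(v, set()):
                violations.add((u, v))  # one half present, the other missing
    return violations


def add_edge_halves(adjacency, u, v, crash_after_first=False):
    """Write the two halves of edge (u, v) as separate, non-atomic writes."""
    adjacency.setdefault(u, set()).add(v)
    if crash_after_first:
        return  # the second half is never written
    adjacency.setdefault(v, set()).add(u)
```

Without isolation, nothing prevents the window between the two writes from being observed or made permanent, which is the kind of corruption the abstract attributes to BASE-backed graph databases.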
A. Linde, Pedro Fouto, J. Leitao, Nuno M. Preguiça
In the last few years, causal consistency has become a popular consistency model for geo-replicated databases. The algorithms proposed to enforce causal consistency typically associate some metadata with each operation, which is used to guarantee that an operation is not executed if its execution would break causality. This may give the impression that causal consistency is intrinsically costly and non-scalable. In this paper, we analyze the metadata costs of enforcing causal consistency and put them in perspective by considering the metadata that is necessary to enforce reliability. We show that, by carefully ordering the propagation of operations, it is possible to enforce causal consistency without any metadata beyond what is already necessary to enforce reliability.
The intrinsic cost of causal consistency. A. Linde, Pedro Fouto, J. Leitao, Nuno M. Preguiça. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393674
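For context, a common way to enforce causal consistency is causal delivery with vector clocks: a replica delivers an operation from sender s only if it is the next operation from s and every other dependency has already been delivered. The sketch below shows that standard condition; it is not the paper's propagation scheme, and `can_deliver` and `CausalReplica` are hypothetical names. One way to read the paper's point: the sender's own entry plays the role of the per-sender sequence number that reliable delivery already needs, so the other entries are the extra metadata under discussion.

```python
def can_deliver(op_clock, sender, local_clock):
    """Causal delivery condition for an operation stamped with a vector clock."""
    if op_clock[sender] != local_clock[sender] + 1:
        return False  # not the next operation from this sender
    # Every dependency on other replicas must already be delivered here.
    return all(op_clock[k] <= local_clock[k]
               for k in range(len(local_clock)) if k != sender)


class CausalReplica:
    def __init__(self, n):
        self.clock = [0] * n  # operations delivered from each replica
        self.pending = []     # buffered (sender, op_clock, payload)
        self.applied = []     # payloads in delivery order

    def receive(self, sender, op_clock, payload):
        self.pending.append((sender, op_clock, payload))
        progress = True
        while progress:  # keep delivering until nothing else becomes ready
            progress = False
            for item in list(self.pending):
                s, vc, p = item
                if can_deliver(vc, s, self.clock):
                    self.pending.remove(item)
                    self.clock[s] = vc[s]  # other entries are already >= vc
                    self.applied.append(p)
                    progress = True
```

For example, if replica 1 issues `y` after seeing `x` from replica 0, a third replica that receives `y` first buffers it until `x` arrives and then applies both in order.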
A. Linde, Diogo Serra, J. Leitao, Nuno M. Preguiça
The purpose of this paper is to discuss the limitations imposed by introducing fault tolerance into a partial replication system that aims to provide causal consistency. The general outcome is that, to support indefinite replica failures, the "partial" in partial replication becomes not really partial at all. We prove that, when indefinite replica failures are possible, it is impossible to implement causal consistency while respecting genuine partial replication, i.e., not storing or propagating operations on objects that a given replica does not replicate locally. In our initial approach to this issue, client replicas need only replicate the operations they depend on that have not yet been marked as durable by the centralised component. We discuss the remaining limitations and expected improvements in future work.
On combining fault tolerance and partial replication with causal consistency. A. Linde, Diogo Serra, J. Leitao, Nuno M. Preguiça. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393684
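A minimal sketch of the initial approach described above, assuming each operation records the object it modifies and the ids of the operations it causally depends on: a client replica keeps every (transitive) dependency that the centralised component has not yet marked durable, and any kept operation on an object the client does not replicate is exactly where genuine partial replication is given up. `Op`, `must_retain`, and `non_genuine` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    op_id: str
    obj: str                       # object the operation modifies
    deps: frozenset = frozenset()  # ids of operations it directly depends on


def must_retain(local_ops, all_ops, durable_ids):
    """Dependencies of local_ops (followed transitively) that are not durable."""
    by_id = {op.op_id: op for op in all_ops}
    keep, stack = set(), [d for op in local_ops for d in op.deps]
    while stack:
        dep_id = stack.pop()
        if dep_id in durable_ids or dep_id in keep:
            continue  # durable ops are assumed safe at the centralised component
        keep.add(dep_id)
        stack.extend(by_id[dep_id].deps)
    return {by_id[i] for i in keep}


def non_genuine(retained, replicated_objects):
    """Retained operations on objects this replica does not replicate."""
    return {op for op in retained if op.obj not in replicated_objects}
```

Once the centralised component marks an operation durable, the client can drop it, so the non-genuine part of its state shrinks over time rather than growing without bound.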
Conflict-free Replicated Data Types (CRDTs) for lists allow multiple users to concurrently insert and delete elements in a shared list object. However, existing algorithms behave poorly when users concurrently move list elements to a new position (i.e. reorder the elements in the list). We demonstrate the need for such a move operation, and describe an algorithm that extends a list CRDT with an explicit move operation. Our algorithm can be used in conjunction with any existing list CRDT algorithm. In addition to moving a single list element, we also discuss the open problem of moving ranges of elements.
Moving elements in list CRDTs. Martin Kleppmann. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393677
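One way to build an explicit move operation on top of an arbitrary list CRDT, consistent with the abstract, is to store each element's position in a last-writer-wins (LWW) register whose value is a position identifier from the underlying list CRDT: a move writes a fresh position, and concurrent moves of the same element resolve to a single winner instead of duplicating it. The sketch below assumes this design and, for brevity, stands in for the underlying list CRDT with fractional positions, which is a simplification rather than a real list CRDT. Class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    value: float = 0.0
    stamp: tuple = (0, "")  # (Lamport timestamp, replica id); id breaks ties

    def set(self, value, stamp):
        if stamp > self.stamp:  # the larger stamp wins on every replica
            self.value, self.stamp = value, stamp


class MovableList:
    """Elements keyed by id; each element's position lives in an LWW register."""

    def __init__(self):
        self.positions = {}

    def apply_move(self, elem, position, stamp):
        # An insert is just the first write to the element's register.
        self.positions.setdefault(elem, LWWRegister()).set(position, stamp)

    def to_list(self):
        # Order by position; the element id breaks ties deterministically.
        return sorted(self.positions,
                      key=lambda e: (self.positions[e].value, e))
```

If two users concurrently move the same element to different places, both replicas keep exactly one copy of it at the position with the larger stamp, whichever order the moves arrive in.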
M. Weidner, Heather Miller, Christopher S. Meiklejohn
Operation-based Conflict-free Replicated Data Types (CRDTs) are eventually consistent replicated data types that automatically resolve conflicts between concurrent operations. Op-based CRDTs must be designed differently for each data type, and current designs use ad-hoc techniques to handle concurrent operations that do not naturally commute. We present a new construction, the semidirect product of op-based CRDTs, which combines the operations of two CRDTs into one while handling conflicts between their concurrent operations in a uniform way. We demonstrate the construction's utility by decomposing several existing CRDTs as semidirect products of simpler CRDTs, and by using it to construct novel CRDTs. Although it reproduces common CRDT semantics, the semidirect product can be viewed as a restricted kind of operational transformation, thus forming a bridge between the two fields.
Composing and decomposing op-based CRDTs with semidirect products (summary). M. Weidner, Heather Miller, Christopher S. Meiklejohn. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393687
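To make the idea concrete, consider a counter with `add` and `mult` operations, which do not commute. The sketch below is a single hand-built instance rather than the general construction. It assumes causal delivery and the convention that `mult` acts on concurrent `add`s: a concurrent `add(a)` that arrives after `mult(m)` is transformed into `add(a*m)`. Each operation carries the ids of its full causal past. `CounterOp` and `AddMultCounter` are hypothetical names.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class CounterOp:
    op_id: str
    kind: str                      # "add" or "mult"
    arg: int
    deps: frozenset = frozenset()  # ids of all causally preceding ops


class AddMultCounter:
    """add and mult combined so that mult acts on concurrent adds."""

    def __init__(self, value=0):
        self.value = value
        self.mults = []  # mult ops applied so far, kept to detect concurrency

    def apply(self, op):  # assumes operations arrive in causal order
        if op.kind == "mult":
            # Scales the current value, including any concurrent add
            # that has already been applied here.
            self.value *= op.arg
            self.mults.append(op)
        else:
            # Transform the add by every applied mult it did not causally see.
            factor = prod(m.arg for m in self.mults if m.op_id not in op.deps)
            self.value += op.arg * factor
```

Starting from 5, concurrent `add(3)` and `mult(2)` yield 16 on both replicas regardless of delivery order, while an `add(3)` issued after seeing `mult(2)` is not scaled and gives 13. The transformation of the incoming `add` is what makes this a restricted form of operational transformation.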
CRDTs, or Conflict-free Replicated Data Types, are data abstractions that guarantee convergence for replicated data. The set is one of the most fundamental and widely used data types. Existing general-purpose set CRDTs associate every element in the set with causal contexts as metadata, and manipulating causal contexts can be complicated and costly. We present a new set CRDT, CLSet (causal-length set), in which the metadata associated with an element is simply a natural number, called its causal length. We compare CLSet with existing general-purpose set CRDTs in terms of semantics and performance.
A low-cost set CRDT based on causal lengths. Weihai Yu, Sigbjørn Rostad. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-27. https://doi.org/10.1145/3380787.3393678
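A minimal state-based sketch of the causal-length idea, assuming the scheme as we understand it: each element's causal length starts at 0, an element is in the set exactly when its causal length is odd, `add` and `remove` increment the length only when they change membership, and merging takes the per-element maximum. The class and method names are hypothetical.

```python
class CLSet:
    """State-based set CRDT whose per-element metadata is one natural number."""

    def __init__(self):
        self.length = {}  # element -> causal length

    def contains(self, e):
        return self.length.get(e, 0) % 2 == 1  # odd length: element present

    def add(self, e):
        if not self.contains(e):
            self.length[e] = self.length.get(e, 0) + 1

    def remove(self, e):
        if self.contains(e):
            self.length[e] += 1

    def merge(self, other):
        # Per-element maximum: commutative, associative and idempotent.
        for e, n in other.length.items():
            self.length[e] = max(self.length.get(e, 0), n)

    def elements(self):
        return {e for e in self.length if self.contains(e)}
```

A remove on one replica (length 2) concurrent with a remove-then-add on another (length 3) merges to 3, so the element is present on both; the longer causal history wins without any per-element causal context.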
Distributed consensus is a fundamental primitive for constructing fault-tolerant, strongly-consistent distributed systems. Though many distributed consensus algorithms have been proposed, just two dominate production systems: Paxos, the traditional and famously subtle algorithm, and Raft, a more recent algorithm positioned as a more understandable alternative to Paxos. In this paper, we consider the question: which algorithm, Paxos or Raft, is the better solution to distributed consensus? We analyse both to determine exactly how they differ by describing a simplified Paxos algorithm using Raft's terminology and pragmatic abstractions. We find that Paxos and Raft take a very similar approach to distributed consensus, differing only in their approach to leader election. Most notably, Raft only allows servers with up-to-date logs to become leaders, whereas Paxos allows any server to become leader provided it then updates its log to ensure it is up-to-date. Raft's approach is surprisingly efficient given its simplicity since, unlike Paxos, it does not require log entries to be exchanged during leader election. We surmise that much of Raft's understandability comes from the clear presentation of the Raft paper rather than being fundamental to the underlying algorithm.
Paxos vs Raft: have we reached consensus on distributed consensus? H. Howard, R. Mortier. Proceedings of the 7th Workshop on Principles and Practice of Consistency for Distributed Data, 2020-04-10. https://doi.org/10.1145/3380787.3393681
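The leader-election difference described in the abstract can be sketched in a few lines. The first function is Raft's vote check: a voter grants its vote only if the candidate's log is at least as up-to-date as its own, comparing the term of the last entry first and then log length. The second is a simplified Paxos-style alternative, assuming that any candidate may win and then adopts, for each index, the entry with the highest term among the logs returned by a majority of voters. Logs are lists of `(term, command)` pairs; both function names are hypothetical.

```python
def raft_log_up_to_date(cand_last_term, cand_len, voter_log):
    """Raft vote check: is the candidate's log at least as up-to-date?"""
    voter_last_term = voter_log[-1][0] if voter_log else 0
    if cand_last_term != voter_last_term:
        return cand_last_term > voter_last_term  # later last term wins
    return cand_len >= len(voter_log)            # same term: longer log wins


def paxos_style_adopt(own_log, majority_logs):
    """Paxos-style election: any server may lead, then catches up its log.

    For each index, keep the entry with the highest term seen among the
    candidate's own log and the logs sent back by a majority of voters.
    (A real leader would also re-propose these entries in its own term.)
    """
    logs = [own_log] + majority_logs
    merged = []
    for i in range(max(len(log) for log in logs)):
        entries = [log[i] for log in logs if i < len(log)]
        merged.append(max(entries, key=lambda entry: entry[0]))
    return merged
```

The contrast matches the abstract: Raft's check needs only the last term and length in each vote request, whereas the Paxos-style leader must receive whole log suffixes during election before it can serve.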