A certain freedom: thoughts on the CAP theorem
E. Brewer
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing, 2010-07-25
DOI: https://doi.org/10.1145/1835698.1835701
Citations: 90
Abstract
At PODC 2000, the CAP theorem received its first broad audience. Surprisingly for an impossibility result, one important effect has been to free designers to explore a wider range of distributed systems. Designers of wide-area systems, in which network partitions are considered inevitable, know they cannot have both availability and consistency, and thus can now justify weaker consistency. The rise of the "NoSQL" movement ("Not Only SQL") is an expression of this freedom. The choices of how and when to weaken consistency are often the defining characteristics of these systems, with new variations appearing every year. We review a variety of interesting places in the "CAP Space" as a way to illuminate these issues and their consequences. For example, automatic teller machines (ATMs), which predate the CAP theorem, surprisingly choose availability with weak consistency but with bounded risk. Finally, I explore a few of the options to try to "work around" the impossible. The most basic is the use of commutative operations, which make it easy to restore consistency after a partition heals. However, even many commutative operations have non-commutative exceptions in practice, which means that the exceptions may be incorrect or late. Adding the concept of "delayed exceptions" allows more operations to be considered commutative and simplifies eventual consistency during a partition. Furthermore, we can think of delayed exception handling as "compensation" - we execute a compensating transaction that restores consistency. Delayed exception handling with compensation appears to be what most real wide-area systems do - inconsistency due to limited communication is treated as an exception and some exceptional action, such as monetary compensation or even legal action, is used to fix it. 
This approach to wide-area systems puts the emphasis on audit trails and recovery rather than prevention, and implies that we should expand and formalize the role of compensation in the design of complex systems.
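
The ATM example and the idea of delayed exceptions with compensation can be illustrated with a minimal sketch. It is not taken from the paper: the names `Replica`, `reconcile`, `offline_cap`, and `overdraft_fee` are hypothetical, and the per-node withdrawal cap is only one assumed way to model "bounded risk". Each node accepts deposits and withdrawals while partitioned and records them in an audit log. Because addition commutes, the logs can be merged in any order once the partition heals, and an overdraft found at that point is handled by a compensating entry.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One side of a partition, e.g. a single ATM (hypothetical model)."""
    name: str
    offline_cap: int = 200  # assumed bound on withdrawals while partitioned
    log: list = field(default_factory=list)  # audit trail of signed amounts

    def deposit(self, amount: int) -> None:
        self.log.append(amount)

    def withdraw(self, amount: int) -> None:
        withdrawn = -sum(x for x in self.log if x < 0)
        if withdrawn + amount > self.offline_cap:
            # Bounded risk: refuse once the local cap would be exceeded.
            raise ValueError("offline withdrawal cap exceeded")
        # Accepted without consulting the global balance, so the node stays
        # available; a possible overdraft becomes a delayed exception.
        self.log.append(-amount)


def reconcile(opening_balance: int, replicas: list, overdraft_fee: int = 0):
    """Merge the logs after the partition heals and compensate if needed."""
    # Addition is commutative, so the order of replicas does not matter.
    balance = opening_balance + sum(sum(r.log) for r in replicas)
    compensations = []
    if balance < 0:
        # Delayed exception: detected only now, repaired by compensation.
        compensations.append(("overdraft_fee", overdraft_fee))
        balance -= overdraft_fee
    return balance, compensations


if __name__ == "__main__":
    a, b = Replica("atm_a"), Replica("atm_b")
    a.withdraw(80)
    b.withdraw(50)
    b.deposit(10)
    print(reconcile(100, [a, b], overdraft_fee=5))
```

In this sketch the invariant "balance is never negative" is not enforced during the partition. It is checked at reconciliation, and the violation is repaired after the fact, which matches the abstract's emphasis on audit trails and recovery rather than prevention.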