{"title":"Recoverable Mutual Exclusion Under System-Wide Failures","authors":"W. Golab, Danny Hendler","doi":"10.1145/3212734.3212755","DOIUrl":null,"url":null,"abstract":"Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references -- expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transformation that can add properties such as critical section re-entry and a strong notion of starvation freedom to any RME algorithm while preserving its asymptotic RMR complexity.","PeriodicalId":198284,"journal":{"name":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3212734.3212755","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20
Abstract
Recoverable mutual exclusion (RME) is a variation on the classic mutual exclusion (ME) problem that allows processes to crash and recover. The time complexity of RME algorithms is quantified in the same way as for ME, namely by counting remote memory references -- expensive memory operations that traverse the processor-to-memory interconnect. Prior work has established that the RMR complexity of the RME problem for n processes is Θ(log n) for the class of algorithms that use read/write registers and single-word comparison primitives such as Compare-And-Swap (Golab and Ramaraju 2016), O(log n / log log n) for the class of algorithms that use read/write registers and additional single-word read-modify-primitives such as Fetch-And-Store (Golab and Hendler 2017), and Θ(1) for the class of algorithms that use read/write registers and specialized double-word read-modify-write primitives (Golab and Hendler 2017). These complexity bounds hold in a model of computation where processes may fail independently, and where a process that fails while accessing the mutex is required to recover eventually. This body of work leaves open two important questions: (i) what is the tight bound on the RMR complexity of RME for the class of algorithms that use read/write registers and commonly supported single-word read-modify-primitives; and (ii) how is the RMR complexity of RME affected by variations in the failure model? This paper answers both questions partially by showing that RME can be solved using O(1) RMRs per passage in the worst case in a model where failures are system-wide (i.e., all processes crash simultaneously), and processes receive additional information from the environment regarding the occurrence of the failure. The upper bound algorithm we present relies crucially on a novel RMR-efficient barrier that processes use to synchronize recovery actions after each failure. The barrier uses read/write registers and single-word Compare-And-Swap only. Additionally, we present a transformation that can add properties such as critical section re-entry and a strong notion of starvation freedom to any RME algorithm while preserving its asymptotic RMR complexity.