Efficiently Recovering Stateful System Components of Multi-server Microkernels
Wentai Li, Jinyu Gu, Nian Liu, B. Zang
2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS)
Publication date: 2021-07-01
DOI: 10.1109/ICDCS51616.2021.00054
Citations: 1
Abstract
Microkernel OSes provide OS services through mutually isolated system servers running in separate user processes, which gives them stronger fault isolation than monolithic OSes. When it comes to recovering from faults in those servers, however, most existing microkernel OSes do no more than restart the faulty server. Restarting discards all of the server's running state and may therefore affect every application that relies on it. In this paper, we present TxIPC, a mechanism that efficiently recovers stateful system servers on microkernel OSes. Because a system server provides its service through inter-process communication (IPC), TxIPC makes the server fault resilient by handling each IPC in a transaction-like manner: if a fault occurs while a server is handling an IPC, TxIPC aborts all updates made during that IPC and thereby recovers the server from the fault. Evaluations show that TxIPC enables servers to recover from 99.8% of injected faults with 3% to 45% performance overhead on application benchmarks, significantly outperforming existing counterparts.
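The transaction-like IPC handling described in the abstract can be illustrated with a minimal Python sketch. It assumes a server whose entire state is a key-value dictionary and models a fault as a Python exception raised by an IPC handler. The class `TxServer` and its methods `write` and `handle_ipc` are hypothetical names, and the undo-log technique shown here is one common way to abort uncommitted updates; the abstract does not say how TxIPC actually tracks updates, so this is not presented as its implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

_MISSING = object()  # sentinel: the key did not exist before the write


class TxServer:
    """Hypothetical stateful server whose IPC handlers run transaction-like.

    Every state update made while handling one IPC is recorded in an undo
    log. If the handler faults (modeled here as an exception), the log is
    replayed in reverse so the server returns to its pre-IPC state instead
    of being restarted with empty state.
    """

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self._undo_log: List[Tuple[str, Any]] = []

    def write(self, key: str, value: Any) -> None:
        # Record the old value (or its absence) before overwriting it.
        self._undo_log.append((key, self.state.get(key, _MISSING)))
        self.state[key] = value

    def _rollback(self) -> None:
        # Undo newest-first so repeated writes to one key restore correctly.
        for key, old in reversed(self._undo_log):
            if old is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = old
        self._undo_log.clear()

    def handle_ipc(self, handler: Callable[["TxServer"], Any]) -> Tuple[bool, Any]:
        """Run one IPC handler: commit its updates on success, abort on fault."""
        self._undo_log.clear()  # begin the per-IPC transaction
        try:
            result = handler(self)
        except Exception as fault:  # a fault while handling this IPC
            self._rollback()        # abort: discard only this IPC's updates
            return False, repr(fault)
        self._undo_log.clear()      # commit: keep updates, drop the log
        return True, result


# Usage: one successful IPC followed by one that faults midway.
srv = TxServer()
ok, _ = srv.handle_ipc(lambda s: s.write("file_a", "v1"))


def faulty_ipc(s: TxServer) -> None:
    s.write("file_a", "v2")
    s.write("file_b", "new")
    raise RuntimeError("injected fault")


ok2, err = srv.handle_ipc(faulty_ipc)
print(ok, ok2, srv.state)  # True False {'file_a': 'v1'}
```

The faulting IPC's partial updates are discarded while the state committed by earlier IPCs survives, which is the property that distinguishes this approach from simply restarting the server. The sketch does not model how a real microkernel server would intercept its memory writes, nor side effects that escape the server (such as replies already sent or requests made to other servers).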