Friend or Foe: Strong Consistency vs. Overload in High-Availability Distributed Systems and SDN
R. Hanmer, L. Jagadeesan, V. Mendiratta, Heng Zhang
2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), October 2018. DOI: 10.1109/ISSREW.2018.00-30
Distributed systems play an increasingly important role in leading-edge networks with high availability requirements, including software-defined networks (SDN), where replicating essential network state information is critical to ensuring resilience under failures. Strong consistency algorithms based on distributed consensus, such as Raft, are often used to ensure that all components of a distributed system agree on their view of the replicated data, even when a minority of the components crash. Another critical requirement for highly available networks is to handle overload gracefully, that is, to cope with periods in which demand on the network exceeds expected levels, such as during natural or man-made disasters or popular sporting events. Hence, the strong consistency algorithms used in such networks must also behave gracefully under overload. We show that strong consistency algorithms such as Raft may not do so, and can in fact significantly degrade SDN control plane availability in these circumstances. We demonstrate that the open-source ONOS SDN controller, which uses the Java-based Atomix implementation of Raft, exhibits such behavior under intent overload: requests to the network are lost, and the entire SDN network eventually crashes. We further demonstrate similar behavior in the Python-based pysyncobj implementation of Raft. We then propose DynRaft, a dynamic add-on to Raft implementations that preserves Raft's formally proven strong consistency properties, and demonstrate the effectiveness of DynRaft with the pysyncobj implementation under emulated overload conditions.
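The guarantee that Raft keeps replicas in agreement "even when a minority of the distributed components crash" follows from its majority-quorum rule. A leader must collect votes from a majority to be elected, and a log entry counts as committed once a majority has stored it. The following is a minimal Python sketch of that arithmetic only. The function names are hypothetical and are not taken from Atomix, pysyncobj, or DynRaft.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority (a Raft quorum)."""
    if cluster_size < 1:
        raise ValueError("cluster must have at least one member")
    return cluster_size // 2 + 1


def max_tolerated_crashes(cluster_size: int) -> int:
    """How many members may crash while a majority is still available.

    Equivalent to (cluster_size - 1) // 2.
    """
    return cluster_size - majority(cluster_size)


def can_make_progress(cluster_size: int, crashed: int) -> bool:
    """True if the surviving members can still elect a leader and commit entries."""
    return cluster_size - crashed >= majority(cluster_size)


for n in (3, 5, 7):
    print(f"n={n}: quorum={majority(n)}, tolerates {max_tolerated_crashes(n)} crash(es)")
# n=3: quorum=2, tolerates 1 crash(es)
# n=5: quorum=3, tolerates 2 crash(es)
# n=7: quorum=4, tolerates 3 crash(es)
```

This is also why Raft clusters are usually sized with an odd number of members: a four-member cluster needs a quorum of three yet still tolerates only one crash, the same as a three-member cluster. The arithmetic counts crashed members only; it says nothing about members that are alive but overloaded, which is the situation the paper examines.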