敌与友:高可用性分布式系统和SDN中的强一致性与过载

R. Hanmer, L. Jagadeesan, V. Mendiratta, Heng Zhang
{"title":"敌与友:高可用性分布式系统和SDN中的强一致性与过载","authors":"R. Hanmer, L. Jagadeesan, V. Mendiratta, Heng Zhang","doi":"10.1109/ISSREW.2018.00-30","DOIUrl":null,"url":null,"abstract":"Distributed systems play an increasingly important role in leading-edge networks with high availability requirements, including software-defined networks (SDN), where replicating essential network state information is critical to ensure resilience under failures. Distributed consensus based strong consistency algorithms, such as Raft, are often used to ensure that all components of the distributed system agree on their view of the replicated data, even when a minority of the distributed components crash. Another critical requirement for highly available networks is to gracefully handle overload conditions, where the demands on the network exceed expected levels for a period of time, such as during natural or man-made disasters or popular sporting events. Hence, the strong consistency algorithms used in such networks must also behave gracefully under overload conditions. We show that, in fact, strong consistency algorithms such as Raft may not behave gracefully under overload conditions and can in fact significantly negatively affect SDN control plane availability in these circumstances. We demonstrate that the open-source ONOS SDN controller, which uses the Java-based Atomix implementation of Raft, exhibits such behavior under intent overload, resulting in the loss of requests to the network, and with the entire SDN network eventually crashing. We further demonstrate similar behaviors of the Python-based pysyncobj implementation of Raft. We then propose DynRaft, a dynamic add-on to Raft implementations that continues to ensure the formally proven strong consistency properties of Raft, and demonstrate the effectiveness of DynRaft with the pysyncobj implementation under emulated overload conditions.","PeriodicalId":321448,"journal":{"name":"2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Friend or Foe: Strong Consistency vs. Overload in High-Availability Distributed Systems and SDN\",\"authors\":\"R. Hanmer, L. Jagadeesan, V. Mendiratta, Heng Zhang\",\"doi\":\"10.1109/ISSREW.2018.00-30\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed systems play an increasingly important role in leading-edge networks with high availability requirements, including software-defined networks (SDN), where replicating essential network state information is critical to ensure resilience under failures. Distributed consensus based strong consistency algorithms, such as Raft, are often used to ensure that all components of the distributed system agree on their view of the replicated data, even when a minority of the distributed components crash. Another critical requirement for highly available networks is to gracefully handle overload conditions, where the demands on the network exceed expected levels for a period of time, such as during natural or man-made disasters or popular sporting events. Hence, the strong consistency algorithms used in such networks must also behave gracefully under overload conditions. We show that, in fact, strong consistency algorithms such as Raft may not behave gracefully under overload conditions and can in fact significantly negatively affect SDN control plane availability in these circumstances. We demonstrate that the open-source ONOS SDN controller, which uses the Java-based Atomix implementation of Raft, exhibits such behavior under intent overload, resulting in the loss of requests to the network, and with the entire SDN network eventually crashing. We further demonstrate similar behaviors of the Python-based pysyncobj implementation of Raft. We then propose DynRaft, a dynamic add-on to Raft implementations that continues to ensure the formally proven strong consistency properties of Raft, and demonstrate the effectiveness of DynRaft with the pysyncobj implementation under emulated overload conditions.\",\"PeriodicalId\":321448,\"journal\":{\"name\":\"2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"volume\":\"184 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSREW.2018.00-30\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSREW.2018.00-30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

摘要

分布式系统在具有高可用性需求的前沿网络(包括软件定义网络(SDN))中扮演着越来越重要的角色,在这些网络中,复制基本的网络状态信息对于确保故障时的恢复能力至关重要。基于分布式一致性的强一致性算法(如Raft)通常用于确保分布式系统的所有组件在复制数据的视图上达成一致,即使少数分布式组件崩溃也是如此。对高可用性网络的另一个关键要求是优雅地处理过载条件,即网络上的需求在一段时间内超过预期水平,例如在自然或人为灾害或流行体育赛事期间。因此,在这种网络中使用的强一致性算法也必须在过载条件下表现良好。我们表明,事实上,像Raft这样的强一致性算法在过载条件下可能不会表现得很好,而且在这种情况下实际上会对SDN控制平面的可用性产生显著的负面影响。我们展示了开源ONOS SDN控制器,它使用基于java的Atomix实现Raft,在意图过载下表现出这种行为,导致对网络的请求丢失,并最终导致整个SDN网络崩溃。我们进一步演示了Raft基于python的pysyncobj实现的类似行为。然后,我们提出了DynRaft,一个动态附加到Raft实现中,继续确保Raft的正式证明的强一致性属性,并演示了DynRaft与pysyncobj实现在模拟过载条件下的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Friend or Foe: Strong Consistency vs. Overload in High-Availability Distributed Systems and SDN
Distributed systems play an increasingly important role in leading-edge networks with high availability requirements, including software-defined networks (SDN), where replicating essential network state information is critical to ensure resilience under failures. Distributed consensus based strong consistency algorithms, such as Raft, are often used to ensure that all components of the distributed system agree on their view of the replicated data, even when a minority of the distributed components crash. Another critical requirement for highly available networks is to gracefully handle overload conditions, where the demands on the network exceed expected levels for a period of time, such as during natural or man-made disasters or popular sporting events. Hence, the strong consistency algorithms used in such networks must also behave gracefully under overload conditions. We show that, in fact, strong consistency algorithms such as Raft may not behave gracefully under overload conditions and can in fact significantly negatively affect SDN control plane availability in these circumstances. We demonstrate that the open-source ONOS SDN controller, which uses the Java-based Atomix implementation of Raft, exhibits such behavior under intent overload, resulting in the loss of requests to the network, and with the entire SDN network eventually crashing. We further demonstrate similar behaviors of the Python-based pysyncobj implementation of Raft. We then propose DynRaft, a dynamic add-on to Raft implementations that continues to ensure the formally proven strong consistency properties of Raft, and demonstrate the effectiveness of DynRaft with the pysyncobj implementation under emulated overload conditions.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Message from the WoSoCer 2018 Workshop Chairs Software Aging and Rejuvenation in the Cloud: A Literature Review Spectrum-Based Fault Localization for Logic-Based Reasoning [Title page iii] Software Reliability Assessment: Modeling and Algorithms
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1