Chaos Monkey: Increasing SDN Reliability through Systematic Network Destruction

M. Chang, Brendan Tschaen, Theophilus A. Benson, L. Vanbever
Published in: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, 2015-08-17
DOI: 10.1145/2785956.2790038 (https://doi.org/10.1145/2785956.2790038)
Citations: 30

Abstract

As modern networking applications become increasingly dynamic and high-bandwidth, software-defined networking (SDN) has emerged as an agile, cost-effective architecture with widespread adoption across industry. In SDN, the control-plane program runs on a logically centralized controller, which directly configures the packet-handling mechanisms in the underlying switches using an open API (e.g., OpenFlow). While the controller makes it exceptionally convenient for a network operator to control and manage a network, it requires complex logic and becomes a single point of failure within the network. As a result, configuration errors by the controller could be extremely costly for the network provider. Several SDN controllers have been developed since the conception of SDN, and network operators have relied on traditional means of identifying bugs in the controller, such as unit testing and model checking [1]. However, it has become apparent that these methods cannot practically handle the inherent complexity of a controller platform that manages large networks. One major source of this complexity is network failures, as they trigger execution of unexplored portions of code; these failures are inevitable and costly, and considering all possible interleavings of bugs is simply infeasible. To address this problem, we propose “Chaos Monkey”, a real-time, post-deployment failure injection tool. Inspired by industry practices in the cloud [2], Chaos Monkey is intended to systematically introduce failures (e.g., link failure, network failure) into a network. Chaos Monkey is guided by the following design principles: