Recent research shows that disaggregated datacenters (DDCs) are practical and that DDC resource modularity will benefit both users and operators. This paper explores the implications of disaggregation on application fault tolerance. We expect that resource failures in a DDC will be fine-grained because resources will no longer fate-share. In this context, we look at how DDCs can provide legacy applications with familiar failure semantics and discuss fate sharing granularities that are not available in existing datacenters. We argue that fate sharing and failure mitigation should be programmable, specified by the application, and primarily implemented in the SDN-based network.
Tolerating Faults in Disaggregated Datacenters. A. Carbonari, Ivan Beschastnikh. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152447
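To make "programmable, application-specified fate sharing" concrete, here is a minimal Python sketch. It assumes an application declares groups of disaggregated resources that should fail together, for example a CPU blade and a memory blade that emulate a legacy server, and that a controller propagates a failure transitively across overlapping groups. The class name, its methods, and the transitive rule are all hypothetical illustrations, not the paper's design, and the sketch ignores the SDN enforcement the abstract proposes.

```python
from collections import defaultdict


class FateSharingPolicy:
    """Hypothetical application-declared fate-sharing groups (illustration only)."""

    def __init__(self) -> None:
        # resource name -> ids of the groups it belongs to
        self._groups_of = defaultdict(set)
        # group id -> member resources
        self._members = []

    def add_group(self, *resources: str) -> None:
        """Declare that these resources should fail together."""
        gid = len(self._members)
        self._members.append(set(resources))
        for r in resources:
            self._groups_of[r].add(gid)

    def fail(self, resource: str) -> set:
        """Return every resource to treat as failed when `resource` fails.

        Propagation is transitive (an assumption of this sketch): a resource
        that belongs to two groups drags both groups down.
        """
        failed = {resource}
        frontier = [resource]
        while frontier:
            r = frontier.pop()
            for gid in self._groups_of.get(r, set()):
                for peer in self._members[gid] - failed:
                    failed.add(peer)
                    frontier.append(peer)
        return failed


policy = FateSharingPolicy()
policy.add_group("cpu-1", "mem-3")  # emulate a legacy "server": CPU and memory fail together
policy.add_group("mem-3", "nic-2")
policy.add_group("cpu-7")           # a resource that fails alone
print(sorted(policy.fail("cpu-1")))  # ['cpu-1', 'mem-3', 'nic-2']
```

A resource that appears in no group (or in a singleton group) fails on its own, which is the fine-grained default the abstract expects in a DDC.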
Akshay Narayan, Frank Cangialosi, Prateesh Goyal, S. Narayana, Mohammad Alizadeh, Harinarayanan Balakrishnan
With Moore's law ending, the gap between general-purpose processor speeds and network link rates is widening. This trend has led to new packet-processing "datapaths" in endpoints, including kernel-bypass software and emerging SmartNIC hardware. In addition, several applications are rolling out their own protocols atop UDP (e.g., QUIC, WebRTC, and Mosh), forming new datapaths different from the traditional kernel TCP stack. All these datapaths require congestion control, but each must implement it separately because the kernel's TCP implementations cannot be reused. This paper proposes moving congestion control out of the datapath into a separate agent. This agent, which we call the congestion control plane (CCP), must provide both an expressive congestion control API and a specification that datapath designers can follow to implement and deploy CCP. We propose an API for congestion control, a set of datapath primitives, and a user-space agent design that communicates with the datapath in batches. Our approach promises to preserve the behavior and performance of in-datapath implementations while making it significantly easier to implement and deploy new congestion control algorithms.
The Case for Moving Congestion Control Out of the Datapath. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152438
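The CCP abstract describes an out-of-datapath agent that talks to the datapath in batches. The following Python sketch illustrates only that batching idea: a stand-in datapath folds per-ACK events into one summary, and the agent runs once per batch rather than once per packet. The `Report`, `Datapath`, and `AimdAgent` names are hypothetical, not the paper's API, and AIMD is swapped in purely as a familiar example algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Report:
    """Hypothetical per-batch summary sent from the datapath to the agent."""
    acked_bytes: int = 0
    lost_packets: int = 0


class Datapath:
    """Stand-in datapath that folds per-ACK events into one Report per batch."""

    def __init__(self, batch_size: int, cwnd: int) -> None:
        self.batch_size = batch_size
        self.cwnd = cwnd
        self._report = Report()
        self._events = 0

    def on_ack(self, nbytes: int, lost: bool) -> Optional[Report]:
        self._report.acked_bytes += nbytes
        self._report.lost_packets += int(lost)
        self._events += 1
        if self._events < self.batch_size:
            return None  # keep accumulating; the agent is not involved yet
        report, self._report, self._events = self._report, Report(), 0
        return report


class AimdAgent:
    """Example algorithm living in the agent (AIMD, chosen only for illustration)."""

    def __init__(self, mss: int) -> None:
        self.mss = mss

    def on_report(self, cwnd: int, r: Report) -> int:
        if r.lost_packets > 0:
            return max(2 * self.mss, cwnd // 2)  # multiplicative decrease
        # additive increase: about one MSS per window of acknowledged data
        return cwnd + self.mss * r.acked_bytes // cwnd


def run(dp: Datapath, agent: AimdAgent, events: List[Tuple[int, bool]]) -> List[int]:
    """Drive the datapath with ACK events; the agent runs once per batch."""
    trace = []
    for nbytes, lost in events:
        report = dp.on_ack(nbytes, lost)
        if report is not None:
            dp.cwnd = agent.on_report(dp.cwnd, report)
            trace.append(dp.cwnd)
    return trace
```

For example, `run(Datapath(4, 10000), AimdAgent(1000), [(1000, False)] * 7 + [(1000, True)])` makes two agent calls for eight ACKs and yields the window trace `[10400, 5200]`. Larger batches cut agent invocations further at the cost of slower reaction.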
Today's Internet scarcely resembles the mythological image of it as a fundamentally democratic system. Instead, users are at the whims of a small number of providers who control nearly everything about users' experiences on the Internet. In response, researchers and engineers have proposed, over the past decade, many systems to re-democratize the Internet, pushing control over data and systems back to the users. Yet nearly all such projects have failed. In this paper we explore why: what are the goals of such systems and what has caused them to run aground?
The Barriers to Overthrowing Internet Feudalism. Tai-Ting Liu, Zain Tariq, Jay Chen, B. Raghavan. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152454
This paper introduces sharable backup as a novel approach to failure recovery in data center networks: the entire network shares a small pool of backup devices. The proposal rests on three key observations. First, traditional rerouting-based failure recovery is ineffective, because the bandwidth lost to failures degrades application performance drastically; failed devices should therefore be replaced to restore bandwidth. Second, failures in data centers are rare but destructive [11], so cost-effective backup options are desirable. Third, emerging configurable data center network architectures make it feasible to bring backup devices online dynamically. We design the ShareBackup prototype architecture to realize this idea. Compared to rerouting-based solutions, ShareBackup provides more bandwidth, keeps paths short, and does so at low cost.
Stop Rerouting!: Enabling ShareBackup for Failure Recovery in Data Center Networks. Yiting Xia, X. Huang, T. Ng. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152452
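The core of sharable backup is that one small pool of spares serves the whole network, and a failed device is replaced rather than routed around. The Python sketch below shows only that bookkeeping. `BackupPool` and its methods are hypothetical names, and the sketch deliberately omits how ShareBackup physically brings a spare online through the configurable network, as well as any per-failure-group partitioning of the pool.

```python
from collections import deque
from typing import Dict, Iterable, Optional


class BackupPool:
    """Hypothetical network-wide pool of spare switches (illustration only)."""

    def __init__(self, spares: Iterable[str]) -> None:
        self.free = deque(spares)
        self.standing_in: Dict[str, str] = {}  # failed device -> spare replacing it

    def on_failure(self, device: str) -> Optional[str]:
        """Bring a spare online in place of `device`; None if the pool is empty."""
        if device in self.standing_in:
            return self.standing_in[device]  # already replaced
        if not self.free:
            return None  # pool exhausted; handling this case is left open here
        backup = self.free.popleft()
        self.standing_in[device] = backup
        return backup

    def on_repair(self, device: str) -> None:
        """Return the spare to the shared pool once `device` is fixed."""
        backup = self.standing_in.pop(device, None)
        if backup is not None:
            self.free.append(backup)
```

Because failures are rare, a pool much smaller than the device count can cover most concurrent failures, and a spare released by `on_repair` immediately becomes available to any other part of the network.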
Network verification has made great progress recently, yet existing solutions are limited in their ability either to handle specific protocols and implementation quirks or to diagnose and repair the cause of policy violations. In this position paper, we examine whether we can achieve the best of both worlds: full coverage of control plane protocols and decision processes, combined with the ability to diagnose and repair the cause of violations. To this end, we leverage the happens-before relationships that exist between control plane I/Os (e.g., route advertisements and forwarding updates). These relationships allow us to identify when it is safe to employ a data plane verifier and to track the root cause of problematic forwarding updates. We show how we can catch errors before they are installed, automatically trace the source of an error, and roll back the updates whenever possible.
Integrating Verification and Repair into the Control Plane. Aaron Gember, C. Raiciu, L. Vanbever. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152439
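The verification-and-repair abstract relies on happens-before relationships between control plane I/Os to trace a bad forwarding update back to its cause. As an illustration under simple assumptions, the Python sketch below records each I/O with the events that caused it, walks backward to find the root inputs, and walks forward to list everything derived from a root (the candidates for roll-back). `HappensBeforeLog`, its methods, and the event labels are hypothetical; how such edges are actually captured from routers is not modeled.

```python
from collections import defaultdict
from typing import Iterable, Set


class HappensBeforeLog:
    """Hypothetical log of control-plane I/Os linked by happens-before edges."""

    def __init__(self) -> None:
        self.parents = defaultdict(set)   # event -> events that caused it
        self.children = defaultdict(set)  # event -> events it caused

    def record(self, event: str, caused_by: Iterable[str] = ()) -> None:
        causes = tuple(caused_by)  # materialize so generators are not consumed twice
        self.parents[event].update(causes)
        for p in causes:
            self.children[p].add(event)

    def root_causes(self, event: str) -> Set[str]:
        """Earliest ancestors of `event`, i.e. inputs with no recorded cause."""
        roots, seen, stack = set(), {event}, [event]
        while stack:
            e = stack.pop()
            causes = self.parents.get(e, set())
            if not causes:
                roots.add(e)
            for p in causes - seen:
                seen.add(p)
                stack.append(p)
        return roots

    def downstream(self, event: str) -> Set[str]:
        """Everything transitively caused by `event`: candidates to roll back."""
        out, stack = set(), [event]
        while stack:
            e = stack.pop()
            for c in self.children.get(e, set()) - out:
                out.add(c)
                stack.append(c)
        return out


log = HappensBeforeLog()
log.record("adv R1->R2")                        # external input: a route advertisement
log.record("rib R2", caused_by=["adv R1->R2"])  # R2 selects the route
log.record("fib R2", caused_by=["rib R2"])      # forwarding update on R2
log.record("adv R2->R3", caused_by=["rib R2"])  # R2 re-advertises to R3
log.record("fib R3", caused_by=["adv R2->R3"])  # forwarding update on R3
```

If a verifier flags `fib R3`, `log.root_causes("fib R3")` points at the original advertisement, and `log.downstream` on that advertisement lists every update that would need to be rolled back.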
Rachee Singh, Monia Ghobadi, Klaus-Tycho Foerster, M. Filer, Phillipa Gill
Fiber optic cables are the workhorses of today's Internet services. Operators spend millions of dollars to purchase, lease, and maintain their optical backbones, making fiber efficiency essential to their business. In this work, we make the case for adapting the capacity of optical links to their signal-to-noise ratio (SNR). We show two immediate benefits by analyzing the SNR of over 2000 links in an optical backbone over a period of 2.5 years. First, the capacity of 80% of IP links can be augmented by 75% or more, leading to an overall capacity gain of 145 Tbps in a large optical backbone in North America. Second, at least 25% of link failures are caused by SNR degradation rather than complete loss of light, highlighting the opportunity to replace link failures with link flaps in which the capacity is adjusted to the new SNR. Given these benefits, we identify the disconnect between current optical and networking infrastructure that hinders the deployment of dynamic-capacity links in wide area networks (WANs). To bridge this gap, we propose a graph abstraction that enables existing traffic engineering algorithms to benefit from dynamic link capacities. We evaluate the feasibility of dynamic link capacities using a small testbed and simulate the throughput gains from deploying our approach.
Run, Walk, Crawl: Towards Dynamic Link Capacities. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152451
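The dynamic-capacity argument is that a link should run at the highest rate its current SNR supports, so an SNR drop becomes a capacity change (a "flap") rather than an outage. The Python sketch below shows that mapping under an explicitly hypothetical table of SNR thresholds; real thresholds depend on the transponder and modulation format and are not taken from the paper. It also contrasts a fixed-rate link, which simply goes down once SNR falls below its own threshold.

```python
from typing import List, Tuple

# Hypothetical (capacity in Gbps, minimum SNR in dB) steps, highest first.
# These numbers are placeholders for illustration, NOT values from the paper.
CAPACITY_STEPS: List[Tuple[int, float]] = [(200, 12.0), (150, 9.5), (100, 6.5)]


def capacity_for_snr(snr_db: float) -> int:
    """Highest capacity whose SNR threshold is met; 0 means no usable signal."""
    for gbps, min_snr in CAPACITY_STEPS:
        if snr_db >= min_snr:
            return gbps
    return 0


def classify_change(old_snr_db: float, new_snr_db: float) -> str:
    """Compare a dynamic-capacity link before and after an SNR change."""
    old, new = capacity_for_snr(old_snr_db), capacity_for_snr(new_snr_db)
    if new == old:
        return "unchanged"
    if new == 0:
        return "failure"
    return "flap"  # capacity moves to the new SNR's rate instead of going down


def static_link_up(snr_db: float, fixed_gbps: int = 100) -> bool:
    """A fixed-rate link stays up only while SNR meets that rate's threshold."""
    return snr_db >= dict(CAPACITY_STEPS)[fixed_gbps]
```

Under these placeholder thresholds, an SNR drop from 14 dB to 8 dB takes a fixed 200 Gbps link down, whereas `classify_change(14.0, 8.0)` reports a flap to 100 Gbps; conversely, a link provisioned at 100 Gbps with 14 dB of SNR is leaving capacity unused.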
Xing Liu, Q. Xiao, V. Gopalakrishnan, B. Han, Feng Qian, Matteo Varvello
360-degree videos are becoming increasingly popular on commercial platforms. In this position paper, we propose a holistic research agenda aimed at improving the performance, resource utilization efficiency, and users' quality of experience (QoE) of 360° video streaming on commodity mobile devices. Building on a Field-of-View (FoV) guided approach that fetches only the portions of a scene that users will see, our proposed research includes: robust video rate adaptation with incremental chunk upgrading, big-data-assisted head movement prediction and rate adaptation, novel support for multipath streaming, and enhancements to live 360° video broadcast. We also show preliminary results demonstrating the promising performance of our proof-of-concept 360° video streaming system, on which the proposed research is being prototyped, integrated, and evaluated.
360° Innovations for Panoramic Video Streaming. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152443
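FoV-guided streaming fetches only the parts of the panorama the viewer will see. As a simplified sketch, assuming the frame is split into equal vertical tile columns over yaw and ignoring pitch, head-movement prediction, and prefetch margins, the Python function below selects the tile columns that overlap a horizontal viewport, including the wrap-around at 0/360 degrees. The function name and the default FoV and column count are hypothetical.

```python
from typing import List


def tiles_in_view(yaw_deg: float, fov_deg: float = 100.0, n_cols: int = 8) -> List[int]:
    """Indices of tile columns that overlap a horizontal viewport.

    The panorama is cut into n_cols equal yaw slices; column i spans
    [i * w, (i + 1) * w) degrees with w = 360 / n_cols. The viewport spans
    fov_deg degrees centred on yaw_deg and may wrap past 0/360.
    """
    w = 360.0 / n_cols
    left = (yaw_deg - fov_deg / 2.0) % 360.0  # viewport's left edge
    visible = []
    for i in range(n_cols):
        d = (i * w - left) % 360.0  # column start, measured from the left edge
        # visible if it starts inside the viewport, or wraps around and covers the left edge
        if d < fov_deg or d + w > 360.0:
            visible.append(i)
    return visible
```

Looking straight ahead (`yaw_deg=0.0`) with a 100° FoV selects columns `[0, 1, 6, 7]`, so only half of the eight columns need to be fetched at high quality; a real player would typically widen this set to absorb prediction error.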
How will Deep Learning Change Internet Video Delivery? H. Yeo, Sunghyun Do, Dongsu Han. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152440
Prateesh Goyal, Mohammad Alizadeh, H. Balakrishnan
We propose Accel-Brake Control (ABC), a protocol that integrates a simple and deployable signaling scheme at cellular base stations with an endpoint mechanism to respond to these signals. The key idea is for the base station to enable each sender to achieve a computed target rate by marking each packet with an "accelerate" or "brake" notification, which causes the sender to either slightly increase or slightly reduce its congestion window. ABC is designed to rapidly acquire any capacity that opens up, a common occurrence in cellular networks, while responding promptly to congestion. It is also incrementally deployable using existing ECN infrastructure and can co-exist with legacy ECN routers. Preliminary results obtained over cellular network traces show that ABC outperforms prior approaches significantly.
Rethinking Congestion Control for Cellular Networks. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152437
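The ABC abstract pairs a base station that marks each packet "accelerate" or "brake" with a sender that nudges its congestion window up or down. The Python sketch below illustrates one way these pieces can fit together, under stated assumptions: the sender adds one packet of window per accelerate and removes one per brake, so an accelerate lets it send two packets per ACK and a brake none. Its rate is then roughly 2·f times the link's dequeue rate, which suggests marking a fraction f = min(target / (2 · dequeue), 1) as accelerate. The class names, the token-based even spreading of marks, and the formula as written here are assumptions of this sketch; ECN encoding is not modeled.

```python
class AbcMarker:
    """Hypothetical base-station marker. Marks a fraction f of departing packets
    "accelerate" and the rest "brake", spreading accelerates evenly with a
    fractional token counter (a deterministic choice made for this sketch)."""

    def __init__(self) -> None:
        self.tokens = 0.0

    def mark(self, target_rate: float, dequeue_rate: float) -> str:
        # Assumed rule: accelerate -> sender sends 2 packets per ACK, brake -> 0,
        # so the sender's rate is about 2 * f * dequeue_rate.
        f = min(target_rate / (2.0 * dequeue_rate), 1.0)
        self.tokens += f
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return "accelerate"
        return "brake"


class AbcSender:
    """Endpoint reaction: +1 packet of window per accelerate, -1 per brake."""

    def __init__(self, cwnd: int) -> None:
        self.cwnd = cwnd

    def on_ack(self, mark: str) -> None:
        if mark == "accelerate":
            self.cwnd += 1
        else:
            self.cwnd = max(1, self.cwnd - 1)


def simulate(target_rate: float, dequeue_rate: float, cwnd: int, n_packets: int) -> int:
    """Feed n_packets marks from one marker to one sender; return the final window."""
    marker, sender = AbcMarker(), AbcSender(cwnd)
    for _ in range(n_packets):
        sender.on_ack(marker.mark(target_rate, dequeue_rate))
    return sender.cwnd
```

When the target equals the current dequeue rate, half the packets are accelerates and the window holds steady (`simulate(10, 10, 20, 10)` returns 20). A target twice the dequeue rate marks every packet accelerate, which is how the scheme can grab newly opened capacity within a single round trip.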
Dmitry Kogan, Henri Stern, Ashley Tolbert, David Mazières, Keith Winstein
Today's secure stream protocols, SSH and TLS, were designed for end-to-end security and do not include a role for semi-trusted third parties. As a result, users who wish to delegate some of their authority to third parties (e.g., to run SSH clients in the cloud, or to host websites on CDNs) rely on insecure workarounds such as ssh-agent forwarding and Keyless TLS. We argue that protocol designers should consider the delegation use-case explicitly, and we propose a definition of "secure" delegation: Before a principal agrees to delegate its authority, a system should provide it with secure advance notice of who will do what to whom under that authority. We developed Guardian Agent, a delegation system for the SSH protocol that, unlike ssh-agent forwarding, allows the user to control which delegate machines can run which commands on which servers. We were able to implement Guardian Agent in a way that remains fully compatible with existing SSH servers, by "handing over" a secure connection to the delegate once it has been set up. Additionally, we use this work to suggest a path for secure delegation on the Web.
The Case For Secure Delegation. Proceedings of the 16th ACM Workshop on Hot Topics in Networks, 2017-11-30. https://doi.org/10.1145/3152434.3152444
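The secure-delegation definition ("who will do what to whom") maps naturally onto a triple of delegate machine, target server, and command. The Python sketch below is a hypothetical default-deny policy check over such triples, plus the advance-notice string a user might approve. It is not Guardian Agent's actual policy format or API, and it omits the SSH connection hand-over entirely.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    """Hypothetical grant: `delegate` may run commands matching `command` on `server`."""
    delegate: str
    server: str
    command: str  # shell-style pattern, e.g. "git-upload-pack *"


def is_allowed(rules: Iterable[Rule], delegate: str, server: str, command: str) -> bool:
    """Default-deny: permit only if one rule matches delegate, server AND command."""
    return any(
        r.delegate == delegate and r.server == server and fnmatchcase(command, r.command)
        for r in rules
    )


def notice(delegate: str, server: str, command: str) -> str:
    """Advance notice shown to the user before authority is delegated."""
    return f"{delegate} wants to run {command!r} on {server}"
```

With a single rule `Rule("ci.example.net", "git.example.com", "git-upload-pack *")`, a cloud CI machine can fetch from the repository server but cannot push (`git-receive-pack`) and no other machine can use the grant, which is the per-delegate, per-server, per-command control that plain ssh-agent forwarding lacks.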