Proceedings of the 2021 ACM SIGCOMM 2021 Conference: Latest Articles

CliqueMap
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472934
Arjun Singhvi, Aditya Akella, Maggie Anderson, R. Cauble, Harshad Deshmukh, D. Gibson, Milo M. K. Martin, Amanda Strominger, T. Wenisch, Amin Vahdat
Distributed in-memory caching is a key component of modern Internet services. Such caches are often accessed via remote procedure call (RPC), as RPC frameworks provide rich support for productionization, including protocol versioning, memory efficiency, auto-scaling, and hitless upgrades. However, full-featured RPC limits performance and scalability as it incurs high latencies and CPU overheads. Remote Memory Access (RMA) offers a promising alternative, but meeting productionization requirements can be a significant challenge with RMA-based systems due to limited programmability and narrow RMA primitives. This paper describes the design, implementation, and experience derived from CliqueMap, a hybrid RMA/RPC caching system. CliqueMap has been in production use in Google's datacenters for over three years, currently serves more than 1 PB of DRAM, and underlies several end-user-visible services. CliqueMap makes use of performant and efficient RMAs on the critical serving path and judiciously applies RPCs toward other functionality. The design embraces lightweight replication, client-based quoruming, self-validating server responses, per-operation client-side retries, and co-design with the network layers. These foci lead to a system resilient to the rigors of production and frequent post-deployment evolution.
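The combination of client-based quoruming and self-validating server responses mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give CliqueMap's data layout or protocol, so the entry format, the use of CRC32 as the checksum, and the names `make_entry`, `is_valid`, and `quorum_get` are hypothetical. Replicas are modeled as plain dictionaries standing in for remotely read memory.

```python
import zlib
from collections import Counter


def make_entry(key, value, version):
    """Build a stored entry whose checksum covers key, version, and value.

    Hypothetical layout: the real system's format is not given in the abstract.
    """
    payload = f"{key}|{version}|".encode() + value
    return {"version": version, "value": value, "checksum": zlib.crc32(payload)}


def is_valid(key, entry):
    """Client-side check that a response is internally consistent."""
    payload = f"{key}|{entry['version']}|".encode() + entry["value"]
    return zlib.crc32(payload) == entry["checksum"]


def quorum_get(replicas, key):
    """Return the value that a strict majority of replicas agree on, else None.

    A None result signals that the caller may retry the operation.
    """
    votes = Counter()
    values = {}
    for replica in replicas:
        entry = replica.get(key)
        if entry is None or not is_valid(key, entry):
            continue  # a missing or corrupt response casts no vote
        votes[entry["version"]] += 1
        values[entry["version"]] = entry["value"]
    if not votes:
        return None
    version, count = votes.most_common(1)[0]
    return values[version] if count > len(replicas) // 2 else None
```

With three replicas where two hold version 2 and one holds a stale version 1, `quorum_get` returns the version 2 value. If one of the two fresh copies fails validation, no version reaches a majority and the function returns `None`, which is where a per-operation client-side retry would fit.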
Citations: 17
RedPlane: enabling fault-tolerant stateful in-switch applications
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472905
Daehyeok Kim, J. Nelson, Dan R. K. Ports, V. Sekar, S. Seshan
Many recent efforts have demonstrated the performance benefits of running datacenter functions (e.g., NATs, load balancers, monitoring) on programmable switches. However, a key missing piece remains: fault tolerance. This is especially critical as the network is no longer stateless and pure endpoint recovery does not suffice. In this paper, we design and implement RedPlane, a fault-tolerant state store for stateful in-switch applications. This provides in-switch applications consistent access to their state, even if the switch they run on fails or traffic is rerouted to an alternative switch. We address key challenges in devising a practical, provably correct replication protocol and implementing it in the switch data plane. Our evaluations show that RedPlane incurs negligible overhead and enables end-to-end applications to rapidly recover from switch failures.
Citations: 22
Seven years in the life of Hypergiants' off-nets
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472928
Petros Gigis, Matt Calder, Lefteris Manassakis, George Nomikos, Vasileios Kotronis, X. Dimitropoulos, Ethan Katz-Bassett, Georgios Smaragdakis
Content Hypergiants deliver the vast majority of Internet traffic to end users. In recent years, some have invested heavily in deploying services and servers inside end-user networks. With several dozen Hypergiants and thousands of servers deployed inside networks, these off-net (meaning outside the Hypergiant networks) deployments change the structure of the Internet. Previous efforts to study them have relied on proprietary data or specialized per-Hypergiant measurement techniques that neither scale nor generalize, providing a limited view of content delivery on today's Internet. In this paper, we develop a generic and easy-to-implement methodology to measure the expansion of Hypergiants' off-nets. Our key observation is that Hypergiants increasingly encrypt their traffic to protect their customers' privacy. Thus, we can analyze publicly available Internet-wide scans of port 443 and retrieve TLS certificates to discover which IP addresses host Hypergiant certificates in order to infer the networks hosting off-nets for the corresponding Hypergiants. Our results show that the number of networks hosting Hypergiant off-nets has tripled from 2013 to 2021, reaching 4.5k networks. The largest Hypergiants dominate these deployments, with almost all of these networks hosting an off-net for at least one -- and increasingly two or more -- of Google, Netflix, Facebook, or Akamai. These four Hypergiants have off-nets within networks that provide access to a significant fraction of the end-user population.
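The inference step described above (certificate seen at an IP address, IP address mapped to a network, network counted as an off-net host if it is not the Hypergiant's own) can be sketched roughly as follows. This is a simplified illustration, not the paper's pipeline: the domain, the AS numbers, the prefix table, and the function names are all hypothetical placeholders, and a real study would need additional validation that the abstract does not detail.

```python
import ipaddress

# Hypothetical inputs; none of these values come from the paper.
HYPERGIANT_DOMAINS = {"examplecdn.test": "ExampleCDN"}   # cert domain -> Hypergiant
HYPERGIANT_ASNS = {"ExampleCDN": {64500}}                # the Hypergiant's own ASes
PREFIX_TO_ASN = [
    (ipaddress.ip_network("192.0.2.0/24"), 64500),
    (ipaddress.ip_network("198.51.100.0/24"), 64501),
    (ipaddress.ip_network("203.0.113.0/24"), 64502),
]


def asn_for(ip):
    """Longest-prefix match of an IP address against the prefix table."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, asn in PREFIX_TO_ASN:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None


def hypergiant_for(cert_names):
    """Return the Hypergiant whose domain matches any DNS name on the certificate."""
    for name in cert_names:
        if name.startswith("*."):
            name = name[2:]  # drop a leading wildcard label
        for domain, hypergiant in HYPERGIANT_DOMAINS.items():
            if name == domain or name.endswith("." + domain):
                return hypergiant
    return None


def offnet_asns(scan_records):
    """Map each Hypergiant to the set of foreign ASes serving its certificates.

    scan_records: iterable of (ip, [DNS names on the TLS certificate]).
    """
    result = {}
    for ip, cert_names in scan_records:
        hypergiant = hypergiant_for(cert_names)
        asn = asn_for(ip)
        if hypergiant is None or asn is None:
            continue
        if asn in HYPERGIANT_ASNS[hypergiant]:
            continue  # on-net: the Hypergiant's own network
        result.setdefault(hypergiant, set()).add(asn)
    return result
```

Given one record inside the Hypergiant's own prefix, one inside another network that serves the same certificate domain, and one unrelated certificate, only the second network is reported as hosting an off-net.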
Citations: 33
Verifying learning-augmented systems
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472936
Tomer Eliyahu, Yafim Kazak, Guy Katz, Michael Schapira
The application of deep reinforcement learning (DRL) to computer and networked systems has recently gained significant popularity. However, the obscurity of decisions by DRL policies renders it hard to ascertain that learning-augmented systems are safe to deploy, posing a significant obstacle to their real-world adoption. We observe that specific characteristics of recent applications of DRL to systems contexts give rise to an exciting opportunity: applying formal verification to establish that a given system provably satisfies designer/user-specified requirements, or to expose concrete counter-examples. We present whiRL, a platform for verifying DRL policies for systems, which combines recent advances in the verification of deep neural networks with scalable model checking techniques. To exemplify its usefulness, we employ whiRL to verify natural requirements from recently introduced learning-augmented systems for three real-world environments: Internet congestion control, adaptive video streaming, and job scheduling in compute clusters. Our evaluation shows that whiRL is capable of guaranteeing that natural requirements from these systems are satisfied, and of exposing specific scenarios in which other basic requirements are not.
Citations: 37
L2D2
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472932
Deepak Vasisht, Jayanth Shenoy, Ranveer Chandra
Large constellations of Low Earth Orbit satellites promise to provide near real-time high-resolution Earth imagery. Yet, getting this large amount of data back to Earth is challenging because of their low orbits and fast motion through space. Centralized architectures with a few multi-million-dollar ground stations incur large hour-level data download latency and are hard to scale. We propose a geographically distributed ground station design, L2D2, that uses low-cost commodity hardware to offer a low-latency, robust downlink. L2D2 is the first system to use a hybrid ground station model, where only a subset of ground stations are uplink-capable. We design new algorithms for scheduling and rate adaptation that enable low latency and high robustness despite the limitations of the receive-only ground stations. We evaluate L2D2 through a combination of trace-driven simulations and real-world satellite-ground station measurements. Our results demonstrate that L2D2's geographically distributed design can reduce data downlink latency from 90 minutes to 21 minutes.
Citations: 40
Snowcap: synthesizing network-wide configuration updates
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472915
Tibor Schneider, Rüdiger Birkner, L. Vanbever
Large-scale reconfiguration campaigns tend to be nerve-racking for network operators as they can lead to significant network downtimes, decreased performance, and policy violations. Unfortunately, existing reconfiguration frameworks often fall short in practice as they either only support a small set of reconfiguration scenarios or simply do not scale. We address these problems with Snowcap, the first network reconfiguration framework which can synthesize configuration updates that comply with arbitrary hard and soft specifications, and involve arbitrary routing protocols. Our key contribution is an efficient search procedure which leverages counter-examples to efficiently navigate the space of configuration updates. Given a reconfiguration ordering which violates the desired specifications, our algorithm automatically identifies the problematic commands so that it can avoid this particular order in the next iteration. We fully implemented Snowcap and extensively evaluated its scalability and effectiveness on real-world topologies and typical, large-scale reconfiguration scenarios. Even for large topologies, Snowcap finds a valid reconfiguration ordering with minimal side-effects (i.e., traffic shifts) within a few seconds at most.
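The idea of using counter-examples to prune the space of orderings can be illustrated with a toy search. This is only a sketch under strong assumptions and is not Snowcap's algorithm: it handles a single hard specification, enumerates permutations naively, and assumes that the intermediate network state depends only on the set of commands applied so far, so a violating set can be remembered and skipped. The names `find_ordering` and `violates` are hypothetical.

```python
from itertools import permutations


def find_ordering(commands, violates):
    """Search for an ordering whose every intermediate state satisfies the spec.

    commands: list of hashable command identifiers.
    violates: function taking a frozenset of applied commands and returning
              True if that intermediate state breaks the (hard) specification.
    Returns (ordering, number_of_spec_checks), or (None, checks) if none exists.
    """
    bad_states = set()  # counter-examples learned so far
    checks = 0
    for order in permutations(commands):
        applied = frozenset()
        ok = True
        for cmd in order:
            applied = applied | {cmd}
            if applied in bad_states:
                ok = False  # pruned by an earlier counter-example, no re-check
                break
            checks += 1
            if violates(applied):
                bad_states.add(applied)  # remember the problematic command set
                ok = False
                break
        if ok:
            return list(order), checks
    return None, checks
```

For example, with commands `["remove_old", "add_new", "tune"]` and a specification that forbids removing the old route before the new one exists, the first attempted ordering fails at `remove_old`, the second is skipped without a new check, and the search returns `["add_new", "remove_old", "tune"]`.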
Citations: 17
Capacity-efficient and uncertainty-resilient backbone network planning with hose
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472918
S. Ahuja, Varun Gupta, V. Dangui, Soshant Bali, A. Gopalan, Hao Zhong, Petr Lapukhov, Yiting Xia, Ying Zhang
This paper presents Facebook's design and operational experience of a Hose-based backbone network planning system. This initial adoption of the Hose model in network planning is driven by the capacity and demand uncertainty pressure of backbone expansion. Since the Hose model abstracts the aggregated traffic demand per site, peak traffic flows at different times can be multiplexed to save capacity and buffer traffic spikes. Our core design involves heuristic algorithms to select Hose-compliant traffic matrices and cross-layer optimization between the optical and IP networks. We evaluate the system performance in production and share insights from years of production experience. Hose-based network planning saves 17.4% capacity and drops 75% less traffic under fiber cuts. As the first study of Hose in network planning, our work has the potential to inspire follow-up research.
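To make the notion of a Hose-compliant traffic matrix concrete, here is a small sketch using the usual definition of the Hose model: each site has an aggregate egress bound and an aggregate ingress bound, and a traffic matrix complies if no site's row sum or column sum exceeds its bounds. The site names, the numbers, and the function name `is_hose_compliant` are illustrative assumptions; the paper's heuristics for selecting such matrices are not reproduced here.

```python
def is_hose_compliant(traffic_matrix, egress, ingress):
    """Check a traffic matrix against per-site aggregate Hose bounds.

    traffic_matrix: dict mapping (src, dst) -> demand; missing pairs mean 0.
    egress:  dict mapping site -> maximum total traffic the site may send.
    ingress: dict mapping site -> maximum total traffic the site may receive.
    """
    sites = list(egress)
    for site in sites:
        sent = sum(traffic_matrix.get((site, other), 0.0)
                   for other in sites if other != site)
        received = sum(traffic_matrix.get((other, site), 0.0)
                       for other in sites if other != site)
        if sent > egress[site] or received > ingress[site]:
            return False
    return True
```

With a bound of 10 units in each direction at sites A, B, and C, the matrices `{("A", "B"): 10}` and `{("A", "C"): 10}` both comply, while `{("A", "B"): 6, ("A", "C"): 6}` does not because A would send 12 units in total. This is the sense in which peaks toward different destinations at different times share one per-site budget instead of each site pair being provisioned for its own peak.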
Citations: 11
LAVA
Pub Date : 2021-08-09 DOI: 10.1007/978-3-540-72816-0_12856
R. I. Zelaya, W. Sussman, Jeremy Gummeson, Kyle Jamieson, Wenjun Hu
Citations: 0
Designing data center networks using bottleneck structures
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472898
Jordi Ros-Giralt, Noah Amsel, Sruthi Yellamraju, J. Ezick, R. Lethin, Yuang Jiang, Aosong Feng, L. Tassiulas, Zhenguo Wu, Min Yee Teh, K. Bergman
This paper provides a mathematical model of data center performance based on the recently introduced Quantitative Theory of Bottleneck Structures (QTBS). Using the model, we prove that if the traffic pattern is interference-free, there exists a unique optimal design that both minimizes maximum flow completion time and yields maximal system-wide throughput. We show that interference-free patterns correspond to the important set of patterns that display data locality properties and use these theoretical insights to study three widely used interconnects: fat-tree, folded-Clos, and dragonfly topologies. We derive equations that describe the optimal design for each interconnect as a function of the traffic pattern. Our model predicts, for example, that a 3-level folded-Clos interconnect with radix 24 that routes 10% of the traffic through the spine links can reduce the number of switches and cabling at the core layer by 25% without any performance penalty. We present experiments using production TCP/IP code to empirically validate the results and provide tables for network designers to identify optimal designs as a function of the size of the interconnect and traffic pattern.
Citations: 5
From IP to transport and beyond: cross-layer attacks against applications
Pub Date : 2021-08-09 DOI: 10.1145/3452296.3472933
Tianxiang Dai, Philipp Jeitner, Haya Shulman, M. Waidner
We perform the first analysis of methodologies for launching DNS cache poisoning: manipulation at the IP layer, hijacking of inter-domain routing, and probing of open ports via side channels. We evaluate these methodologies against DNS resolvers on the Internet and compare them with respect to effectiveness, applicability and stealth. Our study shows that DNS cache poisoning is a practical and pervasive threat. We then demonstrate cross-layer attacks that leverage DNS cache poisoning for attacking popular systems, ranging from security mechanisms, such as RPKI, to applications, such as VoIP. In addition to more traditional adversarial goals, most notably impersonation and Denial of Service, we show for the first time that DNS cache poisoning can even enable adversaries to bypass cryptographic defences: we demonstrate how DNS cache poisoning can facilitate BGP prefix hijacking of networks protected with RPKI even when all the other networks apply route origin validation to filter invalid BGP announcements. Our study shows that DNS plays a much more central role in Internet security than previously assumed. We recommend mitigations for securing the applications and for preventing cache poisoning.
Citations: 11