Virtualized Radio Access Network (vRAN) offers a cost-efficient solution for running the 5G RAN as a virtualized network function (VNF) on commodity hardware. The vRAN is more efficient than traditional RANs, as it multiplexes several base station workloads on the same compute hardware. Our measurements show that, whilst this multiplexing provides efficiency gains, more than 50% of the CPU cycles in typical vRAN settings still remain unused. A way to further improve CPU utilization is to collocate the vRAN with general-purpose workloads. However, to maintain performance, vRAN tasks have sub-millisecond latency requirements that have to be met 99.999% of times. We show that this is difficult to achieve with existing systems. We propose Concordia, a userspace deadline scheduling framework for the vRAN on Linux. Concordia builds prediction models using quantile decision trees to predict the worst case execution times of vRAN signal processing tasks. The Concordia scheduler is fast (runs every 20 us) and the prediction models are accurate, enabling the system to reserve a minimum number of cores required for vRAN tasks, leaving the rest for general-purpose workloads. We evaluate Concordia on a commercial-grade reference vRAN platform. We show that it meets the 99.999% reliability requirements and reclaims more than 70% of idle CPU cycles without affecting the RAN performance.
虚拟化无线接入网(virtual Radio Access Network, vRAN)为5G RAN在商用硬件上作为虚拟化网络功能(virtual Network function, VNF)运行提供了一种经济高效的解决方案。vRAN比传统的ran更高效,因为它在相同的计算硬件上复用多个基站工作负载。我们的测量表明,虽然这种多路复用提供了效率提升,但在典型的vRAN设置中,超过50%的CPU周期仍然未被使用。进一步提高CPU利用率的一种方法是将vRAN与通用工作负载搭配使用。然而,为了保持性能,vRAN任务具有亚毫秒级的延迟要求,必须在99.999%的情况下满足这些要求。我们表明,这是很难实现与现有的系统。我们提出了Concordia,一个用于Linux上的vRAN的用户空间最后期限调度框架。Concordia使用分位数决策树构建预测模型,预测vRAN信号处理任务的最坏情况执行时间。Concordia调度器速度快(每20秒运行一次),预测模型准确,使系统能够为vRAN任务保留最少数量的核心,其余的留给通用工作负载。我们在商用级参考vRAN平台上对Concordia进行了评估。我们表明,它满足99.999%的可靠性要求,并在不影响RAN性能的情况下回收70%以上的空闲CPU周期。
{"title":"Concordia: teaching the 5G vRAN to share compute","authors":"Xenofon Foukas, B. Radunovic","doi":"10.1145/3452296.3472894","DOIUrl":"https://doi.org/10.1145/3452296.3472894","url":null,"abstract":"Virtualized Radio Access Network (vRAN) offers a cost-efficient solution for running the 5G RAN as a virtualized network function (VNF) on commodity hardware. The vRAN is more efficient than traditional RANs, as it multiplexes several base station workloads on the same compute hardware. Our measurements show that, whilst this multiplexing provides efficiency gains, more than 50% of the CPU cycles in typical vRAN settings still remain unused. A way to further improve CPU utilization is to collocate the vRAN with general-purpose workloads. However, to maintain performance, vRAN tasks have sub-millisecond latency requirements that have to be met 99.999% of times. We show that this is difficult to achieve with existing systems. We propose Concordia, a userspace deadline scheduling framework for the vRAN on Linux. Concordia builds prediction models using quantile decision trees to predict the worst case execution times of vRAN signal processing tasks. The Concordia scheduler is fast (runs every 20 us) and the prediction models are accurate, enabling the system to reserve a minimum number of cores required for vRAN tasks, leaving the rest for general-purpose workloads. We evaluate Concordia on a commercial-grade reference vRAN platform. We show that it meets the 99.999% reliability requirements and reclaims more than 70% of idle CPU cycles without affecting the RAN performance.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81341931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Black swan events are hard-to-predict rare events that can significantly alter the course of our lives. The Internet has played a key role in helping us deal with the coronavirus pandemic, a recent black swan event. However, Internet researchers and operators are mostly blind to another black swan event that poses a direct threat to Internet infrastructure. In this paper, we investigate the impact of solar superstorms that can potentially cause large-scale Internet outages covering the entire globe and lasting several months. We discuss the challenges posed by such activity and currently available mitigation techniques. Using real-world datasets, we analyze the robustness of the current Internet infrastructure and show that submarine cables are at greater risk of failure compared to land cables. Moreover, the US has a higher risk for disconnection compared to Asia. Finally, we lay out steps for improving the Internet's resiliency.
{"title":"Solar superstorms: planning for an internet apocalypse","authors":"S. Jyothi","doi":"10.1145/3452296.3472916","DOIUrl":"https://doi.org/10.1145/3452296.3472916","url":null,"abstract":"Black swan events are hard-to-predict rare events that can significantly alter the course of our lives. The Internet has played a key role in helping us deal with the coronavirus pandemic, a recent black swan event. However, Internet researchers and operators are mostly blind to another black swan event that poses a direct threat to Internet infrastructure. In this paper, we investigate the impact of solar superstorms that can potentially cause large-scale Internet outages covering the entire globe and lasting several months. We discuss the challenges posed by such activity and currently available mitigation techniques. Using real-world datasets, we analyze the robustness of the current Internet infrastructure and show that submarine cables are at greater risk of failure compared to land cables. Moreover, the US has a higher risk for disconnection compared to Asia. Finally, we lay out steps for improving the Internet's resiliency.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88213475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arjun Singhvi, Aditya Akella, Maggie Anderson, R. Cauble, Harshad Deshmukh, D. Gibson, Milo M. K. Martin, Amanda Strominger, T. Wenisch, Amin Vahdat
Distributed in-memory caching is a key component of modern Internet services. Such caches are often accessed via remote procedure call (RPC), as RPC frameworks provide rich support for productionization, including protocol versioning, memory efficiency, auto-scaling, and hitless upgrades. However, full-featured RPC limits performance and scalability as it incurs high latencies and CPU overheads. Remote Memory Access (RMA) offers a promising alternative, but meeting productionization requirements can be a significant challenge with RMA-based systems due to limited programmability and narrow RMA primitives. This paper describes the design, implementation, and experience derived from CliqueMap, a hybrid RMA/RPC caching system. CliqueMap has been in production use in Google's datacenters for over three years, currently serves more than 1PB of DRAM, and underlies several end-user visible services. CliqueMap makes use of performant and efficient RMAs on the critical serving path and judiciously applies RPCs toward other functionality. The design embraces lightweight replication, client-based quoruming, self-validating server responses, per-operation client-side retries, and co-design with the network layers. These foci lead to a system resilient to the rigors of production and frequent post deployment evolution.
{"title":"CliqueMap","authors":"Arjun Singhvi, Aditya Akella, Maggie Anderson, R. Cauble, Harshad Deshmukh, D. Gibson, Milo M. K. Martin, Amanda Strominger, T. Wenisch, Amin Vahdat","doi":"10.1145/3452296.3472934","DOIUrl":"https://doi.org/10.1145/3452296.3472934","url":null,"abstract":"Distributed in-memory caching is a key component of modern Internet services. Such caches are often accessed via remote procedure call (RPC), as RPC frameworks provide rich support for productionization, including protocol versioning, memory efficiency, auto-scaling, and hitless upgrades. However, full-featured RPC limits performance and scalability as it incurs high latencies and CPU overheads. Remote Memory Access (RMA) offers a promising alternative, but meeting productionization requirements can be a significant challenge with RMA-based systems due to limited programmability and narrow RMA primitives. This paper describes the design, implementation, and experience derived from CliqueMap, a hybrid RMA/RPC caching system. CliqueMap has been in production use in Google's datacenters for over three years, currently serves more than 1PB of DRAM, and underlies several end-user visible services. CliqueMap makes use of performant and efficient RMAs on the critical serving path and judiciously applies RPCs toward other functionality. The design embraces lightweight replication, client-based quoruming, self-validating server responses, per-operation client-side retries, and co-design with the network layers. These foci lead to a system resilient to the rigors of production and frequent post deployment evolution.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"233 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77009089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Petros Gigis, Matt Calder, Lefteris Manassakis, George Nomikos, Vasileios Kotronis, X. Dimitropoulos, Ethan Katz-Bassett, Georgios Smaragdakis
Content Hypergiants deliver the vast majority of Internet traffic to end users. In recent years, some have invested heavily in deploying services and servers inside end-user networks. With several dozen Hypergiants and thousands of servers deployed inside networks, these off-net (meaning outside the Hypergiant networks) deployments change the structure of the Internet. Previous efforts to study them have relied on proprietary data or specialized per-Hypergiant measurement techniques that neither scale nor generalize, providing a limited view of content delivery on today's Internet. In this paper, we develop a generic and easy to implement methodology to measure the expansion of Hypergiants' off-nets. Our key observation is that Hypergiants increasingly encrypt their traffic to protect their customers' privacy. Thus, we can analyze publicly available Internet-wide scans of port 443 and retrieve TLS certificates to discover which IP addresses host Hypergiant certificates in order to infer the networks hosting off-nets for the corresponding Hypergiants. Our results show that the number of networks hosting Hypergiant off-nets has tripled from 2013 to 2021, reaching 4.5k networks. The largest Hypergiants dominate these deployments, with almost all of these networks hosting an off-net for at least one -- and increasingly two or more -- of Google, Netflix, Facebook, or Akamai. These four Hypergiants have off-nets within networks that provide access to a significant fraction of end user population.
{"title":"Seven years in the life of Hypergiants' off-nets","authors":"Petros Gigis, Matt Calder, Lefteris Manassakis, George Nomikos, Vasileios Kotronis, X. Dimitropoulos, Ethan Katz-Bassett, Georgios Smaragdakis","doi":"10.1145/3452296.3472928","DOIUrl":"https://doi.org/10.1145/3452296.3472928","url":null,"abstract":"Content Hypergiants deliver the vast majority of Internet traffic to end users. In recent years, some have invested heavily in deploying services and servers inside end-user networks. With several dozen Hypergiants and thousands of servers deployed inside networks, these off-net (meaning outside the Hypergiant networks) deployments change the structure of the Internet. Previous efforts to study them have relied on proprietary data or specialized per-Hypergiant measurement techniques that neither scale nor generalize, providing a limited view of content delivery on today's Internet. In this paper, we develop a generic and easy to implement methodology to measure the expansion of Hypergiants' off-nets. Our key observation is that Hypergiants increasingly encrypt their traffic to protect their customers' privacy. Thus, we can analyze publicly available Internet-wide scans of port 443 and retrieve TLS certificates to discover which IP addresses host Hypergiant certificates in order to infer the networks hosting off-nets for the corresponding Hypergiants. Our results show that the number of networks hosting Hypergiant off-nets has tripled from 2013 to 2021, reaching 4.5k networks. The largest Hypergiants dominate these deployments, with almost all of these networks hosting an off-net for at least one -- and increasingly two or more -- of Google, Netflix, Facebook, or Akamai. These four Hypergiants have off-nets within networks that provide access to a significant fraction of end user population.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83281092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tomer Eliyahu, Yafim Kazak, Guy Katz, Michael Schapira
The application of deep reinforcement learning (DRL) to computer and networked systems has recently gained significant popularity. However, the obscurity of decisions by DRL policies renders it hard to ascertain that learning-augmented systems are safe to deploy, posing a significant obstacle to their real-world adoption. We observe that specific characteristics of recent applications of DRL to systems contexts give rise to an exciting opportunity: applying formal verification to establish that a given system provably satisfies designer/user-specified requirements, or to expose concrete counter-examples. We present whiRL, a platform for verifying DRL policies for systems, which combines recent advances in the verification of deep neural networks with scalable model checking techniques. To exemplify its usefulness, we employ whiRL to verify natural equirements from recently introduced learning-augmented systems for three real-world environments: Internet congestion control, adaptive video streaming, and job scheduling in compute clusters. Our evaluation shows that whiRL is capable of guaranteeing that natural requirements from these systems are satisfied, and of exposing specific scenarios in which other basic requirements are not.
{"title":"Verifying learning-augmented systems","authors":"Tomer Eliyahu, Yafim Kazak, Guy Katz, Michael Schapira","doi":"10.1145/3452296.3472936","DOIUrl":"https://doi.org/10.1145/3452296.3472936","url":null,"abstract":"The application of deep reinforcement learning (DRL) to computer and networked systems has recently gained significant popularity. However, the obscurity of decisions by DRL policies renders it hard to ascertain that learning-augmented systems are safe to deploy, posing a significant obstacle to their real-world adoption. We observe that specific characteristics of recent applications of DRL to systems contexts give rise to an exciting opportunity: applying formal verification to establish that a given system provably satisfies designer/user-specified requirements, or to expose concrete counter-examples. We present whiRL, a platform for verifying DRL policies for systems, which combines recent advances in the verification of deep neural networks with scalable model checking techniques. To exemplify its usefulness, we employ whiRL to verify natural equirements from recently introduced learning-augmented systems for three real-world environments: Internet congestion control, adaptive video streaming, and job scheduling in compute clusters. Our evaluation shows that whiRL is capable of guaranteeing that natural requirements from these systems are satisfied, and of exposing specific scenarios in which other basic requirements are not.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83374289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale reconfiguration campaigns tend to be nerve-racking for network operators as they can lead to significant network downtimes, decreased performance, and policy violations. Unfortunately, existing reconfiguration frameworks often fall short in practice as they either only support a small set of reconfiguration scenarios or simply do not scale. We address these problems with Snowcap, the first network reconfiguration framework which can synthesize configuration updates that comply with arbitrary hard and soft specifications, and involve arbitrary routing protocols. Our key contribution is an efficient search procedure which leverages counter-examples to efficiently navigate the space of configuration updates. Given a reconfiguration ordering which violates the desired specifications, our algorithm automatically identifies the problematic commands so that it can avoid this particular order in the next iteration. We fully implemented Snowcap and extensively evaluated its scalability and effectiveness on real-world topologies and typical, large-scale reconfiguration scenarios. Even for large topologies, Snowcap finds a valid reconfiguration ordering with minimal side-effects (i.e., traffic shifts) within a few seconds at most.
{"title":"Snowcap: synthesizing network-wide configuration updates","authors":"Tibor Schneider, Rüdiger Birkner, L. Vanbever","doi":"10.1145/3452296.3472915","DOIUrl":"https://doi.org/10.1145/3452296.3472915","url":null,"abstract":"Large-scale reconfiguration campaigns tend to be nerve-racking for network operators as they can lead to significant network downtimes, decreased performance, and policy violations. Unfortunately, existing reconfiguration frameworks often fall short in practice as they either only support a small set of reconfiguration scenarios or simply do not scale. We address these problems with Snowcap, the first network reconfiguration framework which can synthesize configuration updates that comply with arbitrary hard and soft specifications, and involve arbitrary routing protocols. Our key contribution is an efficient search procedure which leverages counter-examples to efficiently navigate the space of configuration updates. Given a reconfiguration ordering which violates the desired specifications, our algorithm automatically identifies the problematic commands so that it can avoid this particular order in the next iteration. We fully implemented Snowcap and extensively evaluated its scalability and effectiveness on real-world topologies and typical, large-scale reconfiguration scenarios. Even for large topologies, Snowcap finds a valid reconfiguration ordering with minimal side-effects (i.e., traffic shifts) within a few seconds at most.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81901202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Ahuja, Varun Gupta, V. Dangui, Soshant Bali, A. Gopalan, Hao Zhong, Petr Lapukhov, Yiting Xia, Ying Zhang
This paper presents Facebook's design and operational experience of a Hose-based backbone network planning system. This initial adoption of the Hose model in network planning is driven by the capacity and demand uncertainty pressure of backbone expansion. Since the Hose model abstracts the aggregated traffic demand per site, peak traffic flows at different times can be multiplexed to save capacity and buffer traffic spikes. Our core design involves heuristic algorithms to select Hose-compliant traffic matrices and cross-layer optimization between the optical and IP networks. We evaluate the system performance in production and share insights from years of production experience. Hose-based network planning can save 17.4% capacity and drops 75% less traffic under fiber cuts. As the first study of Hose in network planning, our work has the potential to inspire follow-up research.
{"title":"Capacity-efficient and uncertainty-resilient backbone network planning with hose","authors":"S. Ahuja, Varun Gupta, V. Dangui, Soshant Bali, A. Gopalan, Hao Zhong, Petr Lapukhov, Yiting Xia, Ying Zhang","doi":"10.1145/3452296.3472918","DOIUrl":"https://doi.org/10.1145/3452296.3472918","url":null,"abstract":"This paper presents Facebook's design and operational experience of a Hose-based backbone network planning system. This initial adoption of the Hose model in network planning is driven by the capacity and demand uncertainty pressure of backbone expansion. Since the Hose model abstracts the aggregated traffic demand per site, peak traffic flows at different times can be multiplexed to save capacity and buffer traffic spikes. Our core design involves heuristic algorithms to select Hose-compliant traffic matrices and cross-layer optimization between the optical and IP networks. We evaluate the system performance in production and share insights from years of production experience. Hose-based network planning can save 17.4% capacity and drops 75% less traffic under fiber cuts. As the first study of Hose in network planning, our work has the potential to inspire follow-up research.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90204451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-08-09DOI: 10.1007/978-3-540-72816-0_12856
R. I. Zelaya, W. Sussman, Jeremy Gummeson, Kyle Jamieson, Wenjun Hu
{"title":"LAVA","authors":"R. I. Zelaya, W. Sussman, Jeremy Gummeson, Kyle Jamieson, Wenjun Hu","doi":"10.1007/978-3-540-72816-0_12856","DOIUrl":"https://doi.org/10.1007/978-3-540-72816-0_12856","url":null,"abstract":"","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"76 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86180528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jordi Ros-Giralt, Noah Amsel, Sruthi Yellamraju, J. Ezick, R. Lethin, Yuang Jiang, Aosong Feng, L. Tassiulas, Zhenguo Wu, Min Yee Teh, K. Bergman
This paper provides a mathematical model of data center performance based on the recently introduced Quantitative Theory of Bottleneck Structures (QTBS). Using the model, we prove that if the traffic pattern is textit{interference-free}, there exists a unique optimal design that both minimizes maximum flow completion time and yields maximal system-wide throughput. We show that interference-free patterns correspond to the important set of patterns that display data locality properties and use these theoretical insights to study three widely used interconnects---fat-trees, folded-Clos and dragonfly topologies. We derive equations that describe the optimal design for each interconnect as a function of the traffic pattern. Our model predicts, for example, that a 3-level folded-Clos interconnect with radix 24 that routes 10% of the traffic through the spine links can reduce the number of switches and cabling at the core layer by 25% without any performance penalty. We present experiments using production TCP/IP code to empirically validate the results and provide tables for network designers to identify optimal designs as a function of the size of the interconnect and traffic pattern.
{"title":"Designing data center networks using bottleneck structures","authors":"Jordi Ros-Giralt, Noah Amsel, Sruthi Yellamraju, J. Ezick, R. Lethin, Yuang Jiang, Aosong Feng, L. Tassiulas, Zhenguo Wu, Min Yee Teh, K. Bergman","doi":"10.1145/3452296.3472898","DOIUrl":"https://doi.org/10.1145/3452296.3472898","url":null,"abstract":"This paper provides a mathematical model of data center performance based on the recently introduced Quantitative Theory of Bottleneck Structures (QTBS). Using the model, we prove that if the traffic pattern is textit{interference-free}, there exists a unique optimal design that both minimizes maximum flow completion time and yields maximal system-wide throughput. We show that interference-free patterns correspond to the important set of patterns that display data locality properties and use these theoretical insights to study three widely used interconnects---fat-trees, folded-Clos and dragonfly topologies. We derive equations that describe the optimal design for each interconnect as a function of the traffic pattern. Our model predicts, for example, that a 3-level folded-Clos interconnect with radix 24 that routes 10% of the traffic through the spine links can reduce the number of switches and cabling at the core layer by 25% without any performance penalty. We present experiments using production TCP/IP code to empirically validate the results and provide tables for network designers to identify optimal designs as a function of the size of the interconnect and traffic pattern.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86194552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tianxiang Dai, Philipp Jeitner, Haya Shulman, M. Waidner
We perform the first analysis of methodologies for launching DNS cache poisoning: manipulation at the IP layer, hijack of the inter-domain routing and probing open ports via side channels. We evaluate these methodologies against DNS resolvers in the Internet and compare them with respect to effectiveness, applicability and stealth. Our study shows that DNS cache poisoning is a practical and pervasive threat. We then demonstrate cross-layer attacks that leverage DNS cache poisoning for attacking popular systems, ranging from security mechanisms, such as RPKI, to applications, such as VoIP. In addition to more traditional adversarial goals, most notably impersonation and Denial of Service, we show for the first time that DNS cache poisoning can even enable adversaries to bypass cryptographic defences: we demonstrate how DNS cache poisoning can facilitate BGP prefix hijacking of networks protected with RPKI even when all the other networks apply route origin validation to filter invalid BGP announcements. Our study shows that DNS plays a much more central role in the Internet security than previously assumed. We recommend mitigations for securing the applications and for preventing cache poisoning.
{"title":"From IP to transport and beyond: cross-layer attacks against applications","authors":"Tianxiang Dai, Philipp Jeitner, Haya Shulman, M. Waidner","doi":"10.1145/3452296.3472933","DOIUrl":"https://doi.org/10.1145/3452296.3472933","url":null,"abstract":"We perform the first analysis of methodologies for launching DNS cache poisoning: manipulation at the IP layer, hijack of the inter-domain routing and probing open ports via side channels. We evaluate these methodologies against DNS resolvers in the Internet and compare them with respect to effectiveness, applicability and stealth. Our study shows that DNS cache poisoning is a practical and pervasive threat. We then demonstrate cross-layer attacks that leverage DNS cache poisoning for attacking popular systems, ranging from security mechanisms, such as RPKI, to applications, such as VoIP. In addition to more traditional adversarial goals, most notably impersonation and Denial of Service, we show for the first time that DNS cache poisoning can even enable adversaries to bypass cryptographic defences: we demonstrate how DNS cache poisoning can facilitate BGP prefix hijacking of networks protected with RPKI even when all the other networks apply route origin validation to filter invalid BGP announcements. Our study shows that DNS plays a much more central role in the Internet security than previously assumed. We recommend mitigations for securing the applications and for preventing cache poisoning.","PeriodicalId":20487,"journal":{"name":"Proceedings of the 2021 ACM SIGCOMM 2021 Conference","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83814480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}