Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651986
Yiran Lei, Yu Zhou, Yunsenxiao Lin, Mingwei Xu, Yangyang Wang
Service-level objectives (SLOs), typically expressed as network performance requirements on delay and packet loss, must be guaranteed for a growing number of high-performance applications, e.g., telesurgery and cloud gaming. However, SLO violations are common and destructive in today's network operation. Detection (monitoring performance to discover anomalies) and diagnosis (analyzing the causes of SLO violations) are crucial for fast recovery. Unfortunately, existing diagnosis approaches require exhaustive causal information to function, while existing detection tools either incur large overhead or provide only limited information for diagnosis. This paper presents DOVE, a diagnosis-driven SLO violation detection system with high accuracy and low overhead. The key idea is to selectively and efficiently report, from the data plane, the information needed for diagnosis along with SLO violation alerts. Network segmentation is introduced to balance scalability and accuracy. For fine-grained SLO detection, novel algorithms that measure packet loss and percentile delay are implemented entirely on the data plane, without involving the control plane. We implement and deploy DOVE on Tofino and on the P4 software switch (BMv2) and demonstrate its effectiveness with a use case. Compared with ground truth, the reported SLO violation alerts and diagnosis information show high accuracy (>97%). Our evaluation also shows that DOVE introduces up to two orders of magnitude less traffic overhead than NetSight. In addition, its memory utilization and processing requirements are low enough for deployment in real network topologies.
Title: DOVE: Diagnosis-driven SLO Violation Detection. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
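The abstract does not describe DOVE's loss and percentile-delay algorithms. Purely as an illustration of what a per-segment SLO check has to compute, the sketch below counts packets entering and leaving one network segment and keeps a log2-bucketed delay histogram, a layout chosen because it maps naturally onto switch registers. Every name here (`SegmentMonitor`, `bucket_of`, `NUM_BUCKETS`) and the bucketing scheme are assumptions, not DOVE's design.

```python
import math

# Hypothetical bucket layout: bucket i holds delays in [2**i, 2**(i+1)) microseconds
# (bucket 0 also absorbs 0 us); the last bucket absorbs everything larger.
NUM_BUCKETS = 16


def bucket_of(delay_us: int) -> int:
    """Map a delay to a log2 bucket; a switch could do this with a range-match table."""
    return min(max(delay_us, 1).bit_length() - 1, NUM_BUCKETS - 1)


class SegmentMonitor:
    """Toy per-segment SLO checker for a delay percentile and packet loss."""

    def __init__(self, pct: float, delay_slo_us: int, loss_slo: float):
        self.pct = pct                    # e.g. 0.99 for a p99 delay SLO
        self.delay_slo_us = delay_slo_us  # bound the percentile must stay under
        self.loss_slo = loss_slo          # maximum tolerated loss ratio
        self.hist = [0] * NUM_BUCKETS
        self.sent = 0      # packets observed entering the segment
        self.received = 0  # packets observed leaving the segment

    def on_ingress(self) -> None:
        self.sent += 1

    def on_egress(self, delay_us: int) -> None:
        self.received += 1
        self.hist[bucket_of(delay_us)] += 1

    def percentile_upper_bound_us(self) -> float:
        """Upper edge of the bucket holding the pct-th delay (a conservative bound)."""
        target = math.ceil(self.pct * self.received)
        seen = 0
        for i, count in enumerate(self.hist):
            seen += count
            if seen >= target:
                # The last bucket is open-ended, so its upper edge is unknown.
                return 2 ** (i + 1) if i < NUM_BUCKETS - 1 else math.inf
        return math.inf

    def check(self) -> list:
        alerts = []
        if self.received and self.percentile_upper_bound_us() > self.delay_slo_us:
            alerts.append("delay")
        loss = (self.sent - self.received) / self.sent if self.sent else 0.0
        if loss > self.loss_slo:
            alerts.append("loss")
        return alerts
```

For example, if 100 packets enter a segment, 97 leave after 10 us, one after 500 us, and two are lost, a p99 SLO of 200 us with a 1% loss budget raises both alerts. Using the bucket's upper edge can produce false delay alerts but never misses a true violation inside the histogram's range. `check()` is written as a periodic function for readability; how DOVE evaluates percentiles entirely in the data plane is not stated in the abstract.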
Pub Date: 2021-11-01, DOI: 10.1109/icnp52444.2021.9651933
Title: Welcome Message from the ICNP 2021 TPC Chairs. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651964
Yunzhi Lin, Shouxi Luo
To achieve efficient model multicast for cross-device Federated Learning (FL) over shared wireless channels, we propose SRMP, a transport protocol that performs semi-reliable model multicast over the air by leveraging existing PHY-aided wireless multicast techniques. A preliminary study shows that, with its novel designs, SRMP could significantly reduce the communication time of each training round.
Title: Poster: Accelerate Cross-Device Federated Learning With Semi-Reliable Model Multicast Over The Air. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651954
Radhika Sukapuram, Ranjan Patowary, G. Barua
Network Functions (NFs) provide security and optimization services to networks by examining and modifying packets and by collecting information. When NFs are scaled out to handle higher load or scaled in to conserve energy, flows must be migrated from one instance of an NF, called the source instance, to another, called the destination instance, or from one chain of instances to another. Before flows are migrated, the state information associated with the source instance must be migrated to the destination instance. For some stateful NFs to function correctly, packets that arrive at the destination instance in the meantime must be either buffered or dropped until the state has been migrated; for others, the destination NF can continue to function. We define three properties: Loss-freedom, where the flow migration system does not drop packets; No-buffering, where it does not buffer packets; and Order-preservation, where it processes packets in the same manner as the source NF would have if there had been no flow migration. We formalize these properties for the first time and prove that no flow migration algorithm for stateful NFs can guarantee all three of Loss-freedom (L), Order-preservation (O), and No-buffering (N) during flow migration, even if no messages or packets are lost. We demonstrate how existing algorithms behave with respect to these properties and prove that the properties are compositional.
Title: Loss-freedom, Order-preservation and No-buffering: Pick Any Two During Flow Migration in Network Functions. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
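To make the three properties concrete, the toy model below applies the three obvious choices (drop, buffer, or process immediately without state) to packets that reach the destination before the flow state does. The "NF" simply tags each packet with a running per-flow count, so processing without the migrated state changes its output. This is a hypothetical illustration, not the paper's formal model: it only shows that each naive policy gives up one property, whereas the paper proves that no algorithm can guarantee all three.

```python
from typing import Dict, List, Tuple


def source_outputs(state: int, packets: List[str]) -> Dict[str, int]:
    """Reference behavior without migration: a toy stateful NF that tags each
    packet with the running per-flow packet count, starting from `state`."""
    out = {}
    for p in packets:
        state += 1
        out[p] = state
    return out


def migrate(policy: str, state: int, in_flight: List[str]) -> Dict[str, bool]:
    """Apply one naive policy to packets that reach the destination instance
    before the flow state (`state`) has been transferred, and report which of
    Loss-freedom (L), No-buffering (N), and Order-preservation (O) hold."""
    emitted: List[Tuple[str, int]] = []
    dropped = buffered = 0
    if policy == "drop":        # discard packets until the state arrives
        dropped = len(in_flight)
    elif policy == "buffer":    # hold packets, process them once the state arrives
        buffered = len(in_flight)
        emitted = list(source_outputs(state, in_flight).items())
    elif policy == "process":   # process immediately, starting from empty state
        emitted = list(source_outputs(0, in_flight).items())
    else:
        raise ValueError(f"unknown policy: {policy}")
    reference = source_outputs(state, in_flight)
    return {
        "L": dropped == 0,
        "N": buffered == 0,
        # O: every processed packet gets the output the source NF would have given it
        "O": all(reference[p] == v for p, v in emitted),
    }
```

With a migrated state of 5 and two in-flight packets, "drop" keeps N and O, "buffer" keeps L and O, and "process" keeps L and N, matching the "pick any two" framing in the title.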
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651973
Sahil Gupta, D. Gosain, Garegin Grigoryan, Minseok Kwon, H. B. Acharya
The P4 language allows "protocol-independent packet parsing" in network switches and makes many operations possible in the data plane. But P4 is not built for Deep Packet Inspection: it can only "parse" well-defined packet headers, not free-form headers such as those seen in HTTPS. Thus some very important use cases, such as application-layer firewalls, are considered impossible for P4. This demonstration shows that this limitation does not strictly hold: switches that support only standard P4 can independently perform tasks such as blocking specific URLs, without non-standard "extern" components, help from the SDN controller, or rerouting to a firewall. As more Internet infrastructure becomes SDN-compatible, switches may in the future perform simple application-layer firewall tasks.
Title: Demo: Simple Deep Packet Inspection with P4. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
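The abstract does not say how the demo matches free-form text. One plausible approach for a parser that can only extract fixed-width fields is to consume the payload one byte per parser state and follow a string-matching automaton, so each automaton state would become a parser state that selects on the extracted byte. This is an assumption, not the authors' stated design. The Python sketch below builds a KMP automaton for one blocked hostname to show the state-transition logic. `MAX_PARSE_BYTES`, `build_dfa`, and `should_block` are hypothetical names, and the byte limit stands in for a hardware parser's bounded depth.

```python
from typing import Dict, List

# Hypothetical limit: hardware parsers can only walk a bounded number of bytes.
MAX_PARSE_BYTES = 256


def build_dfa(pattern: bytes) -> List[Dict[int, int]]:
    """KMP automaton: dfa[state][byte] -> next state (missing key means state 0).
    Reaching state len(pattern) means the pattern has been seen."""
    dfa: List[Dict[int, int]] = [dict() for _ in pattern]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the automaton would be in after dropping the first byte
    for j in range(1, len(pattern)):
        dfa[j] = dict(dfa[restart])    # on mismatch, behave like the restart state
        dfa[j][pattern[j]] = j + 1     # on match, advance
        restart = dfa[restart].get(pattern[j], 0)
    return dfa


def should_block(payload: bytes, dfa: List[Dict[int, int]]) -> bool:
    """Walk the payload byte by byte, as a chain of parser states could."""
    state = 0
    for byte in payload[:MAX_PARSE_BYTES]:
        state = dfa[state].get(byte, 0)
        if state == len(dfa):
            return True
    return False
```

For instance, `should_block(b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n", build_dfa(b"blocked.example"))` returns `True`. A pattern that appears only beyond the parse limit is missed, which reflects the kind of trade-off a real parser-depth budget would force.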
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651935
Xizheng Wang, Guo Chen, Xijin Yin, Huichen Dai, Bojie Li, Binzhang Fu, Kun Tan
Due to its superior performance, Remote Direct Memory Access (RDMA) has been widely deployed in data center networks. It provides applications with ultra-high throughput, ultra-low latency, and far lower CPU utilization than the TCP/IP software network stack. However, the connection states that must be stored on the RDMA NIC (RNIC), combined with the NIC's small memory, result in poor scalability: performance drops significantly when the RNIC must maintain a large number of concurrent connections. We propose StaR (Stateless RDMA), which solves the scalability problem of RDMA by transferring states to the other end of the communication. Leveraging the asymmetric communication pattern in data center applications, StaR lets the end with low concurrency save states for the end with high concurrency, thus making the RNIC on the bottleneck side stateless. We have implemented StaR on an FPGA board with a 10 Gbps network port and evaluated its performance on a testbed of 9 machines, all equipped with StaR NICs. The experimental results show that in high-concurrency scenarios, StaR's throughput reaches up to 4.13x that of the original RNIC and 1.35x that of the latest software-based solution.
Title: StaR: Breaking the Scalability Limit for RDMA. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651924
Rana Shahout, R. Friedman, Dolev Adas
Measurement capabilities are fundamental for a variety of network applications. Typically, recent data items are more relevant than old ones, a notion we can capture through a sliding window abstraction. These capabilities require a large number of counters to monitor the traffic of all network flows, but SRAM memories are too small to hold these counters. Previous works suggested replacing counters with small estimators, trading accuracy for reduced space. However, these estimators focus only on the counters' size, whereas flow IDs often consume more space than their respective counters. In this work, we present the CELL algorithm, which combines estimators with an efficient flow representation for superior memory reduction. We also extend CELL to the sliding window model, which prioritizes recent data, by presenting two variants named RAND-CELL and SHIFT-CELL. We formally analyze the error and memory consumption of our algorithms and compare their performance against competing approaches using real-world Internet traces. These measurements demonstrate the benefits of our work and show that CELL consumes at least 30% less space than the best-known alternative. The code is available as open source.
Title: CELL: Counter Estimation for Per-flow Traffic in Streams and Sliding Windows. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
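The abstract does not specify CELL's estimator or its flow encoding. The sketch below shows the two generic ingredients it says it combines, using stand-ins chosen for illustration: a Morris approximate counter as the "small estimator" and a truncated CRC32 fingerprint in place of the full flow ID. Both choices, and the names `MorrisCounter` and `CompactFlowTable`, are assumptions rather than CELL's actual method. The sliding-window variants (RAND-CELL, SHIFT-CELL) are not modeled.

```python
import random
import zlib
from typing import Dict


class MorrisCounter:
    """Approximate counter: keeps only c ~ log2(n); 2**c - 1 is an unbiased estimate of n."""

    def __init__(self, rng: random.Random):
        self.c = 0
        self.rng = rng

    def increment(self) -> None:
        # Advance with probability 2**-c, so large counts need only a few bits.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self) -> int:
        return 2 ** self.c - 1


class CompactFlowTable:
    """Stores a short fingerprint instead of the full flow ID, plus a Morris counter.
    Distinct flows whose fingerprints collide share (and inflate) one counter."""

    def __init__(self, fp_bits: int = 16, seed: int = 0):
        self.mask = (1 << fp_bits) - 1
        self.rng = random.Random(seed)
        self.counters: Dict[int, MorrisCounter] = {}

    def _fingerprint(self, flow_id: bytes) -> int:
        return zlib.crc32(flow_id) & self.mask

    def add(self, flow_id: bytes) -> None:
        fp = self._fingerprint(flow_id)
        if fp not in self.counters:
            self.counters[fp] = MorrisCounter(self.rng)
        self.counters[fp].increment()

    def query(self, flow_id: bytes) -> int:
        counter = self.counters.get(self._fingerprint(flow_id))
        return counter.estimate() if counter else 0
```

A 16-bit fingerprint replaces a 13-byte IPv4 5-tuple, and a counter of about 1,000 packets needs a state `c` of only about 10, which fits in 4 bits. Both savings come with error: fingerprint collisions and the estimator's variance. Quantifying and bounding that error is the subject of the paper's formal analysis.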
Network measurement is indispensable to network operations. The two most promising measurement solutions are In-band Network Telemetry (INT) and sketching. INT solutions provide fine-grained per-switch per-packet information at the cost of high network overhead. Sketching solutions have low network overhead but fail to achieve both simplicity and accuracy for per-flow measurement. To keep their advantages while overcoming their shortcomings, we first design SketchINT, which combines INT and sketches to obtain all per-flow per-switch information with low network overhead. Second, for deployment flexibility and measurement accuracy, we design a new sketch for SketchINT, namely TowerSketch, which achieves both simplicity and accuracy. The key idea of TowerSketch is to use different-sized counters for different arrays while keeping the number of bits used by each array the same. TowerSketch can thus automatically record larger flows in larger counters and smaller flows in smaller counters. We have fully implemented our SketchINT prototype on a testbed of 10 switches. We also implement TowerSketch on P4, single-core CPU, multi-core CPU, and FPGA platforms to verify its deployment flexibility. Extensive experimental results verify that 1) TowerSketch achieves better accuracy than prior art on various tasks, outperforming the state-of-the-art ElasticSketch by up to 13.9 times in terms of error; and 2) compared to INT, SketchINT reduces the number of packets in the collection process by 3 to 4 orders of magnitude with an error smaller than 5%.
Title: SketchINT: Empowering INT with TowerSketch for Per-flow Per-switch Measurement. Authors: Kaicheng Yang, Yuanpeng Li, Zirui Liu, Tong Yang, Yu Zhou, Jintao He, Jing'an Xue, Tong Zhao, Zhengyi Jia, Yongqiang Yang. Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651940. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
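TowerSketch's key idea, as stated in the abstract, is that every array uses the same number of bits, so arrays of wider counters have proportionally fewer slots. The Python sketch below builds four arrays of 2-, 4-, 8-, and 16-bit counters from 8192 bits each. It saturates counters at their maximum and answers a query with the minimum over the counters that have not saturated. The saturation-and-skip query rule, the CRC32-based hashing, and the parameter values are assumptions made for this illustration, not details confirmed by the abstract.

```python
import zlib
from typing import List, Sequence, Tuple


class TowerSketch:
    """Counter arrays of different widths that all use the same number of bits."""

    def __init__(self, bits_per_array: int = 8192, counter_bits: Sequence[int] = (2, 4, 8, 16)):
        # Equal bits per array: 2-bit counters get 4096 slots, 16-bit counters get 512.
        self.arrays: List[Tuple[int, List[int]]] = [
            ((1 << b) - 1, [0] * (bits_per_array // b)) for b in counter_bits
        ]

    @staticmethod
    def _index(key: bytes, level: int, width: int) -> int:
        # One hash per array, derived by prefixing the level number (an assumption).
        return zlib.crc32(bytes([level]) + key) % width

    def insert(self, key: bytes, count: int = 1) -> None:
        for level, (cap, counters) in enumerate(self.arrays):
            i = self._index(key, level, len(counters))
            counters[i] = min(counters[i] + count, cap)  # saturate instead of wrapping

    def query(self, key: bytes) -> int:
        values = []
        for level, (cap, counters) in enumerate(self.arrays):
            v = counters[self._index(key, level, len(counters))]
            if v < cap:  # a saturated counter only says "at least cap"
                values.append(v)
        # If every counter saturated, report the widest counter's cap as a lower bound.
        return min(values) if values else self.arrays[-1][0]
```

A small flow saturates the 2-bit counter but is still answered exactly by the 4-bit array. A flow of 1,000 packets saturates the 2-, 4-, and 8-bit counters and is answered by the 16-bit array. This is the "larger flows in larger counters" behavior the abstract describes. Hash collisions can only inflate non-saturated counters, so in this insert-only form estimates never fall below the true count.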
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651928
Jiao Zhang, Yuxuan Gao, Shubo Wen, Tian Pan, Tao Huang
Layer-4 load balancers play a critical role in large-scale data centers. Recently, load balancers implemented on programmable switches have attracted much attention because they overcome the inflexibility of dedicated load balancers and the high latency of software load balancers. However, keeping per-connection state easily leads to storage exhaustion, especially under resource exhaustion attacks. Although several stateless load balancers have been proposed to address this issue, they offload the state management burden to backend servers, causing high deployment and running costs. In this paper, we propose Loom, a load balancer with compressed states for large-scale data centers. First, we propose a novel classifier-based load balancer design that avoids directly maintaining per-connection state. Then, we propose a circulating Bloom filter structure that can efficiently classify connections and can be implemented on existing programmable switches. Theoretical analysis shows that Loom can maintain 11x to 30x more concurrent connections than designs that directly store the 5-tuple of each connection. Loom is implemented on hardware P4 switches, and experimental results indicate that it can maintain 11x to 29x more concurrent connections, close to the theoretical results. In addition, Loom is resistant to resource exhaustion attacks and reduces the percentage of broken connections by up to 57% under a SYN flood.
Title: Loom: Switch-based Cloud Load Balancer with Compressed States. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
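The abstract names a circulating Bloom filter but does not describe how it works. Purely as a hypothetical reading, the sketch below assumes the classifier's job is to decide which backend-pool version a connection started under. Each pool version gets a Bloom filter, SYNs are recorded in the current version's filter, and filters are reused in a ring as the pool changes. Subsequent packets are hashed against the pool of the newest filter that contains the connection. `RingBloomLB`, the ring size, and the filter parameters are all illustrative assumptions. The point is that a connection costs a few filter bits here, versus 104 bits to store an IPv4 5-tuple outright.

```python
import hashlib
from typing import List


class BloomFilter:
    def __init__(self, num_bits: int = 4096, num_hashes: int = 3):
        self.bits = [False] * num_bits
        self.k = num_hashes

    def _positions(self, key: bytes):
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.bits)

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p] = True

    def __contains__(self, key: bytes) -> bool:
        return all(self.bits[p] for p in self._positions(key))

    def clear(self) -> None:
        self.bits = [False] * len(self.bits)


class RingBloomLB:
    """Hypothetical: one Bloom filter per backend-pool version, reused in a ring."""

    def __init__(self, backends: List[str], ring_size: int = 4):
        self.ring = [BloomFilter() for _ in range(ring_size)]
        self.pools = {0: list(backends)}  # pool version -> backend list
        self.version = 0

    def update_pool(self, backends: List[str]) -> None:
        self.version += 1
        self.ring[self.version % len(self.ring)].clear()  # recycle the oldest filter
        self.pools[self.version] = list(backends)
        self.pools.pop(self.version - len(self.ring), None)

    @staticmethod
    def _pick(conn: bytes, pool: List[str]) -> str:
        h = int.from_bytes(hashlib.sha256(conn).digest()[:8], "big")
        return pool[h % len(pool)]

    def on_syn(self, conn: bytes) -> str:
        self.ring[self.version % len(self.ring)].add(conn)
        return self._pick(conn, self.pools[self.version])

    def on_packet(self, conn: bytes) -> str:
        # Check the newest version first; hash against the pool the connection started under.
        for v in range(self.version, max(self.version - len(self.ring), -1), -1):
            if conn in self.ring[v % len(self.ring)]:
                return self._pick(conn, self.pools[v])
        return self._pick(conn, self.pools[self.version])  # unknown: use the current pool
```

In this toy, a connection opened before a pool update keeps reaching its original backend afterward. Bloom-filter false positives, or connections that outlive the ring, can still be misrouted, which is why a real design must bound both.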
Pub Date: 2021-11-01, DOI: 10.1109/ICNP52444.2021.9651937
Konstantinos Poularakis, Qiaofeng Qin, Franck Le, S. Kompella, L. Tassiulas
While recent years have witnessed a steady trend of applying Deep Learning (DL) to networking systems, most of the underlying Deep Neural Networks (DNNs) suffer from two major limitations. First, they fail to generalize to topologies unseen during training; this lack of generalizability hampers their ability to make good decisions whenever the topology of the networking system changes. Second, existing DNNs commonly operate as "black boxes" that are difficult for network operators to interpret, which hinders their deployment in practice. In this paper, we propose relying on a recently developed family of graph-based DNNs to address these limitations. More specifically, we focus on a network congestion prediction application and apply Graph Attention (GAT) models to predict congestion per link, using the graph topology and time series of link loads as inputs. Evaluations on three real backbone networks demonstrate the benefits of our approach in terms of prediction accuracy, generalizability, and interpretability.
Title: Generalizable and Interpretable Deep Learning for Network Congestion Prediction. Venue: 2021 IEEE 29th International Conference on Network Protocols (ICNP).
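Both claimed benefits follow from how a graph attention layer works. Its weights are shared across nodes, so it applies to any topology. It also exposes per-neighbor attention coefficients that an operator can inspect. The pure-Python sketch below implements one single-head layer in the standard GAT formulation: e_ij = LeakyReLU(a · [W h_i || W h_j]), a softmax over each node's neighbors plus a self-loop, and an attention-weighted sum. The output nonlinearity is omitted. Treating each node as a link carrying a feature vector such as recent loads is an assumption for illustration; the abstract does not say how links map to graph nodes or what the model's hyperparameters are.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def matvec(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(
    h: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    W: List[Vector],
    a: Vector,
) -> Tuple[Dict[str, Vector], Dict[str, Dict[str, float]]]:
    """One single-head graph attention layer (output nonlinearity omitted).

    Returns the new node features and the attention weights alpha[i][j],
    which show how much each neighbor j contributed to node i's output."""
    wh = {v: matvec(W, x) for v, x in h.items()}
    out: Dict[str, Vector] = {}
    attn: Dict[str, Dict[str, float]] = {}
    for i in h:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]  # add a self-loop
        # e_ij = LeakyReLU(a . [W h_i || W h_j]); list + list concatenates the vectors.
        scores = [leaky_relu(sum(ak * zk for ak, zk in zip(a, wh[i] + wh[j]))) for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        out[i] = [sum(al * wh[j][k] for al, j in zip(alphas, nbrs)) for k in range(len(wh[i]))]
    return out, attn
```

Because `W` and `a` do not depend on the number of nodes, the same trained parameters can run on a topology never seen in training. The returned `attn` dictionary is the kind of per-neighbor signal the paper's interpretability argument relies on.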