Zilong Wang, Xinchen Wan, Chaoliang Zeng, Kai Chen
A rate limiter is required by the RDMA NIC (RNIC) to enforce the rate limits calculated by congestion control. The RNIC expects the rate limiter to be accurate and scalable: to precisely shape the traffic of numerous flows with minimal resource consumption, thereby mitigating incast and congestion and improving network performance. Previous works, however, fail to meet the performance requirements of the RNIC while achieving accuracy and scalability. In this paper, we present Tassel, an accurate and scalable rate limiter for RNICs, covering both the algorithm and the architecture design. Tassel first extends the classical WF²Q+ algorithm to support rate limiting in the RNIC scenario. Tassel then designs a high-precision and resource-friendly rate limiter and integrates it into the classical RNIC architecture. Preliminary simulation results show that Tassel precisely enforces rate limits ranging from 100 Kbps to 100 Gbps across 1K concurrent flows with limited resource consumption.
"Accurate and Scalable Rate Limiter for RDMA NICs." Zilong Wang, Xinchen Wan, Chaoliang Zeng, Kai Chen. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600078 (https://doi.org/10.1145/3600061.3600078).
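The timestamp mechanics that a WF²Q+-style rate limiter builds on can be illustrated with a minimal sketch (not Tassel's implementation; class and method names are illustrative): each packet receives a virtual finish tag F = max(now, last_finish) + bits/rate, and packets are released in increasing tag order, which shapes every flow to its configured rate.

```python
import heapq

class FlowState:
    """Per-flow rate limit and the virtual finish tag of its last packet."""
    def __init__(self, rate_bps):
        self.rate = rate_bps
        self.finish = 0.0

class VirtualTimeShaper:
    """Tag each packet with F = max(now, last_finish) + bits/rate and
    release packets in increasing tag order, so every flow is shaped
    to its configured rate."""
    def __init__(self):
        self.flows = {}
        self.heap = []   # (finish_tag, seq, flow_id, size_bits)
        self.seq = 0     # tie-breaker for equal tags

    def add_flow(self, fid, rate_bps):
        self.flows[fid] = FlowState(rate_bps)

    def enqueue(self, fid, size_bits, now):
        st = self.flows[fid]
        st.finish = max(now, st.finish) + size_bits / st.rate
        heapq.heappush(self.heap, (st.finish, self.seq, fid, size_bits))
        self.seq += 1
        return st.finish

    def dequeue(self):
        """Next packet to send: the one with the smallest finish tag."""
        tag, _, fid, size_bits = heapq.heappop(self.heap)
        return tag, fid, size_bits
```

For example, a 1000-bit packet on a 1 Kbps flow advances that flow's finish tag by one second, so two back-to-back packets are released one second apart.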
The advent of Software Defined Networking (SDN) and Network Function Virtualization (NFV) has revolutionized the deployment of software-based routing and forwarding devices in modern network architectures. However, IPv6 route lookup remains a substantial performance bottleneck in these software-based devices due to two key challenges: (1) longer addresses and prefixes, which hinder high-speed IPv6 lookup, and (2) a larger address space, which necessitates adaptability to varied length-based prefix distributions across network scenarios. Current trie-based methods like SAIL and Poptrie have enhanced IPv4 lookup, but they struggle with adaptive and fast IPv6 lookup due to their fixed search scheme from short to long prefixes. To overcome these challenges, we propose a novel Heuristic Binary Search (HBS) scheme to achieve adaptive and fast IPv6 lookup. HBS refines the traditional "Binary Search on Prefix Lengths" scheme by incorporating two key techniques: (1) a heuristic binary search method for accelerated lookup and (2) a tree rotation method for dynamic adjustment of binary search tree shapes in response to changes in prefix distribution. Our evaluation of HBS demonstrates its superiority in terms of lookup throughput, update speed, memory efficiency, and dynamic adaptability.
"Heuristic Binary Search: Adaptive and Fast IPv6 Route Lookup with Incremental Updates." Donghong Jiang, Yanbiao Li, Yuxuan Chen, Jing Hu, Yi Huang, Gaogang Xie. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600077 (https://doi.org/10.1145/3600061.3600077).
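The underlying "Binary Search on Prefix Lengths" scheme that HBS refines can be sketched as follows. This is a simplified Waldvogel-style version over bit-string prefixes, with markers that store the best-matching prefix so the search never needs to backtrack; HBS's heuristic tree shaping and rotations are not modeled.

```python
def bmp(s, prefix_table):
    """Longest prefix of s present in the table (best-matching prefix)."""
    best = None
    for p in sorted(prefix_table, key=len):
        if s.startswith(p):
            best = prefix_table[p]
    return best

def build(prefix_table, lengths):
    """prefix_table: {bit-string prefix: nexthop}; lengths: sorted list.
    Builds one hash table per prefix length, plus markers."""
    tables = {L: {} for L in lengths}
    for p, nh in prefix_table.items():
        tables[len(p)][p] = nh
    # Plant a marker at every length probed before each prefix's own
    # length on its binary-search path; the marker carries the
    # best-matching shorter prefix.
    for p in prefix_table:
        lo, hi = 0, len(lengths) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            L = lengths[mid]
            if L < len(p):
                tables[L].setdefault(p[:L], bmp(p[:L], prefix_table))
                lo = mid + 1
            elif L > len(p):
                hi = mid - 1
            else:
                break
    return tables

def lookup(addr, tables, lengths):
    """Binary search over prefix lengths: a hit means a longer match
    may still exist; a miss means only shorter prefixes can match."""
    best, lo, hi = None, 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        L = lengths[mid]
        if addr[:L] in tables[L]:
            hit = tables[L][addr[:L]]
            if hit is not None:
                best = hit
            lo = mid + 1
        else:
            hi = mid - 1
    return best
```

With W distinct prefix lengths, each lookup probes O(log W) hash tables instead of the O(W) probes of a short-to-long scan, which is the property HBS's heuristic tree shaping then tunes to the observed prefix distribution.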
Traditionally, a firewall tracks the per-flow spread of each source and destination IP address to detect network scans and DDoS attacks. It is not designed with hierarchical IP addresses in mind. However, cyberattacks nowadays are becoming more stealthy. To evade detection, they treat a network subnet instead of a single IP as the victim of an attacking campaign. Therefore, we focus on a new problem: online estimation of each hierarchical flow's cardinality (or spread), in order to detect hierarchical super-spreaders (HSSs), which correspond to IP subnets receiving network connections from an extraordinarily large number of source IPs. For detecting such one-dimensional HSSs, the recent work Hierarchical Virtual bitmap Estimator (HVE) has been proposed. But it fails to handle two-dimensional HSSs, and it cannot be queried online due to its very high query overhead. In this paper, we propose the Hon-vHLL sketch to address these limitations. It is an innovative hierarchical extension of On-vHLL that supports the estimation of conditional spreads for either 1D or 2D hierarchical flows. Hon-vHLL allocates an On-vHLL sketch for each hierarchical-level bucket and queries conditional spreads by merging the virtual estimators of hierarchical flows. We evaluate its performance on CAIDA network traces. The results show that Hon-vHLL improves query throughput by 578× over HVE and achieves 11% higher HSS detection accuracy.
"Online Detection of 1D and 2D Hierarchical Super-Spreaders in High-Speed Networks." Haorui Su, Qingjun Xiao. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600080 (https://doi.org/10.1145/3600061.3600080).
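The per-flow building block of vHLL-style sketches is a HyperLogLog spread estimator, which can be sketched in a few lines. This is a plain, non-virtualized HLL for illustration; Hon-vHLL's register sharing among flows and hierarchical merging are not modeled here.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HLL: m = 2^b registers, each holding the maximum
    leading-rank observed among items hashed into it."""
    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b
        self.reg = [0] * self.m

    def add(self, item):
        h = int(hashlib.sha1(item.encode()).hexdigest(), 16)
        idx = h & (self.m - 1)   # low b bits pick the register
        w = h >> self.b          # remaining bits supply the rank
        rank = 1
        while w & 1 == 0 and rank < 64:
            w >>= 1
            rank += 1
        self.reg[idx] = max(self.reg[idx], rank)

    def estimate(self):
        """Harmonic-mean estimator with the small-range correction."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        s = sum(2.0 ** -r for r in self.reg)
        e = alpha * self.m * self.m / s
        if e <= 2.5 * self.m:
            zeros = self.reg.count(0)
            if zeros:
                e = self.m * math.log(self.m / zeros)
        return e
```

Because registers only keep maxima, duplicate items never change the estimate, which is exactly why HLL measures spread (distinct count) rather than volume; merging two HLLs register-by-register with max is what makes the hierarchical (per-subnet) aggregation in Hon-vHLL possible.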
In this poster, we propose a new transaction forwarding strategy for payment channel networks (PCNs), named PNSFF. We then perform several experiments to study the effectiveness of the proposed strategy. Experimental results show that PNSFF provides stronger incentives and higher security than previous similar works.
"A Secure Transaction Forwarding Strategy for Blockchain Payment Channel Networks." Huaihang Lin, Xiaoyan Li, Yanhua Liu, Weibei Fan. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3603130 (https://doi.org/10.1145/3600061.3603130).
Harish S A, K. S. Kumar, Anibrata Majee, Amogh Bedarakota, Praveen Tammana, Pravein G. Kannan, Rinku Shah
Network management tasks heavily rely on network telemetry data. Programmable data planes provide novel ways to collect this telemetry data efficiently using probabilistic data structures like bloom filters and their variants. Despite the benefits of these data structures (and the associated data plane primitives), their exposure increases the attack surface. That is, they are at risk from adversarial network inputs. In this work, we examine the effects of adversarial network inputs on bloom filters that are integral to data plane primitives. Bloom filters are probabilistic and inherently susceptible to pollution attacks, which increase their false positive rates. To quantify the impact, we demonstrate the feasibility of pollution attacks on FlowRadar, a network monitoring and debugging system that employs a data plane primitive to collect traffic statistics. We observe that the adversary can corrupt traffic statistics with a few well-crafted malicious flows (tens of flows), leading to a 99% drop in the accuracy of the core functionality of the FlowRadar system.
"In-Network Probabilistic Monitoring Primitives under the Influence of Adversarial Network Inputs." Harish S A, K. S. Kumar, Anibrata Majee, Amogh Bedarakota, Praveen Tammana, Pravein G. Kannan, Rinku Shah. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600086 (https://doi.org/10.1145/3600061.3600086).
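The pollution mechanism can be reproduced with a minimal bloom filter sketch: an adversary who can insert entries drives up the fill ratio and hence the false positive rate. This illustrates only the volume-based mechanism, not the paper's crafted-flow attack on FlowRadar's specific encoding; all names here are illustrative.

```python
import hashlib

class BloomFilter:
    """m-bit filter with k hash functions derived from salted SHA-1."""
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = True

    def query(self, item):
        # May return True for items never added (false positive).
        return all(self.bits[h] for h in self._hashes(item))

def false_positive_rate(bf, probes):
    """Fraction of never-inserted probe items the filter claims to hold."""
    return sum(bf.query(p) for p in probes) / len(probes)
```

With roughly f of the bits set, the false positive rate is about f^k, so pushing the fill ratio from ~18% (50 legitimate flows) to ~88% (500 adversarial insertions) inflates the FPR by several orders of magnitude.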
Byeongkeon Lee, Donghyeon Lee, J. Ok, Wonsup Yoon, Sue Moon
The growth of host resources and network speeds is not synchronized, and at network speeds of 100 Gbps and beyond this imbalance makes host resources the bottleneck. We categorize the existing body of work on reducing the host burden into three approaches: (1) eliminating payload copies (zero-copy), (2) using special-purpose hardware for payload copies, and (3) offloading the protocol to the NIC. Each approach, however, has drawbacks. (1) Most zero-copy methods require application modification. Furthermore, the application must ensure its buffer is not modified until network I/O is complete. (2) Copy elimination through special-purpose hardware still uses host memory, consuming considerable memory bandwidth. (3) A protocol offloaded to the NIC has limited flexibility. We redesign the networking stack to place only the payload in the NIC DRAM while executing protocol processing on the host, overcoming the above limitations. Our work (1) lets the application reuse its own buffer as soon as the payload is transferred to the NIC DRAM and requires no application modification, (2) saves host memory bandwidth by putting packet payloads in the NIC and eliminating payload copying on the host, and (3) maintains flexibility by keeping protocol processing on the host. Compared to a networking stack with CPU-based copy, our work saves 38.6% of CPU usage and 54.0% of memory bandwidth.
"Host Efficient Networking Stack Utilizing NIC DRAM." Byeongkeon Lee, Donghyeon Lee, J. Ok, Wonsup Yoon, Sue Moon. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600070 (https://doi.org/10.1145/3600061.3600070).
Yicheng Feng, Shihao Shen, Chen Zhang, Xiaofei Wang
Containers are gaining popularity in edge computing due to their standardization and low overhead. This trend has brought new technologies such as container engines and container orchestration platforms (COPs). However, fast and effective container deployment remains a challenge, especially at the edge. Prior work, which was designed for cloud datacenters, is no longer suitable for container deployment in edge clouds due to bandwidth limitations, fluctuating network performance, resource constraints, and geo-distributed organization. These edge features make rapid deployment on the edge difficult. Additionally, integrating with COPs is crucial for successful deployment. We present Quicklayer, a layer-stack-oriented middleware designed to accelerate container deployment in edge clouds. Quicklayer takes a holistic approach that preserves the stack-of-layers structure and is backward-compatible. It includes (1) a layer-based container refactoring solution that optimizes container images while maintaining the layer structure, (2) a customized Kubernetes scheduler that is aware of network performance, disk space, and container layer caches for container placement, and (3) distributed shared layer-stack caches optimized for cooperative container deployment among edge clouds. Preliminary results indicate that Quicklayer reduces redundant image size by up to 3.11× and speeds up the deployment process by up to 1.64× compared to the currently popular container deployment system.
"Quicklayer: A Layer-Stack-Oriented Accelerating Middleware for Fast Deployment in Edge Clouds." Yicheng Feng, Shihao Shen, Chen Zhang, Xiaofei Wang. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600074 (https://doi.org/10.1145/3600061.3600074).
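A cache-aware placement decision of the kind such a scheduler makes can be sketched as a scoring function: estimate each node's pull cost as the bytes of layers it does not already cache, divided by its bandwidth, subject to free disk. The data layout and field names here are assumptions for illustration; Quicklayer's actual scheduler also weighs fluctuating network performance.

```python
def pick_node(image_layers, nodes):
    """image_layers: {layer_digest: size_bytes}.
    nodes: {name: {"cache": set of cached digests,
                   "free_disk": bytes, "bw": bytes/sec}}.
    Returns (best_node, estimated_pull_seconds)."""
    best, best_time = None, float("inf")
    for name, n in nodes.items():
        # Only layers absent from the node's cache must be pulled.
        need = sum(size for digest, size in image_layers.items()
                   if digest not in n["cache"])
        if need > n["free_disk"]:
            continue  # node cannot hold the missing layers
        t = need / n["bw"]
        if t < best_time:
            best, best_time = name, t
    return best, best_time
```

A node with a warm layer cache can win even with far less bandwidth, which is why preserving the stack-of-layers structure (rather than flattening images) pays off at the edge.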
High-performance computing (HPC) systems demand continuous monitoring to ensure efficient resource allocation and application performance. Recent studies indicate that real-time resource utilization monitoring can significantly improve the performance of dynamic scheduling algorithms. However, latency induced by the protocol stack heavily impacts the effectiveness of dynamic scheduling. In this paper, we propose a novel monitoring system that implements the protocol stack on a Field-Programmable Gate Array (FPGA) and adopts a publish/subscribe (pub/sub) communication protocol. Specifically, by introducing an FPGA-based protocol stack, we substantially reduce the latency of protocol stack processing and enable the implementation of custom plugins at the application layer (L7). Our experiments demonstrate that the proposed system effectively reduces protocol stack latency and, with the extensibility provided by user-defined plugins, offers great potential for a wide range of HPC monitoring and feedback applications.
"Extendable MQTT Broker for Feedback-based Resource Management in Large-scale Computing Environments." Ryo Ouchi, Ryuichi Sakamoto. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3603129 (https://doi.org/10.1145/3600061.3603129).
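The pub/sub routing such a broker performs hinges on MQTT topic-filter matching, which can be sketched as follows. This is a simplified matcher: '+' matches exactly one level and '#' (in the final position) matches the remainder; shared subscriptions and '$'-prefixed system topics are ignored. The example topics are hypothetical.

```python
def topic_matches(filter_, topic):
    """MQTT-style topic-filter matching with '+' (one level) and
    '#' (all remaining levels, last position only) wildcards."""
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True            # '#' swallows the rest of the topic
        if i >= len(t_parts):
            return False           # filter is longer than the topic
        if part != "+" and part != t_parts[i]:
            return False           # literal level mismatch
    return len(f_parts) == len(t_parts)
```

A monitoring subscriber could, for instance, subscribe to a hypothetical `hpc/+/cpu` filter to receive CPU telemetry from every node while ignoring other metrics.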
Remote Direct Memory Access (RDMA) has been widely deployed in data centers to improve application performance. However, RDMA's in-order message delivery cannot meet the emerging requirements of applications for scheduling messages within an RDMA connection, preventing RDMA from being fully utilized. Some works schedule the data to be transferred in specific applications before delivering it to RDMA, or distribute messages across different connections. However, these approaches tightly couple scheduling logic with application logic and may incur high scheduling overhead. In this paper, we propose sRDMA, a general and low-overhead scheduler that works in the user-space RDMA driver. sRDMA allows the application to express the expected transfer order to the RDMA hardware via work requests (WRs). Using the priority information in WRs, sRDMA slices and schedules WRs to achieve the desired message transfer order and reduce the blocking impact of large messages in the same RDMA connection. Our experiments show that sRDMA can improve the performance of applications, e.g., TensorFlow, by up to , and sRDMA has negligible overhead in terms of CPU and flow throughput.
"sRDMA: A General and Low-Overhead Scheduler for RDMA." Xizheng Wang, Shuai Wang, Dan Li. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3600082 (https://doi.org/10.1145/3600061.3600082).
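The slicing-and-scheduling idea can be sketched as follows (illustrative only: priorities and chunking as described in the abstract, but the data layout and function names are assumptions, not sRDMA's driver code). Large WRs are sliced into chunks, and chunks are emitted with strict priority across levels and round-robin within a level, so a large low-priority message cannot block small high-priority ones on the same connection.

```python
import heapq

def slice_wr(size, chunk):
    """Slice one WR's message into chunks of at most `chunk` bytes."""
    sizes = [chunk] * (size // chunk)
    if size % chunk:
        sizes.append(size % chunk)
    return sizes

def transmit_order(wrs, chunk):
    """wrs: list of (priority, msg_size); lower number = higher priority.
    Returns (priority, chunk_size) pairs in transmit order."""
    heap, tiebreak = [], 0
    for prio, size in wrs:
        heapq.heappush(heap, (prio, tiebreak, slice_wr(size, chunk)))
        tiebreak += 1
    order = []
    while heap:
        prio, _, chunks = heapq.heappop(heap)
        order.append((prio, chunks.pop(0)))
        if chunks:
            # Re-insert behind same-priority peers: round-robin
            # within a level, strict priority across levels.
            heapq.heappush(heap, (prio, tiebreak, chunks))
            tiebreak += 1
    return order
```

With a 1024-byte chunk size, a 1 KB high-priority WR posted behind a 4 KB low-priority WR goes out first instead of waiting for all four of the larger message's chunks.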
With the improvement of live-streaming technology, ensuring high QoE and fairness among clients running different ABR algorithms on the same LAN is becoming a pressing issue. Aggressive and conservative algorithms make different bitrate adjustment decisions when they share network resources, which leads to unfairness. In this poster, we propose a regulation mechanism, ABC, that adjusts sensitive parameters such as latency, delay, and buffer to improve overall system QoE by 68% and mitigate the fairness problem.
"ABC: Adaptive Bitrate Algorithm Commander for Multi-Client Video Streaming." Xiaoxi Xue, Yuchao Zhang. In Proceedings of the 7th Asia-Pacific Workshop on Networking, June 29, 2023. DOI: 10.1145/3600061.3603134 (https://doi.org/10.1145/3600061.3603134).