As datacenter networks support more diverse applications and faster link speeds, effective end-to-end congestion control becomes increasingly challenging due to the inherent feedback delay. To address this issue, switch-driven per-hop flow control (FC) has gained popularity due to its natural flow isolation, timely control loop, and ability to handle transient congestion. However, the ideal FC requires impractical hardware resources, and the state-of-the-art approximation approach still demands a large number of queues that exceeds common switch capabilities, limiting scalability in practice. In this paper, we propose Aquarius, a scalable solution for per-hop FC that maintains satisfactory flow isolation with a practical number of queues. The key idea of Aquarius is to take independent control of different flows within the same queue, discarding the traditional practice of managing traffic collectively within the same queue. At its core, Aquarius applies a contribution-aware pausing mechanism on congested switches to enable individual control decisions for arriving flows, and uses an opportunistic re-assigning strategy on upstream switches to further isolate congested and victim flows. Experimental results demonstrate that Aquarius maintains comparable performance with 4× fewer queues, and achieves 5.5× lower flow completion times using the same number of queues, compared to existing solutions.
{"title":"Scaling Switch-driven Flow Control with Aquarius","authors":"Wenxue Li, Chaoliang Zeng, Jinbin Hu, Kai Chen","doi":"10.1145/3600061.3600066","DOIUrl":"https://doi.org/10.1145/3600061.3600066","url":null,"abstract":"As datacenter networks support more diverse applications and faster link speeds, effective end-to-end congestion control becomes increasingly challenging due to the inherent feedback delay. To address this issue, switch-driven per-hop flow control (FC) has gained popularity due to its natural flow isolation, timely control loop, and ability to handle transient congestion. However, the ideal FC requires impractical hardware resources, and the state-of-the-art approximation approach still demands a large number of queues that exceeds common switch capabilities, limiting scalability in practice. In this paper, we propose Aquarius, a scalable solution for per-hop FC that maintains satisfactory flow isolation with a practical number of queues. The key idea of Aquarius is to take independent control of different flows within the same queue, discarding the traditional practice of managing traffic collectively within the same queue. At its core, Aquarius applies a contribution-aware pausing mechanism on congested switches to enable individual control decisions for arriving flows, and uses an opportunistic re-assigning strategy on upstream switches to further isolate congested and victim flows. Experimental results demonstrate that Aquarius maintains comparable performance with 4 × fewer queues, and achieves 5.5 × lower flow completion times using the same number of queues, compared to existing solutions.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124825840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In wireless sensor networks (WSNs), energy efficiency, reliability, and non-plaintext transmission of the sensed data are major concerns, and all three are indispensable. Based on the robust Chinese Remainder Theorem (RCRT), this paper proposes an improved data aggregation scheme that satisfies the requirements of energy efficiency, reliability, and non-plaintext transmission simultaneously. Compared with the existing RCRT-based data aggregation scheme, our improved scheme tolerates an unrestricted error range at the expense of some of the energy savings.
{"title":"An Improved Data Aggregation Scheme for Wireless Sensor Networks Based on Robust Chinese Remainder Theorem","authors":"Jinxin Zhang, Fuyou Miao","doi":"10.1145/3600061.3603176","DOIUrl":"https://doi.org/10.1145/3600061.3603176","url":null,"abstract":"In wireless sensor networks (WSNs), energy efficiency, reliability, and non-plaintext transmission of the sensed data are major concerns, and all three of them are indispensable. Based on robust Chinese Remainder Theorem (RCRT), this paper proposes an improved data aggregation scheme to satisfy the requirements of energy efficiency, reliability, and non-plaintext transmission simultaneously. Compared with the existing RCRT-based data aggregation scheme, our improved RCRT-based data aggregation scheme tolerates an unrestricted error at the expense of certain energy saving.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132700195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, S. Markidis, Michio Honda
Cloud providers have begun to offer managed services to attract scientific applications, which have traditionally been executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance by overlapping compute and data writeback through inexpensive local storage. iFast supports the Message Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no changes to applications or the MPI library, and no support from cloud operators. We demonstrate how iFast accelerates the end-to-end time of a representative scientific application, Neko, by 13–40%.
{"title":"Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications","authors":"Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, S. Markidis, Michio Honda","doi":"10.1145/3600061.3603122","DOIUrl":"https://doi.org/10.1145/3600061.3603122","url":null,"abstract":"Cloud providers began to provide managed services to attract scientific applications, which have been traditionally executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance: by overlapping compute and data writeback through inexpensive local storage. iFast supports the Massage Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no change to applications, the MPI library, or support from cloud operators. We demonstrate how iFast can accelerate the end-to-end time of a representative scientific application Neko, by 13–40%.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117141715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yeonho Yoo, Zhixiong Niu, C. Yoo, Peng Cheng, Y. Xiong
With the tremendous growth of IoT, the role of IoT cloud gateways in facilitating communication between IoT devices and the cloud has become more important than ever before. Most previous studies have focused on developing interoperability between IoT and the cloud to accommodate various radio protocols. However, they have often neglected the performance of the IoT cloud gateway itself, leaving users with limited options: either purchasing multiple gateways or connecting only a small number of IoT devices. Through comprehensive measurements and analysis, we identify five key issues in IoT cloud gateways: high latency, CPU bottlenecks, inefficient network stacks on ARM, substantial encryption overhead, and the lack of priority support. To address these issues, we propose a new IoT cloud gateway, SegaNet, carefully designed around 1) multi-agent management, 2) efficient TLS encryption, and 3) priority-oriented message delivery. Our prototype evaluation shows up to 16.7× lower latency and 4.5× lower CPU consumption than gateways in the existing IoT-cloud ecosystem.
{"title":"SegaNet: An Advanced IoT Cloud Gateway for Performant and Priority-Oriented Message Delivery","authors":"Yeonho Yoo, Zhixiong Niu, C. Yoo, Peng Cheng, Y. Xiong","doi":"10.1145/3600061.3600072","DOIUrl":"https://doi.org/10.1145/3600061.3600072","url":null,"abstract":"With the tremendous growth of IoT, the role of IoT cloud gateways in facilitating communication between IoT devices and the cloud has become more important than ever before. Most previous studies have focused on developing interoperability between IoT and cloud to accommodate various radio protocols. However, they have often neglected the performance aspect of the IoT cloud gateway, leaving users with limited options: either purchasing multiple gateways or connecting only a small number of IoT devices. Through our comprehensive measurements and analysis, we identified five key issues in IoT cloud gateways related to high latency, CPU bottlenecks, inefficient network stacks on ARM, substantial encryption overhead, and the lack of priority support. To address these issues, we propose a new IoT cloud gateway - SegaNet. We carefully design with 1) multiple agents management, 2) efficient TLS encryption, and 3) priority-oriented message delivery. Our prototype evaluation shows up to 16.7 × lower latency and 4.5 × lower CPU consumption than gateways of the existing IoT-cloud ecosystem.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124240888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantile tracking is an essential component of network measurement, where the tracked quantiles of key performance metrics allow operators to better understand network performance. Given high network speeds and huge traffic volumes, the line-rate packet processing and network visibility of programmable switches make them a natural place to track quantiles directly in the data plane. However, due to the rigorous resource constraints of programmable switches, quantile tracking must be both memory- and computation-efficient to be deployable in the data plane. In this paper, we present EasyQuantile, an efficient quantile tracking approach that has small constant memory usage and involves only hardware-friendly computations. EasyQuantile adopts an adjustable incremental update approach and calculates a pre-specified quantile with high accuracy entirely in the data plane. We implement EasyQuantile on Intel Tofino switches with small resource usage. Trace-driven experiments show that EasyQuantile achieves higher accuracy and lower complexity compared with state-of-the-art approaches.
{"title":"EasyQuantile: Efficient Quantile Tracking in the Data Plane","authors":"Bo Wang, Rongqiang Chen, Lu Tang","doi":"10.1145/3600061.3600084","DOIUrl":"https://doi.org/10.1145/3600061.3600084","url":null,"abstract":"Quantile tracking is an essential component of network measurement, where the tracked quantiles of the key performance metrics allow operators to better understand network performance. Given the high network speed and huge volume of traffic, the line-rate packet-processing performance and network visibility of programmable switches make it a trend to track quantiles in the programmable data plane. However, due to the rigorous resource constraints of programmable switches, quantile tracking is required to be both memory and computation efficient to be deployed in the data plane. In this paper, we present EasyQuantile, an efficient quantile tracking approach that has small constant memory usage and involves only hardware-friendly computations. EasyQuantile adopts an adjustable incremental update approach and calculates a pre-specified quantile with high accuracy entirely in the data plane. We implement EasyQuantile on Intel Tofino switches with small resource usage. Trace-driven experiments show that EasyQuantile achieves higher accuracy and lower complexities compared with state-of-the-art approaches.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126920177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The control plane of the 5G Core (5GC) is typically shared among multiple dependent network slices of the data plane. As the number of slices and services that depend on this common, shared control plane increases, so does the threat to its resilience. This paper proposes FlexCore: a 5GC that is not only flexible and scalable but also resilient, catering to various service requirements on both stateful and stateless 5GC architectures. FlexCore is built with an eXpress Data Path (XDP) and extended Berkeley Packet Filter (eBPF) based SCTP load balancer hooked at the entry point of the 3GPP-compliant 5GC control plane, and a set of micro-AMF instances to serve user requests. Specifically, FlexCore steers incoming control-plane requests according to their service requirements, whether per slice, per user, or per control procedure. Experiments on a 3GPP-compliant 5G testbed show that FlexCore reduces average latency by up to 14% and 79% on stateful and stateless architectures, respectively, and by up to 63% for latency-critical slices on the slice-aware architecture.
{"title":"FlexCore: Leveraging XDP-SCTP for Scalable and Resilient Network Slice Service in Future 5G Core","authors":"Bhavishya Sharma, Shwetha Vittal, Antony Franklin","doi":"10.1145/3600061.3600073","DOIUrl":"https://doi.org/10.1145/3600061.3600073","url":null,"abstract":"The control plane of 5G Core (5GC) is typically shared among multiple dependent network slices of the data plane. But as the number of dependent slices and services on the common and shared control plane increases, its resilience threat also increases. This paper proposes FlexCore: a 5GC that is not only flexible and scalable but also resilient to cater to various service requirements on both stateful and stateless architectures of 5GC. FlexCore is built with an eXpress Data Path (XDP) and extended Berkeley Packet Filter (eBPF) based SCTP load balancer hooked at the entry point of the 3GPP compliant 5GC control plane, and a set of micro-AMF instances to serve the user requests. Precisely, the FlexCore is fabricated to honor the variety of incoming user requests on the control plane as per the service requirements, like, per slice, per user, or per control procedure of users too. Experiments on a 3GPP compliant 5G testbed show that FlexCore can provide average latency reduction of up to 14% and 79% on stateful and stateless architectures, respectively, and up to 63% latency reduction for latency-critical slices on the slice-aware architecture.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128626247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we propose a unified framework for designing a class of server-centric network topologies for distributed machine learning (DML) by adopting a top-down design method and combinatorial design theory. Simulation results show that this flexible framework effectively supports various DML tasks. The framework can generate compatible topologies that satisfy different resource constraints and DML workloads.
{"title":"A Unified, Flexible Framework in Network Topology Generation for Distributed Machine Learning","authors":"Jianhao Liu, Xiaoyan Li, Yanhua Liu, Weibei Fan","doi":"10.1145/3600061.3603132","DOIUrl":"https://doi.org/10.1145/3600061.3603132","url":null,"abstract":"In this study, we propose a unified framework for designing a class of server-centric network topologies for DML by adopting top-down design method and combinatorial design theory. Simulation results show that this flexible framework is capable of effectively supporting various DML tasks. Our framework can generate compatible topologies that meet various resource constraints and different DML tasks.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131465620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ChatGPT shows the enormous potential of large language models (LLMs). These models can easily reach billions of parameters, making training prohibitively difficult for most practitioners. We propose a paradigm to train LLMs using distributed in-network computation on routers. Our preliminary results show that our design allows LLMs to be trained at a reasonable learning rate without demanding extensive GPU resources.
{"title":"Training ChatGPT-like Models with In-network Computation","authors":"Shuhao Fu, Yong Liao, Pengyuan Zhou","doi":"10.1145/3600061.3603136","DOIUrl":"https://doi.org/10.1145/3600061.3603136","url":null,"abstract":"ChatGPT shows the enormous potential of large language models (LLMs). These models can easily reach the size of billions of parameters and create training difficulties for the majority. We propose a paradigm to train LLMs using distributed in-network computation on routers. Our preliminary result shows that our design allows LLMs to be trained at a reasonable learning rate without demanding extensive GPU resources.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131090488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xing Fang, Lizhao You, Qiao Xiang, Hanyang Shao, Gao Han, Ziyi Wang, J. Shu, L. Kong
In this paper, we show that by capturing the causal relationships among the computations of routers, one can transform the distributed program composed of routing processes into a sequential program, which allows various sequential program analysis theories and tools to be used for diagnosing and repairing routing configuration errors. This insight sheds light on future research on automatic network configuration diagnosis and repair. To demonstrate its feasibility and generality, we give the preliminary design of two methods for routing configuration error diagnosis: (1) data flow analysis using a minimal unsatisfiable core and error invariants; and (2) control flow analysis using selective symbolic execution. Using real-world topologies and synthetic configurations, we show that both methods can effectively find errors in routing configurations while incurring reasonable overhead.
{"title":"Diagnosing Distributed Routing Configurations Using Sequential Program Analysis","authors":"Xing Fang, Lizhao You, Qiao Xiang, Hanyang Shao, Gao Han, Ziyi Wang, J. Shu, L. Kong","doi":"10.1145/3600061.3600065","DOIUrl":"https://doi.org/10.1145/3600061.3600065","url":null,"abstract":"In this paper, we show that by capturing the causal relationship among the computation of routers, one can transform the distributed program composed of routing processes into a sequential program, which allows the use of various sequential program analysis theories and tools for diagnosing and repairing routing configuration errors. This insight sheds light on future research on automatic network configuration diagnosis and repair. To demonstrate its feasibility and generality, we give the preliminary design of two methods for routing configuration error diagnosis: (1) data flow analysis using minimal unsatisfiable core and error invariants; and (2) control flow analysis using selective symbolic execution. Using real-world topologies and synthetic configurations, we show that both methods can effectively find errors in routing configurations while incurring reasonable overhead.","PeriodicalId":228934,"journal":{"name":"Proceedings of the 7th Asia-Pacific Workshop on Networking","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130683950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}