
Proceedings of the 7th Asia-Pacific Workshop on Networking: Latest Publications

Efficient and Structural Gradient Compression with Principal Component Analysis for Distributed Training
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3603140
Jiaxin Tan, Chao Yao, Zehua Guo
Distributed machine learning is a promising approach for both academia and industry. It builds a machine learning model from dispersed training data via iterative training in a distributed fashion. To speed up this training process, it is essential to reduce the communication load among training nodes. In this paper, we propose a layer-wise gradient compression scheme based on principal component analysis (PCA) and error accumulation. The key to our solution is to exploit the gradient characteristics and architecture of neural networks, combining the compression ability of PCA with the feedback ability of error accumulation. Preliminary results on an image classification task show that our scheme achieves good performance while reducing gradient transmission by 97%.
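The layer-wise PCA compression with error feedback described in the abstract can be sketched with a truncated SVD (the standard way to compute a PCA projection). This is a toy illustration under our own assumptions, not the authors' implementation; all function names and shapes are ours:

```python
import numpy as np

def compress_layer(grad, residual, k):
    """Compress one layer's gradient matrix by keeping its top-k
    principal components (truncated SVD); the discarded part is
    folded into the residual for error accumulation."""
    g = grad + residual                        # apply error feedback
    u, s, vt = np.linalg.svd(g, full_matrices=False)
    payload = (u[:, :k], s[:k], vt[:k, :])     # transmitted instead of g
    approx = payload[0] @ np.diag(payload[1]) @ payload[2]
    return payload, g - approx                 # new residual, kept locally

def decompress(payload):
    """Receiver-side reconstruction of the compressed gradient."""
    u_k, s_k, vt_k = payload
    return u_k @ np.diag(s_k) @ vt_k
```

For a 256x128 layer with k = 4, the payload holds 256*4 + 4 + 4*128 = 1540 floats instead of 32768, about 95% less traffic; the 97% figure in the abstract will depend on the layer shapes and the chosen k.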
Citations: 0
TOD: Trend-Oriented Delay-Based Congestion Control in Lossless Datacenter Network
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3600079
Kaixin Huang, Wentao Liu, Weihang Li, Lang Cheng
In current high-speed data center networks, congestion control is crucial for ensuring consistently high performance. Over the past decade, researchers and developers have explored several congestion signals, such as ECN, RTT, and INT. However, most existing congestion control algorithms suffer either from imprecise congestion detection due to ambiguous signals or from excessive bandwidth loss due to aggressive rate decreases. This paper proposes a novel congestion control mechanism called TOD, a trend-oriented delay-based approach designed for lossless data center networks. TOD leverages changes in RTT to learn the congestion trend and adjusts the sending rate accordingly. By analyzing the congestion trend, the sender adjusts its sending rate to a reasonable level while still maintaining high bandwidth utilization to relieve congestion. To achieve this, the sender uses a reference rate that is calculated by the receiver and communicated back to the sender. TOD is therefore a sender-receiver cooperative congestion control mechanism. We evaluate TOD extensively in NS-3 simulations using both microbenchmarks and macrobenchmarks. Our experiments demonstrate that TOD outperforms DCQCN and Timely in terms of FCT and convergence speed.
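The trend-oriented idea can be sketched as follows: estimate the congestion trend from the slope of recent RTT samples and steer the rate toward a receiver-computed reference. The constants, the slope estimator, and the reference-rate rule below are ours for illustration, not TOD's actual algorithm:

```python
from collections import deque

class TrendRateControl:
    """Toy trend-oriented delay-based rate control (illustrative only)."""

    def __init__(self, rate_gbps, window=8):
        self.rate = rate_gbps
        self.rtts = deque(maxlen=window)

    def trend(self):
        """Least-squares slope of the RTT window (us per sample)."""
        n = len(self.rtts)
        if n < 2:
            return 0.0
        mx, my = (n - 1) / 2, sum(self.rtts) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(range(n), self.rtts))
        var = sum((x - mx) ** 2 for x in range(n))
        return cov / var

    def on_ack(self, rtt_us, ref_rate_gbps):
        self.rtts.append(rtt_us)
        slope = self.trend()
        if slope > 0:
            # RTT rising: congestion building; back off, but no lower
            # than the receiver-computed reference rate
            self.rate = max(0.8 * self.rate, ref_rate_gbps)
        elif slope < 0:
            # RTT falling: congestion draining; probe for bandwidth
            self.rate *= 1.05
        return self.rate
```

Keying the decision on the RTT trend rather than an absolute RTT threshold is what lets the sender distinguish "congestion building" from "congestion draining" even when queueing delay is already high.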
Citations: 0
An Online Control Approach of Collaborative Federated Learning with Constrained Resources
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3603138
Shaohui Lin, Xiaoxi Zhang, Yupeng Li, Carlee Joe-Wong, Jingpu Duan, Xu Chen
Citations: 0
MEB: an Efficient and Accurate Multicast using Bloom Filter with Customized Hash Function
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3600062
Zihao Chen, Jiawei Huang, Qile Wang, Jingling Liu, Zhaoyi Li, Shengwen Zhou, Zhidong He
Multicast is widely used to support a huge range of applications with one-to-many or many-to-many communication patterns. However, multicast systems do not scale due to considerable state and communication overheads. Stateful multicast approaches require maintaining the state of each multicast session at switches, incurring large memory overhead. Stateless ones utilize a Bloom filter (BF) to encode the multicast tree into the packet header to minimize communication overhead, but potentially suffer from substantial false positives due to the probabilistic nature of Bloom filters. In this paper, we propose a stateless multicast scheme, MEB, which uses Bloom filters to achieve large-scale multicast communication with low error, small overhead, and high scalability. Specifically, to control the false-positive rate, MEB carefully selects the hash functions for the Bloom filter when constructing the packet header at the sender side, and the switch makes forwarding decisions according to the packet header with negligible overhead. We compare MEB against state-of-the-art multicast systems in large-scale simulations. The results show that MEB reduces traffic overhead by up to 70% with a small error rate.
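The mechanism can be sketched in miniature: encode the multicast tree's link IDs into an in-header Bloom filter, let switches test membership of their outgoing links, and let the sender search for a hash-family seed that avoids false positives on known non-tree links. Header size, hash construction, and the seed search below are our assumptions, not MEB's actual design:

```python
import hashlib

HEADER_BITS = 64  # toy in-packet Bloom filter width

def h(seed, item, i):
    """i-th hash of a seeded family (the 'customized hash function')."""
    d = hashlib.sha256(f"{seed}:{i}:{item}".encode()).digest()
    return int.from_bytes(d[:8], "big") % HEADER_BITS

def encode_tree(links, seed, k=3):
    """Sender side: set k bits per multicast-tree link in the filter."""
    bf = 0
    for link in links:
        for i in range(k):
            bf |= 1 << h(seed, link, i)
    return bf

def should_forward(bf, link, seed, k=3):
    """Switch side: forward on a link iff all k of its bits are set."""
    return all(bf >> h(seed, link, i) & 1 for i in range(k))

def pick_seed(links, non_links, k=3, tries=64):
    """Choose a seed whose filter yields no false positives on the
    links that must NOT forward this packet."""
    for seed in range(tries):
        bf = encode_tree(links, seed, k)
        if not any(should_forward(bf, l, seed, k) for l in non_links):
            return seed, bf
    return 0, encode_tree(links, 0, k)  # fall back to the first seed
```

Bloom filters never produce false negatives, so every tree link always forwards; the seed search only has to suppress the false positives that cause redundant traffic.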
Citations: 0
Amphis: Rearchitecturing Congestion Control for Capturing Internet Application Variety
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3600076
Tian Pan, Shuihai Hu, Guangyu An, Xincai Fei, Fanzhao Wang, Yueke Chi, Minglan Gao, Hao Wu, Jiao Zhang, Tao Huang, Jingbin Zhou, Kun Tan
TCP was designed to provide a stream-oriented communication service for bulk data transfer applications (e.g., FTP and email). Over four decades of development, Internet applications have undergone significant changes and now involve highly dynamic traffic patterns and message-oriented communication paradigms. However, the impact of this substantial evolution on congestion control (CC) has not been fully studied. Most network transports today still make the long-held assumption about application traffic: a byte stream with an unlimited data arrival rate. In this paper, we demonstrate, through both analysis and experiments, that the emerging traffic dynamics and message-level data structure have huge impacts on the correctness and effectiveness of CC, yet none of the existing solutions treats these two characteristics appropriately. Therefore, we present Amphis, a new CC framework that re-architects the current purely network-oriented design into a dual-control architecture combining application-coordinated control and network-oriented control. Amphis contains two novel ideas: pattern-driven proactive probing for handling traffic dynamics, and message-driven adaptive optimization for optimizing message transmission performance. Our preliminary results show that Amphis holds great promise for accurate bandwidth estimation under dynamic traffic conditions and effective data transfer at message granularity.
Citations: 0
Towards Fine-Grained, High-Coverage Internet Monitoring at Scale
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3600085
Hongyun Wu, Qi Ling, Penghui Mi, Chaoyang Ji, Yinliang Hu, Yibo Pi
The massiveness of the Internet makes it rather difficult to achieve high-coverage monitoring at scale with reasonable overhead. The traditional wisdom for scalable, high-coverage Internet monitoring is to treat the clients in each /24 as a whole and monitor only representatives, either by active probing or by passive traffic sniffing, such that the performance of the rest can be predicted for high coverage. There are two basic assumptions behind this traditional wisdom: 1) clients in the same /24 have similar performance, and 2) tracking all targeted /24s equates to full-coverage monitoring. With the increasing prevalence of load balancing, both assumptions are now questionable. Through large-scale measurements, we evaluate the coverage and predictability issues of current practices, motivate the necessity of link-level, fine-grained, high-coverage monitoring, and present new insights on how to achieve it. Our key findings are: 1) current practices using /24 representatives may fail to capture the changes of up to 85% of links in the Internet; 2) the path difference between client flows to the same /24 is both significant and prevalent; 3) it is possible to cover most of the visible links from DCs to both small and large prefixes by carefully choosing client flows; and 4) high-coverage monitoring can be achieved with at least three times less overhead than direct link monitoring.
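The "traditional wisdom" the abstract questions is easy to make concrete: aggregate clients by their covering /24 and probe one representative per group. A minimal sketch with the standard-library `ipaddress` module (representative selection here is the lowest address, a stand-in; real systems pick responsive clients):

```python
import ipaddress
from collections import defaultdict

def group_by_slash24(addrs):
    """Group client IPv4 addresses by their covering /24, the unit the
    traditional monitoring approach treats as homogeneous."""
    groups = defaultdict(list)
    for a in addrs:
        net = ipaddress.ip_network(f"{a}/24", strict=False)
        groups[net].append(ipaddress.ip_address(a))
    return groups

def representatives(groups):
    """Pick one probe target per /24 (lowest address, for illustration)."""
    return {net: min(hosts) for net, hosts in groups.items()}
```

The paper's point is precisely that this aggregation is too coarse once load balancing maps flows from the same /24 onto different paths.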
Citations: 0
ChatIoT: Zero-code Generation of Trigger-action Based IoT Programs with ChatGPT
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3603141
Fu Li, Jiaming Huang, Yi Gao, Wei Dong
The Trigger-Action Program (TAP) is a popular and significant form of Internet of Things (IoT) application, commonly used in smart homes. Existing works either just perform actions based on commands or require human intervention to generate TAPs. With the emergence of Large Language Models (LLMs), it becomes possible for users to create IoT TAPs in a zero-code manner using natural language. Thus, we propose ChatIoT, which employs LLMs to process natural language in chats and realizes zero-code generation of TAPs for existing devices.
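What an LLM-generated TAP might look like, and the validation a system would run before deploying it, can be sketched as follows. The schema keys and device names are hypothetical; the abstract does not specify ChatIoT's actual device model or prompt format:

```python
import json

# Hypothetical TAP schema (our assumption, not ChatIoT's).
TAP_KEYS = {"trigger", "condition", "action"}

def parse_tap(llm_reply):
    """Validate that an LLM reply is a well-formed TAP before deployment."""
    tap = json.loads(llm_reply)
    missing = TAP_KEYS - tap.keys()
    if missing:
        raise ValueError(f"incomplete TAP, missing {sorted(missing)}")
    return tap

# A reply the LLM might produce for "turn on the living-room light
# when motion is detected in the evening":
reply = json.dumps({
    "trigger": {"device": "motion_sensor", "event": "motion_detected"},
    "condition": {"time_between": ["18:00", "23:00"]},
    "action": {"device": "living_room_light", "command": "turn_on"},
})
```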
Citations: 0
PHYSec: A Novel Physical Layer Security Architecture for Ethernet
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3603133
Zhen Tian, Lingen Ding, Qing Wang, Desheng Sun, Yunlong Li
Recent years have seen a significant increase in demand for security in Ethernet. In this paper, we propose a novel physical layer security architecture (PHYSec) for Ethernet, operating at the lower part of the physical coding sublayer (PCS). Compared to existing encryption schemes, PHYSec provides the following benefits: 1) low overhead, because encryption objects are constructed without the restriction of frame size and native markers are used to carry security parameters; and 2) high security, because the traffic pattern is completely hidden.
Citations: 0
FastWake: Revisiting Host Network Stack for Interrupt-mode RDMA
Pub Date : 2023-06-29 DOI: 10.1145/3600061.3600063
Bojie Li, Zihao Xiang, Xiaoliang Wang, Hang Ruan, Jingbin Zhou, Kun Tan
Polling and interrupts have long been a trade-off in RDMA systems. Polling has lower latency, but each CPU core can run only one thread. Interrupts enable time sharing among multiple threads but have higher latency. Many applications, such as databases, have hundreds of threads, far more than the number of cores. They therefore have to use interrupt mode to share cores among threads, and the resulting RDMA latency is much higher than the hardware limits. In this paper, we analyze the root cause of the high cost of RDMA interrupt delivery and present FastWake, a practical redesign of the interrupt-mode RDMA host network stack using commodity RDMA hardware, the Linux OS, and unmodified applications. Our first approach to fast thread wake-up completely removes interrupts. We design a per-core dispatcher thread to poll all the completion queues of the application threads on the same core, and utilize a kernel fast path to context-switch to the thread with an incoming completion event. This approach keeps CPUs running at 100% utilization, so we also design an interrupt-based approach for scenarios with power constraints. Observing that waking up a thread on the same core as the interrupt is much faster than waking threads on other cores, we dynamically adjust RDMA event queue mappings to improve interrupt core affinity. In addition, we revisit the kernel path of thread wake-up and remove the overheads in the virtual file system (VFS), locking, and process scheduling. Experiments show that FastWake can reduce RDMA latency by 80% on x86 and 77% on ARM at the cost of <30% higher power utilization than traditional interrupts, with latency only 0.3∼0.4 μs higher than the limits of the underlying hardware. When power saving is desired, our interrupt-based approach can still reduce interrupt-mode RDMA latency by 59% on x86 and 52% on ARM.
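The per-core dispatcher idea can be imitated in user space as a toy model: one loop polls every registered completion queue and wakes the worker that owns it. This is an analogy for the structure only; FastWake's real dispatcher lives alongside the kernel and uses a context-switch fast path rather than event objects:

```python
import queue
import threading

class Dispatcher:
    """Toy per-core dispatcher: poll registered completion queues (CQs)
    and wake the owning worker when a completion arrives."""

    def __init__(self):
        self.cqs = []  # (completion queue, wake event) pairs

    def register(self):
        """Called by each application thread on this core."""
        cq, ev = queue.SimpleQueue(), threading.Event()
        self.cqs.append((cq, ev))
        return cq, ev

    def poll_once(self):
        """One polling pass over all CQs; returns the number of wake-ups."""
        woken = 0
        for cq, ev in self.cqs:
            if not cq.empty():
                ev.set()  # wake the thread that owns this CQ
                woken += 1
        return woken
```

The design point this illustrates: one always-running poller per core replaces per-thread interrupts, so a sleeping thread pays only the wake-up cost, not the interrupt-delivery cost.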
Citations: 0
Reducing Reconfiguration Time in Hybrid Optical-Electrical Datacenter Networks
Pub Date : 2023-05-08 DOI: 10.1145/3600061.3600071
Shuyuan Zhang, Shu Shan, Shizhen Zhao
We study how to reduce the reconfiguration time in hybrid optical-electrical Datacenter Networks (DCNs). With a layer of Optical Circuit Switches (OCSes), hybrid optical-electrical DCNs can reconfigure their logical topologies to better match the ongoing traffic patterns, but the reconfiguration time directly affects the benefits of reconfigurability. The reconfiguration time consists of the topology solver's running time and the network convergence time after reconfiguration is triggered. However, existing topology solvers either incur high algorithmic complexity or fail to minimize the reconfiguration overhead. In this paper, we propose a novel algorithm that combines the ideas of bipartition and Minimum Cost Flow (MCF) to reduce the overall reconfiguration time. For the first time, we formulate the topology-solving problem as an MCF problem with piecewise cost, which strikes a better balance between solver complexity and solution optimality. Our evaluation shows that our algorithm significantly reduces the network convergence time while consuming less topology solver running time, making its overall performance superior to existing algorithms. Our code and test cases are available at a public repository [25].
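To see how topology solving maps onto min-cost flow, consider a much-simplified version: distribute each pod's OCS uplinks so that high-demand pod pairs get circuits, with a single linear cost per edge rather than the paper's piecewise cost. The graph construction and toy pod/port numbers are ours, using networkx's `min_cost_flow`:

```python
import networkx as nx

def solve_topology(demand, ports):
    """Assign inter-pod circuits by min-cost flow: each flow unit is one
    circuit, and edge weight -demand[i][j] rewards circuits on links
    that carry more traffic. A toy stand-in for a piecewise-cost MCF."""
    n = len(demand)
    g = nx.DiGraph()
    g.add_node("src", demand=-ports * n)   # supplies all circuits
    g.add_node("dst", demand=ports * n)    # absorbs all circuits
    for i in range(n):
        g.add_edge("src", f"out{i}", capacity=ports, weight=0)
        g.add_edge(f"in{i}", "dst", capacity=ports, weight=0)
    for i in range(n):
        for j in range(n):
            if i != j:
                g.add_edge(f"out{i}", f"in{j}", capacity=ports,
                           weight=-int(demand[i][j]))
    flow = nx.min_cost_flow(g)
    return {(i, j): flow[f"out{i}"][f"in{j}"]
            for i in range(n) for j in range(n) if i != j}
```

With three pods, four uplinks each, and a demand matrix whose heavy entries form a cycle 0→1→2→0, the solver puts every circuit on a heavy link; a piecewise cost would additionally penalize circuits whose capacity exceeds the residual demand.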
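The abstract formulates topology solving as a Minimum Cost Flow problem with piecewise cost. As an illustration only — this is not the paper's algorithm, whose piecewise-cost MCF formulation is more involved — the simplest related special case is a min-cost assignment of logical links to OCS ports. The sketch below solves that special case with the standard Hungarian shortest-augmenting-path method; the cost matrix is a hypothetical reconfiguration penalty, not data from the paper.

```python
# Illustrative sketch only: the paper's solver handles piecewise costs and
# general MCF; here we show the simplest related special case, a min-cost
# assignment of n logical links to n OCS ports. cost[i][j] is a
# hypothetical reconfiguration penalty for mapping link i onto port j.

def min_cost_matching(cost):
    """Hungarian algorithm (shortest augmenting paths, O(n^3)).
    Returns (total_cost, match) with match[i] = port assigned to link i."""
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)          # potentials for rows (links)
    v = [0] * (n + 1)          # potentials for columns (ports)
    p = [0] * (n + 1)          # p[j] = row currently matched to column j
    way = [0] * (n + 1)        # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:            # Dijkstra-like search for the cheapest column
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:     # reached an unmatched column: augment
                break
        while j0:              # flip edges along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    total = 0
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
        total += cost[p[j] - 1][j - 1]
    return total, match

# Toy instance: 3 links x 3 ports.
total, match = min_cost_matching([[4, 1, 3],
                                  [2, 0, 5],
                                  [3, 2, 2]])
print(total, match)  # 5 [1, 0, 2]
```

A full solver along the paper's lines would instead build a flow network whose arc costs are piecewise functions of the traffic matrix, but the potential-based augmentation shown here is the same primitive such MCF solvers rely on.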
Proceedings of the 7th Asia-Pacific Workshop on Networking