Accurate and fast congestion feedback in MEC-enabled RDMA datacenters

Xin He, Feifan Liang, Weibei Fan, Junchang Wang, Lei Han, Fu Xiao, Wanchun Dou
{"title":"Accurate and fast congestion feedback in MEC-enabled RDMA datacenters","authors":"Xin He, Feifan Liang, Weibei Fan, Junchang Wang, Lei Han, Fu Xiao, Wanchun Dou","doi":"10.1186/s13677-024-00642-8","DOIUrl":null,"url":null,"abstract":"Mobile edge computing (MEC) is a novel computing paradigm that pushes computation and storage resources to the edge of the network. The interconnection of edge servers forms small-scale data centers, enabling MEC to provide low-latency network services for mobile users. Nowadays, Remote Direct Memory Access (RDMA) has been widely deployed in such data centers to reduce CPU overhead and network latency. Plenty of congestion control mechanisms have been proposed for RDMA data centers, aiming to provide low-latency data delivery and high throughput network services. However, our fine-grained experimental analysis reveals that existing congestion control mechanisms still have performance limitations due to inappropriate congestion notifications and the long congestion feedback cycle. In this paper, we propose Mercury, which is an accurate and fast congestion feedback mechanism. Mercury comprises two key components: (1) the state-driven congestion detection and (2) the window-based congestion notification. Specifically, the state-driven congestion detection monitors the queue length and the number of packets received at the switch egress port when the PFC is triggered. It determines the states of egress ports and identifies flows that really contribute to congestion. Then, window-based congestion notification calculates the window sizes for congested flows and rapidly returns Congestion Notification Packets (CNPs) with the window information to the sender. It facilitates the rate adjustment of congested flows. Mercury is compatible with existing RDMA CC mechanisms and can be easily implemented in switches. We employ real-world data sets and conduct both micro-benchmark and large-scale simulations to evaluate the performance of Mercury. The results indicate that, thanks to the accurate and fast congestion feedback, Mercury achieves a reduction in the 99th tail flow completion time by up to 45.1%, 41.8%, 38.7%, 30.9%, and 37.9% compared with Timely, DCQCN, DCQCN+TCD, PACC, and HPCC, respectively.","PeriodicalId":501257,"journal":{"name":"Journal of Cloud Computing","volume":"233 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s13677-024-00642-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Mobile edge computing (MEC) is a novel computing paradigm that pushes computation and storage resources to the edge of the network. The interconnection of edge servers forms small-scale data centers, enabling MEC to provide low-latency network services for mobile users. Nowadays, Remote Direct Memory Access (RDMA) has been widely deployed in such data centers to reduce CPU overhead and network latency. Plenty of congestion control mechanisms have been proposed for RDMA data centers, aiming to provide low-latency data delivery and high throughput network services. However, our fine-grained experimental analysis reveals that existing congestion control mechanisms still have performance limitations due to inappropriate congestion notifications and the long congestion feedback cycle. In this paper, we propose Mercury, which is an accurate and fast congestion feedback mechanism. Mercury comprises two key components: (1) the state-driven congestion detection and (2) the window-based congestion notification. Specifically, the state-driven congestion detection monitors the queue length and the number of packets received at the switch egress port when the PFC is triggered. It determines the states of egress ports and identifies flows that really contribute to congestion. Then, window-based congestion notification calculates the window sizes for congested flows and rapidly returns Congestion Notification Packets (CNPs) with the window information to the sender. It facilitates the rate adjustment of congested flows. Mercury is compatible with existing RDMA CC mechanisms and can be easily implemented in switches. We employ real-world data sets and conduct both micro-benchmark and large-scale simulations to evaluate the performance of Mercury. The results indicate that, thanks to the accurate and fast congestion feedback, Mercury achieves a reduction in the 99th tail flow completion time by up to 45.1%, 41.8%, 38.7%, 30.9%, and 37.9% compared with Timely, DCQCN, DCQCN+TCD, PACC, and HPCC, respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
在支持 MEC 的 RDMA 数据中心中实现准确快速的拥塞反馈
移动边缘计算(MEC)是一种将计算和存储资源推向网络边缘的新型计算模式。边缘服务器的互连形成了小型数据中心,使 MEC 能够为移动用户提供低延迟网络服务。目前,远程直接内存访问(RDMA)已广泛应用于此类数据中心,以减少 CPU 开销和网络延迟。针对 RDMA 数据中心提出了大量拥塞控制机制,旨在提供低延迟数据传输和高吞吐量网络服务。然而,我们的细粒度实验分析表明,由于不恰当的拥塞通知和较长的拥塞反馈周期,现有的拥塞控制机制仍然存在性能限制。在本文中,我们提出了一种准确、快速的拥塞反馈机制--Mercury。Mercury 由两个关键部分组成:(1) 状态驱动的拥塞检测和 (2) 基于窗口的拥塞通知。具体来说,当 PFC 被触发时,状态驱动拥塞检测会监控队列长度和交换机出口端口接收到的数据包数量。它能确定出口端口的状态,并识别出真正造成拥塞的流量。然后,基于窗口的拥塞通知会计算拥塞流量的窗口大小,并将包含窗口信息的拥塞通知包(CNP)快速返回给发送方。这有助于调整拥塞流量的速率。Mercury 与现有的 RDMA CC 机制兼容,可在交换机中轻松实现。我们采用真实世界的数据集,并通过微基准和大规模仿真来评估 Mercury 的性能。结果表明,与Timely、DCQCN、DCQCN+TCD、PACC和HPCC相比,得益于准确快速的拥塞反馈,Mercury可将第99个尾流完成时间分别缩短45.1%、41.8%、38.7%、30.9%和37.9%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A cost-efficient content distribution optimization model for fog-based content delivery networks Toward security quantification of serverless computing SMedIR: secure medical image retrieval framework with ConvNeXt-based indexing and searchable encryption in the cloud A trusted IoT data sharing method based on secure multi-party computation Wind power prediction method based on cloud computing and data privacy protection
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1