Jinbin Hu;Yi He;Wangqing Luo;Jiawei Huang;Jin Wang
{"title":"Enhancing Load Balancing With In-Network Recirculation to Prevent Packet Reordering in Lossless Data Centers","authors":"Jinbin Hu;Yi He;Wangqing Luo;Jiawei Huang;Jin Wang","doi":"10.1109/TNET.2024.3403671","DOIUrl":null,"url":null,"abstract":"Many existing load balancing mechanisms work effectively in lossy datacenter networks (DCNs), but they suffer from serious packet reordering in lossless Ethernet DCNs deployed with the hop-by-hop Priority-based Flow Control (PFC). The key reason is that the prior solutions are not able to perceive PFC triggering correctly and in a timely manner when making load balancing decisions. Once the forwarding path pauses transmission due to PFC triggering, the packets allocated on it are blocked, inevitably leading to out-of-order packets and retransmission. In this paper, we present an Reordering-robust Load Balancing (RLB) scheme with PFC prediction in lossless DCNs. At its heart, RLB leverages the derivative of ingress queue length to predict PFC triggering and proactively notifies the upstream switches to choose an appropriate rerouting path or perform packet recirculation to avoid reordering. Furthermore, under switch failure scenarios, RLB adjusts the recirculation threshold adaptively to mitigate the risk of packets over-recirculation. We have implemented RLB in the hardware programmable switch. As a building block for existing load balancing mechanisms, we have integrated RLB into Presto, LetFlow, Hermes and DRILL. The evaluation results show that the RLB-enhanced solutions deliver significant performance by avoiding packet reordering. For example, it reduces the \n<inline-formula> <tex-math>$99^{th}$ </tex-math></inline-formula>\n percentile flow completion time (FCT) by up to 72%, 67%, 58% and 54% over DRILL, Presto, LetFlow and Hermes, respectively.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"4114-4127"},"PeriodicalIF":3.6000,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Networking","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10539004/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Many existing load balancing mechanisms work effectively in lossy datacenter networks (DCNs), but they suffer from serious packet reordering in lossless Ethernet DCNs deployed with the hop-by-hop Priority-based Flow Control (PFC). The key reason is that the prior solutions are not able to perceive PFC triggering correctly and in a timely manner when making load balancing decisions. Once the forwarding path pauses transmission due to PFC triggering, the packets allocated on it are blocked, inevitably leading to out-of-order packets and retransmission. In this paper, we present an Reordering-robust Load Balancing (RLB) scheme with PFC prediction in lossless DCNs. At its heart, RLB leverages the derivative of ingress queue length to predict PFC triggering and proactively notifies the upstream switches to choose an appropriate rerouting path or perform packet recirculation to avoid reordering. Furthermore, under switch failure scenarios, RLB adjusts the recirculation threshold adaptively to mitigate the risk of packets over-recirculation. We have implemented RLB in the hardware programmable switch. As a building block for existing load balancing mechanisms, we have integrated RLB into Presto, LetFlow, Hermes and DRILL. The evaluation results show that the RLB-enhanced solutions deliver significant performance by avoiding packet reordering. For example, it reduces the
$99^{th}$
percentile flow completion time (FCT) by up to 72%, 67%, 58% and 54% over DRILL, Presto, LetFlow and Hermes, respectively.
期刊介绍:
The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.