ALB-TP: Adaptive Load Balancing based on Traffic Prediction using GRU-Attention for Software-Defined DCNs
Authors: Yong Liu, Qian Meng, Kefei Chen, Zhonghua Shen
Journal: Journal of Network and Computer Applications (JCR Q1, Computer Science, Hardware & Architecture; impact factor 7.7)
Published: 2025-01-04
DOI: 10.1016/j.jnca.2024.104103 (https://doi.org/10.1016/j.jnca.2024.104103)
Citations: 0
Abstract
As networks grow in size and traffic becomes burstier, Data Center Networks (DCNs), the core infrastructure of High-Performance Computing (HPC), require a high-performance, robust, and scalable load balancing method. Existing research has not yet met these design objectives well. In this paper, we design, analyze, and evaluate a novel Adaptive Load Balancing scheme based on Traffic Prediction (ALB-TP) to achieve these goals. ALB-TP uses a Gated Recurrent Unit and Attention (GRU-Attention) model to dynamically predict path congestion information for the whole network. Compared with existing schemes that collect congestion status information at fixed time intervals, the proposed GRU-Attention model improves the timeliness and accuracy of congestion information collection. With global congestion awareness, ALB-TP forwards flows to the least congested path via two-stage routing in the actual implementation, which makes it more robust on asymmetric topologies than existing congestion-agnostic schemes. Additionally, ALB-TP adopts a distributed control structure that captures the congestion information of the entire network in parallel, which makes it more scalable than existing congestion-aware schemes for large-scale networks. Evaluations on the Fat-Tree topology show that ALB-TP effectively alleviates network congestion and balances flows across paths. Compared to existing GRU and LSTM models, the proposed GRU-Attention model improves the accuracy of congestion information prediction by 28.2% on average. Simulation results show that, compared to existing schemes, ALB-TP reduces the Flow Completion Time (FCT) by an average of 18.5% and improves throughput by an average of 31.6%. Together, the theoretical design and experimental analysis indicate that ALB-TP effectively balances traffic load on asymmetric topologies and achieves its load-balancing design goals, outperforming existing schemes in FCT, throughput, and accuracy of congestion information collection.
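The abstract does not give the model architecture, so the following is only a minimal sketch of the general idea: a GRU reads a short history of per-path measurements, an attention layer weights the hidden states, and the flow is sent to the path with the lowest predicted congestion. Everything specific here is an assumption for illustration, not the authors' design: the class `TinyGRUAttention`, the helper `least_congested`, the dot-product form of attention, the congestion score in (0, 1), the path names, and the sample utilization values. The weights are random and untrained, and the two-stage routing and distributed control structure are not modeled.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGRUAttention:
    """Hypothetical, untrained GRU with dot-product attention (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # One (W, U) pair per gate: update (z), reset (r), candidate (n).
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.query = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        self.out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        self.hidden_size = hidden_size

    def step(self, x, h):
        """One GRU step: gates decide how much of the old state to keep."""
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state with the candidate, gated by z.
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def attention(self, states):
        """Softmax over dot-product scores between a query vector and each state."""
        scores = [sum(q * s for q, s in zip(self.query, h)) for h in states]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, history):
        """Return (congestion score in (0, 1), attention weights) for one path."""
        h = [0.0] * self.hidden_size
        states = []
        for x in history:
            h = self.step(x, h)
            states.append(h)
        weights = self.attention(states)
        context = [sum(w * s[i] for w, s in zip(weights, states))
                   for i in range(self.hidden_size)]
        score = sigmoid(sum(o * c for o, c in zip(self.out, context)))
        return score, weights


def least_congested(predictions):
    """Pick the path whose predicted congestion score is lowest."""
    return min(predictions, key=predictions.get)


# Made-up link-utilization histories for two hypothetical paths.
histories = {
    "path_a": [[0.2], [0.3], [0.5]],
    "path_b": [[0.9], [0.8], [0.9]],
}
model = TinyGRUAttention(input_size=1, hidden_size=4)
predictions = {path: model.predict(h)[0] for path, h in histories.items()}
print("forward flow via", least_congested(predictions))
```

Because the weights are untrained, the printed choice carries no meaning; the sketch only shows how a per-path prediction could feed a least-congested-path decision. A trained model would replace the random parameters, and the selection rule itself would stay a simple minimum over predicted scores.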
About the journal:
The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and their applications. Sample topics include new design techniques; interesting or novel applications, components, or standards; computer networks with tools such as WWW; emerging standards for internet protocols; wireless networks; mobile computing; emerging computing models such as cloud computing and grid computing; and applications of networked systems for remote collaboration and telemedicine. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded, and INSPEC.