Pub Date : 2024-08-28 DOI: 10.1109/TNSM.2024.3450964
Qianqian Wu;Qiang Liu;Wenliang Zhu;Zefan Wu
With the advancement of technologies such as 5G, Unmanned Aerial Vehicles (UAVs) have exhibited their potential in various application scenarios, including wireless coverage, search operations, and disaster response. In this paper, we consider the utilization of a group of UAVs as aerial base stations (BSs) to collect data from IoT sensor devices. The objective is to maximize the volume of collected data while simultaneously enhancing geographical fairness among the points of interest, all within the constraints of limited energy resources. To this end, we propose a deep reinforcement learning (DRL) method based on Graph Attention Networks (GAT), referred to as “GADRL”. GADRL utilizes graph convolutional neural networks to extract spatial correlations among multiple UAVs and makes decisions in a distributed manner under the guidance of DRL. Furthermore, we employ Long Short-Term Memory to establish memory units for storing and utilizing historical information. Numerical results demonstrate that GADRL consistently outperforms four baseline methods, validating its computational efficiency.
{"title":"Energy Efficient UAV-Assisted IoT Data Collection: A Graph-Based Deep Reinforcement Learning Approach","authors":"Qianqian Wu;Qiang Liu;Wenliang Zhu;Zefan Wu","doi":"10.1109/TNSM.2024.3450964","DOIUrl":"10.1109/TNSM.2024.3450964","url":null,"abstract":"With the advancements in technologies such as 5G, Unmanned Aerial Vehicles (UAVs) have exhibited their potential in various application scenarios, including wireless coverage, search operations, and disaster response. In this paper, we consider the utilization of a group of UAVs as aerial base stations (BS) to collect data from IoT sensor devices. The objective is to maximize the volume of collected data while simultaneously enhancing the geographical fairness among these points of interest, all within the constraints of limited energy resources. Therefore, we propose a deep reinforcement learning (DRL) method based on Graph Attention Networks (GAT), referred to as “GADRL”. GADRL utilizes graph convolutional neural networks to extract spatial correlations among multiple UAVs and makes decisions in a distributed manner under the guidance of DRL. Furthermore, we employ Long Short-Term Memory to establish memory units for storing and utilizing historical information. Numerical results demonstrate that GADRL consistently outperforms four baseline methods, validating its computational efficiency.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6082-6094"},"PeriodicalIF":4.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-28 DOI: 10.1109/TNSM.2024.3451295
Madyan Alsenwi;Eva Lagunas;Symeon Chatzinotas
Next-generation (NextG) cellular networks are expected to evolve towards virtualization and openness, incorporating reprogrammable components that facilitate intelligence and real-time analytics. This paper builds on these innovations to address the network slicing problem in multi-cell open radio access wireless networks, focusing on two key services: enhanced Mobile BroadBand (eMBB) and Ultra-Reliable Low Latency Communications (URLLC). A stochastic resource allocation problem is formulated with the goal of balancing the average eMBB data rate and its variance while ensuring URLLC constraints. A distributed learning framework based on Deep Reinforcement Learning (DRL) is developed following the Open Radio Access Network (O-RAN) architecture to solve the formulated optimization problem. The proposed approach trains a global machine learning model at a central cloud server and shares it with edge servers for execution. Specifically, deep learning agents are distributed at network edge servers and embedded within the Near-Real-Time RAN Intelligent Controller (Near-RT RIC) to collect network information and perform online execution. The global model is trained by a central training engine embedded within the Non-Real-Time RIC (Non-RT RIC) at the central server using data received from the edge servers. Simulation results validate the efficacy of the proposed algorithm in meeting URLLC constraints while maintaining the eMBB Quality of Service (QoS).
{"title":"Distributed Learning Framework for eMBB-URLLC Multiplexing in Open Radio Access Networks","authors":"Madyan Alsenwi;Eva Lagunas;Symeon Chatzinotas","doi":"10.1109/TNSM.2024.3451295","DOIUrl":"10.1109/TNSM.2024.3451295","url":null,"abstract":"Next-generation (NextG) cellular networks are expected to evolve towards virtualization and openness, incorporating reprogrammable components that facilitate intelligence and real-time analytics. This paper builds on these innovations to address the network slicing problem in multi-cell open radio access wireless networks, focusing on two key services: enhanced Mobile BroadBand (eMBB) and Ultra-Reliable Low Latency Communications (URLLC). A stochastic resource allocation problem is formulated with the goal of balancing the average eMBB data rate and its variance, while ensuring URLLC constraints. A distributed learning framework based on the Deep Reinforcement Learning (DRL) technique is developed following the Open Radio Access Networks (O-RAN) architectures to solve the formulated optimization problem. The proposed learning approach enables training a global machine learning model at a central cloud server and sharing it with edge servers for executions. Specifically, deep learning agents are distributed at network edge servers and embedded within the Near-Real-Time Radio access network Intelligent Controller (Near-RT RIC) to collect network information and perform online executions. A global deep learning model is trained by a central training engine embedded within the Non-Real-Time RIC (Non-RT RIC) at the central server using received data from edge servers. The performed simulation results validate the efficacy of the proposed algorithm in achieving URLLC constraints while maintaining the eMBB Quality of Service (QoS).","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 5","pages":"5718-5732"},"PeriodicalIF":4.7,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-27 DOI: 10.1109/TNSM.2024.3450670
Xiaoyang Zhao;Chuan Wu;Xia Zhu
Distributed deep learning (DL) training constitutes a significant portion of workloads in modern data centers that are equipped with high computational capacities, such as GPU servers. However, frequent tensor exchanges among workers during distributed deep neural network (DNN) training can result in heavy traffic in the data center network, leading to congestion at server NICs and in the switching network. Unfortunately, none of the existing DL communication libraries support active flow control to optimize tensor transmission performance; instead, they rely on passive adjustments to the congestion window or sending rate based on packet loss or delay. To address this issue, we propose a per-host flow scheduler that dynamically tunes the sending rates of outgoing tensor flows from each server, maximizing network bandwidth utilization and expediting job training progress. Our scheduler comprises two main components: a monitoring module that interacts with state-of-the-art communication libraries supporting the parameter server and all-reduce paradigms to track the training progress of DNN jobs, and a congestion control protocol that receives in-network feedback from traversing switches and computes optimized flow sending rates. For data centers where switches are not programmable, we provide a software solution that emulates switch behavior and interacts with the scheduler on servers. Experiments on a real-world GPU testbed and trace-driven simulations demonstrate that our scheduler outperforms common rate control protocols and representative learning-based schemes in various settings.
{"title":"Dynamic Flow Scheduling for DNN Training Workloads in Data Centers","authors":"Xiaoyang Zhao;Chuan Wu;Xia Zhu","doi":"10.1109/TNSM.2024.3450670","DOIUrl":"10.1109/TNSM.2024.3450670","url":null,"abstract":"Distributed deep learning (DL) training constitutes a significant portion of workloads in modern data centers that are equipped with high computational capacities, such as GPU servers. However, frequent tensor exchanges among workers during distributed deep neural network (DNN) training can result in heavy traffic in the data center network, leading to congestion at server NICs and in the switching network. Unfortunately, none of the existing DL communication libraries support active flow control to optimize tensor transmission performance, instead relying on passive adjustments to the congestion window or sending rate based on packet loss or delay. To address this issue, we propose a flow scheduler per host that dynamically tunes the sending rates of outgoing tensor flows from each server, maximizing network bandwidth utilization and expediting job training progress. Our scheduler comprises two main components: a monitoring module that interacts with state-of-the-art communication libraries supporting parameter server and all-reduce paradigms to track the training progress of DNN jobs, and a congestion control protocol that receives in-network feedback from traversing switches and computes optimized flow sending rates. For data centers where switches are not programmable, we provide a software solution that emulates switch behavior and interacts with the scheduler on servers. Experiments with real-world GPU testbed and trace-driven simulation demonstrate that our scheduler outperforms common rate control protocols and representative learning-based schemes in various settings.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6643-6657"},"PeriodicalIF":4.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-27 DOI: 10.1109/TNSM.2024.3450597
Jin Ye;Tiantian Yu;Zhaoyi Li;Jiawei Huang
In recent years, motivated by new datacenter applications and the well-known shortcomings of TCP in data centers, many receiver-driven transport protocols have been proposed to provide ultra-low latency and zero packet loss through proactive congestion control. However, in scenarios with mixed short and long flows, short flows with an ON/OFF pattern generate micro-burst traffic, which significantly degrades the performance of existing receiver-driven transport protocols. First, when short flows turn into ON mode, long flows cannot immediately concede bandwidth to the short ones, resulting in queue buildup and even packet loss. Second, when short flows change from ON to OFF mode, the released bandwidth cannot be fully utilized by the long flows, leading to serious bandwidth waste. To address these issues, we propose a new receiver-driven transport protocol, called SAR, which predicts the micro-bursts generated by short flows and adjusts the sending rate of long flows accordingly. With the aid of the micro-burst prediction mechanism, SAR mitigates bandwidth competition upon the arrival of short flows and alleviates bandwidth waste when the short flows leave. Testbed and NS2 simulation experiments demonstrate that SAR reduces the average flow completion time (AFCT) by up to 66% compared to typical receiver-driven transport protocols.
{"title":"SAR: Receiver-Driven Transport Protocol With Micro-Burst Prediction in Data Center Networks","authors":"Jin Ye;Tiantian Yu;Zhaoyi Li;Jiawei Huang","doi":"10.1109/TNSM.2024.3450597","DOIUrl":"10.1109/TNSM.2024.3450597","url":null,"abstract":"In recent years, motivated by new datacenter applications and the well-known shortcomings of TCP in data center, many receiver-driven transport protocols have been proposed to provide ultra-low latency and zero packet loss by using the proactive congestion control. However, in the scenario of mixed short and long flows, the short flows with ON/OFF pattern generate micro-burst traffic, which significantly deteriorates the performance of existing receiver-driven transport protocols. Firstly, when the short flows turn into ON mode, the long flows cannot immediately concede bandwidth to the short ones, resulting in queue buildup and even packet loss. Secondly, when the short flows change from ON to OFF mode, the released bandwidth cannot be fully utilized by the long flows, leading to serious bandwidth waste. To address these issues, we propose a new receiver-driven transport protocol, called SAR, which predicts the micro burst generated by short flows and adjusts the sending rate of long flows accordingly. With the aid of micro-burst prediction mechanism, SAR mitigates the bandwidth competition due to the arrival of short flows, and alleviates the bandwidth waste when the short flows leave. The testbed and NS2 simulation experiments demonstrate that SAR reduces the average flow completion time (AFCT) by up to 66% compared to typical receiver-driven transport protocols.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6409-6422"},"PeriodicalIF":4.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-27 DOI: 10.1109/TNSM.2024.3450596
Jinbin Hu;Zikai Zhou;Jin Zhang
In modern datacenter networks (DCNs), mainstream congestion control (CC) mechanisms essentially rely on Explicit Congestion Notification (ECN) to reflect congestion. The traditional static ECN threshold performs poorly under dynamic scenarios, and setting a proper ECN threshold under various traffic patterns is challenging and time-consuming. The recently proposed reinforcement learning (RL) based ECN tuning algorithm (ACC) consumes a large amount of computational resources, making it difficult to deploy on switches. In this paper, we present a lightweight and hierarchical automated ECN tuning algorithm called LAECN, which can fully exploit the performance benefits of deep reinforcement learning with ultra-low overhead. Simulation results show that LAECN improves performance significantly by reducing latency and increasing throughput under stable network conditions, and it also delivers consistently high performance in network environments dominated by small flows. For example, LAECN improves throughput by up to 47%, 34%, 32%, and 24% over DCQCN, TIMELY, HPCC, and ACC, respectively.
{"title":"Lightweight Automatic ECN Tuning Based on Deep Reinforcement Learning With Ultra-Low Overhead in Datacenter Networks","authors":"Jinbin Hu;Zikai Zhou;Jin Zhang","doi":"10.1109/TNSM.2024.3450596","DOIUrl":"10.1109/TNSM.2024.3450596","url":null,"abstract":"In modern datacenter networks (DCNs), mainstream congestion control (CC) mechanisms essentially rely on Explicit Congestion Notification (ECN) to reflect congestion. The traditional static ECN threshold performs poorly under dynamic scenarios, and setting a proper ECN threshold under various traffic patterns is challenging and time-consuming. The recently proposed reinforcement learning (RL) based ECN Tuning algorithm (ACC) consumes a large number of computational resources, making it difficult to deploy on switches. In this paper, we present a lightweight and hierarchical automated ECN tuning algorithm called LAECN, which can fully exploit the performance benefits of deep reinforcement learning with ultra-low overhead. The simulation results show that LAECN improves performance significantly by reducing latency and increasing throughput in stable network conditions, and also shows consistent high performance in small flows network environments. For example, LAECN effectively improves throughput by up to 47%, 34%, 32% and 24% over DCQCN, TIMELY, HPCC and ACC, respectively.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6398-6408"},"PeriodicalIF":4.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-26 DOI: 10.1109/TNSM.2024.3449699
Xiwen Jie;Jiangping Han;Guanglei Chen;Hang Wang;Peilin Hong;Kaiping Xue
Nowadays, Remote Direct Memory Access (RDMA) is gaining popularity in data centers for its low CPU overhead, high throughput, and ultra-low latency. As one of the state-of-the-art RDMA Congestion Control (CC) mechanisms, HPCC leverages In-band Network Telemetry (INT) to achieve accurate control and significantly shortens the Flow Completion Time (FCT) of short flows. However, redundant INT information increases processing latency at switches and reduces flows’ throughput. Besides, its end-to-end feedback mechanism is not timely enough to help senders cope well with bursty traffic, and there remains a high probability of triggering Priority-based Flow Control (PFC) pauses under large-scale incast. In this paper, we propose a Congestion-Aware (CA) control mechanism called CACC, which attempts to push CC toward theoretically minimal INT overhead and PFC pause delay. CACC introduces two CA algorithms that quantize switch buffer and egress port congestion separately, along with a fine-grained window size adjustment algorithm at the sender. Specifically, the buffer CA algorithm perceives large-scale congestion that may trigger PFC pauses and provides early feedback, significantly reducing the PFC pause delay. The egress port CA algorithm perceives the link state and selectively inserts useful INT data, achieving lower queue sizes and reducing the average per-packet overhead from 42 bytes to 2 bits. In our evaluation, compared with HPCC, PINT, and Bolt, CACC shortens the average and tail FCT by up to 27% and 60.1%, respectively.
{"title":"CACC: A Congestion-Aware Control Mechanism to Reduce INT Overhead and PFC Pause Delay","authors":"Xiwen Jie;Jiangping Han;Guanglei Chen;Hang Wang;Peilin Hong;Kaiping Xue","doi":"10.1109/TNSM.2024.3449699","DOIUrl":"10.1109/TNSM.2024.3449699","url":null,"abstract":"Nowadays, Remote Direct Memory Access (RDMA) is gaining popularity in data centers for low CPU overhead, high throughput, and ultra-low latency. As one of the state-of-the-art RDMA Congestion Control (CC) mechanisms, HPCC leverages the In-band Network Telemetry (INT) features to achieve accurate control and significantly shortens the Flow Completion Time (FCT) for short flows. However, there exists redundant INT information increasing the processing latency at switches and affecting flows’ throughput. Besides, its end-to-end feedback mechanism is not timely enough to help senders cope well with bursty traffic, and there still exists a high probability of triggering Priority-based Flow Control (PFC) pauses under large-scale incast. In this paper, we propose a Congestion-Aware (CA) control mechanism called CACC, which attempts to push CC to the theoretical low INT overhead and PFC pause delay. CACC introduces two CA algorithms to quantize switch buffer and egress port congestion, separately, along with a fine-grained window size adjustment algorithm at the sender. Specifically, the buffer CA algorithm perceives large-scale congestion that may trigger PFC pauses and provides early feedback, significantly reducing the PFC pause delay. The egress port CA algorithm perceives the link state and selectively inserts useful INT data, achieving lower queue sizes and reducing the average overhead per packet from 42 bytes to 2 bits. In our evaluation, compared with HPCC, PINT, and Bolt, CACC shortens the average and tail FCT by up to 27% and 60.1%, respectively.","PeriodicalId":13423,"journal":{"name":"IEEE Transactions on Network and Service Management","volume":"21 6","pages":"6382-6397"},"PeriodicalIF":4.7,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-26 DOI: 10.1109/TNSM.2024.3449575
Hui Wang;Zhenyu Yang;Ming Li;Xiaowei Zhang;Yanlan Hu;Donghui Hu
As the origin of blockchains, the Nakamoto Consensus protocol remains the primary protocol for many public blockchains used in cryptocurrencies (e.g., Bitcoin). Decentralization is a core feature of blockchains, yet it is difficult to strike a balance between scalability and security: many approaches to improving blockchain scalability diminish security or compromise the decentralized nature of the system. Inspired by network science, and in particular by epidemic models, we address this problem by modeling the propagation of transactions and blocks as two interacting epidemics, which we call the CoSIS model. We extend the transaction propagation process to increase the efficiency of block propagation, which reduces the number of unknown transactions; reducing the block propagation latency ultimately increases blockchain throughput. The theory of complex networks is employed to derive an optimal boundary condition. Finally, node scores are stored in the chain, so that the scheme also provides a new incentive approach. Our experiments show that CoSIS accelerates block propagation and raises TPS by 20% to 33%