Deep Reinforcement Learning for Uplink Scheduling in NOMA-URLLC Networks

Benoît-Marie Robaglia, Marceau Coupechoux, Dimitrios Tsilimantos
{"title":"Deep Reinforcement Learning for Uplink Scheduling in NOMA-URLLC Networks","authors":"Benoît-Marie Robaglia;Marceau Coupechoux;Dimitrios Tsilimantos","doi":"10.1109/TMLCN.2024.3437351","DOIUrl":null,"url":null,"abstract":"This article addresses the problem of Ultra Reliable Low Latency Communications (URLLC) in wireless networks, a framework with particularly stringent constraints imposed by many Internet of Things (IoT) applications from diverse sectors. We propose a novel Deep Reinforcement Learning (DRL) scheduling algorithm, named NOMA-PPO, to solve the Non-Orthogonal Multiple Access (NOMA) uplink URLLC scheduling problem involving strict deadlines. The challenge of addressing uplink URLLC requirements in NOMA systems is related to the combinatorial complexity of the action space due to the possibility to schedule multiple devices, and to the partial observability constraint that we impose to our algorithm in order to meet the IoT communication constraints and be scalable. Our approach involves 1) formulating the NOMA-URLLC problem as a Partially Observable Markov Decision Process (POMDP) and the introduction of an agent state, serving as a sufficient statistic of past observations and actions, enabling a transformation of the POMDP into a Markov Decision Process (MDP); 2) adapting the Proximal Policy Optimization (PPO) algorithm to handle the combinatorial action space; 3) incorporating prior knowledge into the learning agent with the introduction of a Bayesian policy. Numerical results reveal that not only does our approach outperform traditional multiple access protocols and DRL benchmarks on 3GPP scenarios, but also proves to be robust under various channel and traffic configurations, efficiently exploiting inherent time correlations.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"2 ","pages":"1142-1158"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10621640","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10621640/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

This article addresses the problem of Ultra Reliable Low Latency Communications (URLLC) in wireless networks, a framework with particularly stringent constraints imposed by many Internet of Things (IoT) applications from diverse sectors. We propose a novel Deep Reinforcement Learning (DRL) scheduling algorithm, named NOMA-PPO, to solve the Non-Orthogonal Multiple Access (NOMA) uplink URLLC scheduling problem involving strict deadlines. The challenge of addressing uplink URLLC requirements in NOMA systems stems from the combinatorial complexity of the action space, since multiple devices may be scheduled simultaneously, and from the partial observability constraint that we impose on our algorithm in order to meet IoT communication constraints and remain scalable. Our approach involves 1) formulating the NOMA-URLLC problem as a Partially Observable Markov Decision Process (POMDP) and introducing an agent state, serving as a sufficient statistic of past observations and actions, that enables the transformation of the POMDP into a Markov Decision Process (MDP); 2) adapting the Proximal Policy Optimization (PPO) algorithm to handle the combinatorial action space; 3) incorporating prior knowledge into the learning agent through a Bayesian policy. Numerical results reveal that our approach not only outperforms traditional multiple access protocols and DRL benchmarks in 3GPP scenarios, but also proves robust under various channel and traffic configurations, efficiently exploiting inherent time correlations.
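The second step, adapting PPO to the combinatorial action space, is the part most easily illustrated in code. Scheduling any subset of K devices yields 2^K joint actions, so a direct softmax over actions does not scale; a common remedy, and plausibly the spirit of the paper's adaptation (the details below are our own, not taken from the paper), is to factorize the joint action into K per-device Bernoulli decisions so the policy output grows linearly in K. The PyTorch sketch below illustrates this under that assumption; the class name, network sizes, and agent-state dimension are all hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Bernoulli

class FactorizedSchedulingPolicy(nn.Module):
    """Hypothetical policy head for a combinatorial scheduling action.

    Instead of enumerating all 2^K subsets of K devices, the joint
    action is factorized into K independent Bernoulli decisions
    (schedule device k or not), keeping the output size linear in K.
    """

    def __init__(self, state_dim: int, num_devices: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_devices),  # one logit per device
        )

    def forward(self, agent_state: torch.Tensor):
        logits = self.net(agent_state)
        dist = Bernoulli(logits=logits)
        action = dist.sample()                    # binary mask over devices
        log_prob = dist.log_prob(action).sum(-1)  # joint log-prob = sum of per-device factors
        return action, log_prob

# Usage sketch: sample a scheduling decision for 8 devices from a
# 32-dimensional agent state (dimensions chosen for illustration only).
policy = FactorizedSchedulingPolicy(state_dim=32, num_devices=8)
agent_state = torch.randn(1, 32)
action, log_prob = policy(agent_state)
```

The joint log-probability is simply the sum of the per-device factors, which is exactly the quantity needed to form the clipped importance ratio in the standard PPO objective.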