Deep Reinforcement Learning for Uplink Scheduling in NOMA-URLLC Networks

Benoît-Marie Robaglia, Marceau Coupechoux, Dimitrios Tsilimantos
{"title":"Deep Reinforcement Learning for Uplink Scheduling in NOMA-URLLC Networks","authors":"Benoît-Marie Robaglia;Marceau Coupechoux;Dimitrios Tsilimantos","doi":"10.1109/TMLCN.2024.3437351","DOIUrl":null,"url":null,"abstract":"This article addresses the problem of Ultra Reliable Low Latency Communications (URLLC) in wireless networks, a framework with particularly stringent constraints imposed by many Internet of Things (IoT) applications from diverse sectors. We propose a novel Deep Reinforcement Learning (DRL) scheduling algorithm, named NOMA-PPO, to solve the Non-Orthogonal Multiple Access (NOMA) uplink URLLC scheduling problem involving strict deadlines. The challenge of addressing uplink URLLC requirements in NOMA systems is related to the combinatorial complexity of the action space due to the possibility to schedule multiple devices, and to the partial observability constraint that we impose to our algorithm in order to meet the IoT communication constraints and be scalable. Our approach involves 1) formulating the NOMA-URLLC problem as a Partially Observable Markov Decision Process (POMDP) and the introduction of an agent state, serving as a sufficient statistic of past observations and actions, enabling a transformation of the POMDP into a Markov Decision Process (MDP); 2) adapting the Proximal Policy Optimization (PPO) algorithm to handle the combinatorial action space; 3) incorporating prior knowledge into the learning agent with the introduction of a Bayesian policy. Numerical results reveal that not only does our approach outperform traditional multiple access protocols and DRL benchmarks on 3GPP scenarios, but also proves to be robust under various channel and traffic configurations, efficiently exploiting inherent time correlations.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"2 ","pages":"1142-1158"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10621640","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10621640/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

This article addresses the problem of Ultra Reliable Low Latency Communications (URLLC) in wireless networks, a framework with particularly stringent constraints imposed by many Internet of Things (IoT) applications from diverse sectors. We propose a novel Deep Reinforcement Learning (DRL) scheduling algorithm, named NOMA-PPO, to solve the Non-Orthogonal Multiple Access (NOMA) uplink URLLC scheduling problem involving strict deadlines. The challenge of addressing uplink URLLC requirements in NOMA systems stems from the combinatorial complexity of the action space, since multiple devices may be scheduled simultaneously, and from the partial observability constraint that we impose on our algorithm in order to meet IoT communication constraints and remain scalable. Our approach involves 1) formulating the NOMA-URLLC problem as a Partially Observable Markov Decision Process (POMDP) and introducing an agent state, serving as a sufficient statistic of past observations and actions, that enables the transformation of the POMDP into a Markov Decision Process (MDP); 2) adapting the Proximal Policy Optimization (PPO) algorithm to handle the combinatorial action space; 3) incorporating prior knowledge into the learning agent through a Bayesian policy. Numerical results reveal that our approach not only outperforms traditional multiple access protocols and DRL benchmarks in 3GPP scenarios, but also proves robust under various channel and traffic configurations, efficiently exploiting inherent time correlations.
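The second step, adapting PPO to the combinatorial action space, is the part most easily illustrated in code. Scheduling any subset of K devices yields 2^K joint actions, so a direct softmax over actions does not scale; a common remedy, and plausibly the spirit of the paper's adaptation (the details below are our own, not taken from the paper), is to factorize the joint action into K per-device Bernoulli decisions so the policy output grows linearly in K. The PyTorch sketch below illustrates this under that assumption; the class name, network sizes, and agent-state dimension are all hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Bernoulli

class FactorizedSchedulingPolicy(nn.Module):
    """Hypothetical policy head for a combinatorial scheduling action.

    Instead of enumerating all 2^K subsets of K devices, the joint
    action is factorized into K independent Bernoulli decisions
    (schedule device k or not), keeping the output size linear in K.
    """

    def __init__(self, state_dim: int, num_devices: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_devices),  # one logit per device
        )

    def forward(self, agent_state: torch.Tensor):
        logits = self.net(agent_state)
        dist = Bernoulli(logits=logits)
        action = dist.sample()                    # binary mask over devices
        log_prob = dist.log_prob(action).sum(-1)  # joint log-prob = sum of per-device factors
        return action, log_prob

# Usage sketch: sample a scheduling decision for 8 devices from a
# 32-dimensional agent state (dimensions chosen for illustration only).
policy = FactorizedSchedulingPolicy(state_dim=32, num_devices=8)
agent_state = torch.randn(1, 32)
action, log_prob = policy(agent_state)
```

The joint log-probability is simply the sum of the per-device factors, which is exactly the quantity needed to form the clipped importance ratio in the standard PPO objective.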