Packet Drop Probability-Optimal Cross-layer Scheduling: Dealing with Curse of Sparsity using Prioritized Experience Replay

M. Sharma, P. Tan, E. Kurniawan, Sumei Sun
{"title":"Packet Drop Probability-Optimal Cross-layer Scheduling: Dealing with Curse of Sparsity using Prioritized Experience Replay","authors":"M. Sharma, P. Tan, E. Kurniawan, Sumei Sun","doi":"10.1109/ICCWorkshops50388.2021.9473857","DOIUrl":null,"url":null,"abstract":"In this work, we develop a reinforcement learning (RL) based model-free approach to obtain a policy for joint packet scheduling and rate adaptation, such that the packet drop probability (PDP) is minimized. The developed learning scheme yields an online cross-layer scheduling policy which takes into account the randomness in packet arrivals and wireless channels, as well as the state of packet buffers. Inherent difference in the time-scales of packet arrival process and the wireless channel variations leads to sparsity in the observed reward signal. Since an RL agent learns by using the feedback obtained in terms of rewards for its actions, the sample complexity of RL approach increases exponentially due to resulting sparsity. Therefore, a basic RL based approach, e.g., double deep Q-network (DDQN) based RL, results in a policy with negligible performance gain over the state-of-the-art schemes, such as shortest processing time (SPT) based scheduling. In order to alleviate the sparse reward problem, we leverage prioritized experience replay (PER) and develop a DDQN-based learning scheme with PER. We observe through simulations that the policy learned using DDQN-PER approach results in a 3-5% lower PDP, compared to both the basic DDQN based RL and SPT scheme.","PeriodicalId":127186,"journal":{"name":"2021 IEEE International Conference on Communications Workshops (ICC Workshops)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Communications Workshops (ICC Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCWorkshops50388.2021.9473857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In this work, we develop a reinforcement learning (RL) based, model-free approach to obtain a policy for joint packet scheduling and rate adaptation, such that the packet drop probability (PDP) is minimized. The developed learning scheme yields an online cross-layer scheduling policy that takes into account the randomness in packet arrivals and wireless channels, as well as the state of the packet buffers. The inherent difference between the time scales of the packet arrival process and the wireless channel variations leads to sparsity in the observed reward signal. Since an RL agent learns from the feedback it receives as rewards for its actions, this sparsity causes the sample complexity of the RL approach to increase exponentially. As a result, a basic RL approach, e.g., double deep Q-network (DDQN) based RL, yields a policy with negligible performance gain over state-of-the-art schemes such as shortest processing time (SPT) based scheduling. To alleviate the sparse reward problem, we leverage prioritized experience replay (PER) and develop a DDQN-based learning scheme with PER. Simulations show that the policy learned with the DDQN-PER approach achieves a 3-5% lower PDP than both the basic DDQN-based RL and the SPT scheme.
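The core idea behind PER is to bias replay sampling toward transitions with large temporal-difference (TD) error, so the rare non-zero rewards in a sparse-reward setting are revisited more often. Below is a minimal sketch of a proportional-prioritization replay buffer in the spirit of Schaul et al.'s PER; the class and method names, the list-based storage, and the hyperparameters (capacity, alpha, beta, eps) are illustrative assumptions, not the authors' implementation.

```python
# Minimal proportional prioritized experience replay (PER) sketch.
# Assumed hyperparameters and storage layout; not the paper's exact code.
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity=10_000, alpha=0.6, eps=1e-6):
        self.capacity = capacity      # maximum number of stored transitions
        self.alpha = alpha            # how strongly priorities skew sampling
        self.eps = eps                # keeps every priority strictly positive
        self.data = []                # (state, action, reward, next_state, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                  # next write index (circular buffer)

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling; beta is typically annealed toward 1.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Transitions carrying sparse reward signal produce large TD errors
        # and are therefore replayed more often in subsequent updates.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a DDQN training loop, the returned importance-sampling weights would scale the per-sample TD loss, and the absolute TD errors of the sampled batch would be fed back via update_priorities after each gradient step.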