Packet Drop Probability-Optimal Cross-layer Scheduling: Dealing with Curse of Sparsity using Prioritized Experience Replay

M. Sharma, P. Tan, E. Kurniawan, Sumei Sun
{"title":"Packet Drop Probability-Optimal Cross-layer Scheduling: Dealing with Curse of Sparsity using Prioritized Experience Replay","authors":"M. Sharma, P. Tan, E. Kurniawan, Sumei Sun","doi":"10.1109/ICCWorkshops50388.2021.9473857","DOIUrl":null,"url":null,"abstract":"In this work, we develop a reinforcement learning (RL) based model-free approach to obtain a policy for joint packet scheduling and rate adaptation, such that the packet drop probability (PDP) is minimized. The developed learning scheme yields an online cross-layer scheduling policy which takes into account the randomness in packet arrivals and wireless channels, as well as the state of packet buffers. Inherent difference in the time-scales of packet arrival process and the wireless channel variations leads to sparsity in the observed reward signal. Since an RL agent learns by using the feedback obtained in terms of rewards for its actions, the sample complexity of RL approach increases exponentially due to resulting sparsity. Therefore, a basic RL based approach, e.g., double deep Q-network (DDQN) based RL, results in a policy with negligible performance gain over the state-of-the-art schemes, such as shortest processing time (SPT) based scheduling. In order to alleviate the sparse reward problem, we leverage prioritized experience replay (PER) and develop a DDQN-based learning scheme with PER. We observe through simulations that the policy learned using DDQN-PER approach results in a 3-5% lower PDP, compared to both the basic DDQN based RL and SPT scheme.","PeriodicalId":127186,"journal":{"name":"2021 IEEE International Conference on Communications Workshops (ICC Workshops)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Communications Workshops (ICC Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCWorkshops50388.2021.9473857","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In this work, we develop a reinforcement learning (RL) based, model-free approach to obtain a policy for joint packet scheduling and rate adaptation, such that the packet drop probability (PDP) is minimized. The developed learning scheme yields an online cross-layer scheduling policy that takes into account the randomness in packet arrivals and wireless channels, as well as the state of the packet buffers. The inherent difference between the time scales of the packet arrival process and the wireless channel variations leads to sparsity in the observed reward signal. Since an RL agent learns from the feedback it receives as rewards for its actions, this sparsity causes the sample complexity of the RL approach to increase exponentially. As a result, a basic RL approach, e.g., double deep Q-network (DDQN) based RL, yields a policy with negligible performance gain over state-of-the-art schemes such as shortest processing time (SPT) based scheduling. To alleviate the sparse reward problem, we leverage prioritized experience replay (PER) and develop a DDQN-based learning scheme with PER. Simulations show that the policy learned with the DDQN-PER approach achieves a 3-5% lower PDP than both the basic DDQN-based RL and the SPT scheme.
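The core idea behind PER is to bias replay sampling toward transitions with large temporal-difference (TD) error, so the rare non-zero rewards in a sparse-reward setting are revisited more often. Below is a minimal sketch of a proportional-prioritization replay buffer in the spirit of Schaul et al.'s PER; the class and method names, the list-based storage, and the hyperparameters (capacity, alpha, beta, eps) are illustrative assumptions, not the authors' implementation.

```python
# Minimal proportional prioritized experience replay (PER) sketch.
# Assumed hyperparameters and storage layout; not the paper's exact code.
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity=10_000, alpha=0.6, eps=1e-6):
        self.capacity = capacity      # maximum number of stored transitions
        self.alpha = alpha            # how strongly priorities skew sampling
        self.eps = eps                # keeps every priority strictly positive
        self.data = []                # (state, action, reward, next_state, done)
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                  # next write index (circular buffer)

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling; beta is typically annealed toward 1.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Transitions carrying sparse reward signal produce large TD errors
        # and are therefore replayed more often in subsequent updates.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a DDQN training loop, the returned importance-sampling weights would scale the per-sample TD loss, and the absolute TD errors of the sampled batch would be fed back via update_priorities after each gradient step.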