Optimization of URLLC and eMBB Multiplexing via Deep Reinforcement Learning
Yang Li, Chunjing Hu, Jun Wang, Mingfeng Xu
2019 IEEE/CIC International Conference on Communications Workshops in China (ICCC Workshops), 2019-08-01
DOI: https://doi.org/10.1109/ICCChinaW.2019.8850168
Citations: 13
Abstract
In 5G mobile networks, multiple scenarios have emerged to meet different service requirements, and the limited spectrum is becoming increasingly crowded as a result. To improve the utilization of limited transmission resources (spectrum, time, power, etc.) while meeting the different needs of users, we introduce a reward function as a measure of different allocation policies and calculate the reward that each policy might gain. State arrivals form a Markov process, meaning that the next state is determined only by the current state. To solve the optimization problem, we introduce the Q-learning algorithm. Because the state space is enormous, this paper presents a DQN (Deep Q-Network) based resource allocation algorithm. Numerical experiments show the performance of the proposed algorithms by comparison with two baselines.
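
As a rough illustration of the Q-learning step mentioned in the abstract, the sketch below shows a generic tabular update with epsilon-greedy action selection. It is not the paper's method: the abstract does not specify the state, action, or reward definitions, so the integer states and actions, the function names `q_learning_update` and `epsilon_greedy`, and the values of `alpha` and `gamma` are all assumptions made for illustration. A DQN would replace the lookup table `q` with a neural network approximating the same values, which is what makes an enormous state space tractable.

```python
import random


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place.

    q is a dict mapping (state, action) -> estimated value; missing
    entries are treated as 0.0. Returns the updated Q(state, action).
    """
    # Greedy value of the next state: max over all actions.
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, n_actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    # Hypothetical toy setting: 2 states and 2 allocation actions.
    q = {}
    q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                      n_actions=2, alpha=0.5, gamma=0.9)
    print(epsilon_greedy(q, state=0, n_actions=2, epsilon=0.0))
```

Because the Markov property means the next state depends only on the current one, the update needs just the single transition `(state, action, reward, next_state)` and no longer history.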