{"title":"Two-Step Deep Reinforcement Q-Learning based Relay Selection in Cooperative WPCNs","authors":"Gulnur Tolebi, T. Tsiftsis, G. Nauryzbayev","doi":"10.1109/BalkanCom58402.2023.10167871","DOIUrl":null,"url":null,"abstract":"In this paper, we propose an intelligent relay selection scheme employing deep reinforcement learning for a wireless powered cooperative network. We formulate the given problem as a Markov decision process with an unknown transitional probability between states. Therefore, a model-free off-policy relay selection model is proposed. The given model was deployed using a deep Q-network, with an updated relay selection process. Using channel characteristics, we find inaccessible nodes to form a pool of relays available for transmission and encourage the neural network to choose them. In addition, we propose a novel reward policy to train the model that is based on stored energy levels on the relays and promotes the system to expend energy. We numerically quantity the network performance in terms of outage probability and energy outage probability and compare them with the basic Q-learning.","PeriodicalId":363999,"journal":{"name":"2023 International Balkan Conference on Communications and Networking (BalkanCom)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Balkan Conference on Communications and Networking (BalkanCom)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BalkanCom58402.2023.10167871","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In this paper, we propose an intelligent relay selection scheme employing deep reinforcement learning for a wireless powered cooperative network. We formulate the problem as a Markov decision process with unknown transition probabilities between states, and therefore propose a model-free, off-policy relay selection model. The model is deployed using a deep Q-network with an updated relay selection process. Using channel characteristics, we identify inaccessible nodes, form a pool of relays available for transmission, and encourage the neural network to choose from this pool. In addition, we propose a novel reward policy for training the model that is based on the energy levels stored at the relays and encourages the system to expend energy. We numerically quantify the network performance in terms of outage probability and energy outage probability and compare it with basic Q-learning.
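To make the described approach concrete, the following is a minimal, illustrative sketch of a deep Q-network driving relay selection with an energy-aware reward. Everything environment-specific here is an assumption for illustration: the toy Rayleigh channel model, the per-relay state layout (channel gain plus stored energy), the reward shaping, and all hyperparameters are hypothetical and do not reproduce the authors' exact two-step formulation, replay buffer, or target network.

```python
# Minimal DQN relay-selection sketch (PyTorch). Toy environment; not the paper's model.
import random
import numpy as np
import torch
import torch.nn as nn

N_RELAYS = 5               # hypothetical number of candidate relays
STATE_DIM = 2 * N_RELAYS   # assumed state: per-relay channel gain and stored energy

class QNet(nn.Module):
    """Q-network mapping the network state to one Q-value per candidate relay."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_RELAYS),
        )

    def forward(self, x):
        return self.net(x)

def step(state, action):
    """Toy transition: assumed energy-aware reward that favours relays with
    more stored energy, then energy expenditure, harvesting, and new fading."""
    gains, energy = state[:N_RELAYS], state[N_RELAYS:].copy()
    reward = gains[action] + 0.5 * energy[action]    # assumed reward shaping
    energy[action] = max(energy[action] - 0.2, 0.0)  # chosen relay expends energy
    energy += 0.05 * np.random.rand(N_RELAYS)        # toy energy harvesting
    new_gains = np.random.rayleigh(1.0, N_RELAYS)    # toy i.i.d. Rayleigh fading
    return np.concatenate([new_gains, energy]), reward

q_net = QNet()
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, eps = 0.9, 0.1
state = np.concatenate([np.random.rayleigh(1.0, N_RELAYS), np.ones(N_RELAYS)])

for t in range(1000):
    s = torch.tensor(state, dtype=torch.float32)
    # epsilon-greedy relay selection (off-policy behaviour policy)
    if random.random() < eps:
        action = random.randrange(N_RELAYS)
    else:
        action = int(q_net(s).argmax())
    next_state, reward = step(state, action)
    s_next = torch.tensor(next_state, dtype=torch.float32)
    # one-step Q-learning target and TD update
    with torch.no_grad():
        target = reward + gamma * q_net(s_next).max()
    loss = (q_net(s)[action] - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
    state = next_state
```

In a full implementation one would typically add an experience replay buffer and a periodically updated target network, and restrict the action set to the pool of accessible relays derived from the channel characteristics, as the abstract describes.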