Title: Digital Twin Enabled Q-Learning for Flying Base Station Placement: Impact of Varying Environment and Model Errors
Author: T. Guo
Venue: 2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC)
Publication date: 2023-05-01
DOI: 10.1109/ISORC58943.2023.00042 (https://doi.org/10.1109/ISORC58943.2023.00042)
Citations: 0
Abstract
This paper considers a use case of flying base station placement enabled by a digital twin (DT), and demonstrates how the DT can reduce the impact of a non-stationary environment on reinforcement learning (RL). RL learns an optimal policy by interacting with a specific environment; however, RL is known to be very sensitive to environmental change, mainly because environment variation disturbs RL training. A possible approach is to execute the RL process in the DT using snapshots of the environment (parameters), and to update those parameters at an appropriate frequency. This bundled DT-RL approach exploits the computing resources of the DT, speeds up the learning process, saves battery energy on the flying base station, and, most importantly, mitigates the non-stationary impact on RL. Specifically, the use case concerns quickly connecting mobile users to an aerial base station, which is autonomously and optimally placed according to a predefined criterion to serve scattered, slow-moving users. Q-learning, a common form of RL, is employed to optimize the base station placement, and a two-stage placement algorithm tailored to this application is proposed and evaluated. For the configuration considered in this paper, numerical results suggest that 1) a Q-learning algorithm run solely in the physical space does not work, owing to intolerable time consumption and divergence of the optimization, and 2) the proposed scheme can track the slow random movement of mobile users and tolerate certain measurement and model errors. With necessary modification and extension, the proposed framework could be applied to other DT-assisted cyber-physical systems.
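The paper itself provides no code; as a rough illustration of the core idea — running tabular Q-learning against a DT snapshot of user positions rather than against the live physical environment — here is a minimal Python sketch. The grid size, coverage radius, reward definition (number of users covered), and all hyperparameters are illustrative assumptions, not the paper's actual two-stage algorithm or configuration.

```python
import random

# Illustrative sketch (not the paper's algorithm): the base station (BS)
# position is the state on a small grid, actions are unit moves or "stay",
# and the reward is the number of users inside an assumed coverage radius.
# Training runs entirely on a fixed DT snapshot of user positions.

GRID = 10          # side length of the candidate-position grid (assumed)
RADIUS = 3         # coverage radius in grid units (assumed)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # N, S, E, W, stay

def reward(bs, users):
    """Number of users within the coverage radius of the BS position."""
    x, y = bs
    return sum(1 for (ux, uy) in users
               if (ux - x) ** 2 + (uy - y) ** 2 <= RADIUS ** 2)

def step(state, action):
    """Apply a move, clipping the BS position to the grid."""
    x = min(max(state[0] + action[0], 0), GRID - 1)
    y = min(max(state[1] + action[1], 0), GRID - 1)
    return (x, y)

def q_learning(users, episodes=300, steps=50,
               alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Train a Q-table on one fixed snapshot of user positions."""
    rng = random.Random(seed)
    Q = {}  # (state, action_index) -> estimated value
    for _ in range(episodes):
        s = (rng.randrange(GRID), rng.randrange(GRID))
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
            s2 = step(s, ACTIONS[a])
            r = reward(s2, users)
            best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
    return Q

def greedy_placement(Q, start, steps=50):
    """Follow the learned greedy policy from a start position."""
    s = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        s = step(s, ACTIONS[a])
    return s

if __name__ == "__main__":
    users = [(2, 2), (3, 2), (2, 3), (8, 8)]  # a snapshot: one cluster, one outlier
    Q = q_learning(users)
    bs = greedy_placement(Q, start=(0, 0))
    print("placement:", bs, "users covered:", reward(bs, users))
```

In the DT-RL framework described above, the `users` snapshot would be refreshed from the physical space at an appropriate frequency and training repeated (or continued) on each new snapshot, so that the learned placement can track slow user movement without exposing the learning loop to the non-stationary physical environment.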