Title: Digital Twin Enabled Q-Learning for Flying Base Station Placement: Impact of Varying Environment and Model Errors
Author: T. Guo
Venue: 2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC)
Publication date: 2023-05-01
DOI: 10.1109/ISORC58943.2023.00042 (https://doi.org/10.1109/ISORC58943.2023.00042)
Citations: 0
Abstract
This paper considers a use case of flying base station placement enabled by a digital twin (DT), and demonstrates how the DT can reduce the impact of a non-stationary environment on reinforcement learning (RL). RL learns an optimal policy by interacting with a specific environment; however, RL is known to be very sensitive to environmental change, mainly because environment variation disturbs RL training. A possible approach is to execute the RL process in the DT using snapshots of the environment (parameters), and to update those parameters at an appropriate frequency. This bundled DT-RL approach exploits the computing resources of the DT, speeds up the learning process, saves battery energy on the flying base station, and, most importantly, mitigates the non-stationary impact on RL. Specifically, the use case concerns quickly connecting mobile users to an aerial base station, which is autonomously and optimally placed according to a predefined criterion to serve scattered, slow-moving users. Q-learning, a common form of RL, is employed to optimize the base station placement, and a two-stage placement algorithm tailored to this application is proposed and evaluated. For the configuration considered in this paper, numerical results suggest that 1) a Q-learning algorithm run solely in the physical space does not work, owing to intolerable time consumption and divergence of the optimization, and 2) the proposed scheme can track the slow random movement of mobile users and tolerate certain measurement and model errors. With necessary modification and extension, the proposed framework could be applied to other DT-assisted cyber-physical systems.
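The paper itself provides no code; as a rough illustration of the core idea — running tabular Q-learning against a DT snapshot of user positions rather than against the live physical environment — here is a minimal Python sketch. The grid size, coverage radius, reward definition (number of users covered), and all hyperparameters are illustrative assumptions, not the paper's actual two-stage algorithm or configuration.

```python
import random

# Illustrative sketch (not the paper's algorithm): the base station (BS)
# position is the state on a small grid, actions are unit moves or "stay",
# and the reward is the number of users inside an assumed coverage radius.
# Training runs entirely on a fixed DT snapshot of user positions.

GRID = 10          # side length of the candidate-position grid (assumed)
RADIUS = 3         # coverage radius in grid units (assumed)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # N, S, E, W, stay

def reward(bs, users):
    """Number of users within the coverage radius of the BS position."""
    x, y = bs
    return sum(1 for (ux, uy) in users
               if (ux - x) ** 2 + (uy - y) ** 2 <= RADIUS ** 2)

def step(state, action):
    """Apply a move, clipping the BS position to the grid."""
    x = min(max(state[0] + action[0], 0), GRID - 1)
    y = min(max(state[1] + action[1], 0), GRID - 1)
    return (x, y)

def q_learning(users, episodes=300, steps=50,
               alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Train a Q-table on one fixed snapshot of user positions."""
    rng = random.Random(seed)
    Q = {}  # (state, action_index) -> estimated value
    for _ in range(episodes):
        s = (rng.randrange(GRID), rng.randrange(GRID))
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
            s2 = step(s, ACTIONS[a])
            r = reward(s2, users)
            best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
    return Q

def greedy_placement(Q, start, steps=50):
    """Follow the learned greedy policy from a start position."""
    s = start
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        s = step(s, ACTIONS[a])
    return s

if __name__ == "__main__":
    users = [(2, 2), (3, 2), (2, 3), (8, 8)]  # a snapshot: one cluster, one outlier
    Q = q_learning(users)
    bs = greedy_placement(Q, start=(0, 0))
    print("placement:", bs, "users covered:", reward(bs, users))
```

In the DT-RL framework described above, the `users` snapshot would be refreshed from the physical space at an appropriate frequency and training repeated (or continued) on each new snapshot, so that the learned placement can track slow user movement without exposing the learning loop to the non-stationary physical environment.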