Digital Twin Enabled Q-Learning for Flying Base Station Placement: Impact of Varying Environment and Model Errors

T. Guo
{"title":"Digital Twin Enabled Q-Learning for Flying Base Station Placement: Impact of Varying Environment and Model Errors","authors":"T. Guo","doi":"10.1109/ISORC58943.2023.00042","DOIUrl":null,"url":null,"abstract":"This paper considers a use case of flying base station placement enabled by digital twin (DT), and demonstrates how DT can help reduce the impact of a non-stationary environment on reinforcement learning (RL). RL is able to learn the optimal policy via interacting with a specific environment. However, it has been observed that RL is very sensitive to environment change, mainly because the environment variation disturbs RL training/learning. A possible approach is to execute the RL process in the DT using snapshots of the environment (parameters), and update the parameters at a proper frequency. The DT-RL bundled approach takes advantage of computing resources in the DT, speeds up the process and saves battery energy of the flying base station, and more importantly, mitigates the nonstationary impact on RL. Specifically, the use case is about quickly connecting mobile users with an aerial bass station. The base station is autonomously and optimally placed according to a predefined criterion to connect scattered slow-movement users. Q-learning, a common type of RL, is employed as a solution to the optimization of base station placement. Tailored for this application, a two-stage base station placement algorithm is proposed and evaluated. For the configuration considered in this paper, numerical results suggest that 1) the Q-learning algorithm run solely in the physical space does not work due to intolerable time consuming and optimization divergence, and 2) the proposed scheme can catch up with random slow movement of mobile users, and tolerate certain measurement and model errors. With necessary modification and extension, the proposed framework could be applied to other DT-assisted cyber-physical systems.","PeriodicalId":281426,"journal":{"name":"2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISORC58943.2023.00042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This paper considers a use case of flying base station placement enabled by a digital twin (DT) and demonstrates how the DT can reduce the impact of a non-stationary environment on reinforcement learning (RL). RL learns an optimal policy by interacting with a specific environment, but it is known to be very sensitive to environmental change, mainly because environment variation disturbs the training process. A possible remedy is to execute the RL process inside the DT using snapshots of the environment parameters and to refresh those parameters at an appropriate frequency. This bundled DT-RL approach exploits the computing resources of the DT, speeds up learning, saves battery energy on the flying base station, and, most importantly, mitigates the non-stationary impact on RL. Specifically, the use case is quickly connecting mobile users to an aerial base station: the base station is placed autonomously and optimally, according to a predefined criterion, to serve scattered slow-moving users. Q-learning, a common type of RL, is employed to optimize the base station placement, and a two-stage placement algorithm tailored to this application is proposed and evaluated. For the configuration considered in this paper, numerical results suggest that 1) running the Q-learning algorithm solely in the physical space does not work, owing to intolerable time consumption and optimization divergence, and 2) the proposed scheme can track random slow movement of mobile users and tolerate certain measurement and model errors. With suitable modification and extension, the proposed framework could be applied to other DT-assisted cyber-physical systems.
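
The paper itself does not publish code, but the core idea — tabular Q-learning over candidate base station positions, run entirely against a DT snapshot of user locations — can be illustrated with a minimal sketch. Everything below (grid size, coverage radius, reward definition, hyperparameters, helper names) is an assumption made for illustration and is not the authors' actual two-stage algorithm.

```python
import numpy as np

# Illustrative sketch only: parameters and reward are assumed, not taken from the paper.
rng = np.random.default_rng(0)

GRID = 10                      # service area discretized into a GRID x GRID lattice
N_USERS = 15                   # scattered slow-moving users
COVERAGE_RADIUS = 2.5          # users within this distance of the BS count as connected
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # move N/S/E/W or hover

# DT snapshot of user positions; in the DT-RL loop this would be refreshed periodically.
users = rng.uniform(0, GRID, size=(N_USERS, 2))

def reward(bs_cell):
    """Number of users covered by a base station placed at this grid cell."""
    d = np.linalg.norm(users - (np.asarray(bs_cell) + 0.5), axis=1)
    return np.sum(d <= COVERAGE_RADIUS)

def step(state, a):
    """Move the base station one cell, clipped to the grid boundary."""
    x = min(max(state[0] + ACTIONS[a][0], 0), GRID - 1)
    y = min(max(state[1] + ACTIONS[a][1], 0), GRID - 1)
    return (x, y)

# Tabular Q-learning over BS grid positions, run entirely in the digital twin.
Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(500):
    s = (rng.integers(GRID), rng.integers(GRID))
    for _ in range(50):
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s[0], s[1]]))
        s2 = step(s, a)
        r = reward(s2)
        Q[s[0], s[1], a] += alpha * (r + gamma * np.max(Q[s2[0], s2[1]]) - Q[s[0], s[1], a])
        s = s2

# Greedily read off the highest-valued cell as the suggested placement.
best = np.unravel_index(np.argmax(Q.max(axis=2)), (GRID, GRID))
print("suggested BS cell:", best, "users covered:", reward(best))
```

In the DT-RL scheme described in the abstract, the `users` snapshot would be refreshed from physical-space measurements at a chosen frequency and the learning re-run inside the twin, so that the flying base station only executes the resulting placement instead of learning in the physical environment.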