Understanding Exploration and Exploitation of Q-Learning Agents in B5G Network Management

Sayantini Majumdar, R. Trivisonno, G. Carle
{"title":"Understanding Exploration and Exploitation of Q-Learning Agents in B5G Network Management","authors":"Sayantini Majumdar, R. Trivisonno, G. Carle","doi":"10.1109/GCWkshps52748.2021.9682129","DOIUrl":null,"url":null,"abstract":"Auto-scaling is a lifecycle management approach that automatically scales resources (CPU, memory etc.) based on incoming load to optimize resource utilization. Centralized orchestration, although optimal, comes at the cost of high signaling overhead. Alternatively, decentralized RL-based approaches such as Q-Learning (QL) are envisaged to be more suitable for the strict latency and overhead requirements of B5G/6G use cases, while also minimizing the number of resource allocation conflicts encountered in a distributed setting. Before QL agents can take optimal auto-scaling decisions, they need to explore or evaluate their actions based on the feedback they receive from the environment. The faster they learn, the sooner they could begin to exploit their knowledge. However, it is not clear when these agents have explored long enough to start taking management actions. This paper focuses on understanding when the exploration should end such that agents may start exploiting built knowledge. In our approach, we posit that the knowledge accrued by the agents in their Q-tables should indicate whether to explore or exploit. Hence, we conceive Knowledge Indicators (KIs) derived from their Q-tables. These KIs enable agents to learn autonomously, thereby enabling adjustment of the exploration parameter epsilon in the epsilon-greedy approach. Convergence results and corresponding impact on the system performance validate the proposed approach. This work has the potential to speed up the convergence of QL agents, thereby providing critical hints to operators targeting live deployments of B5G/6G decentralized network management.","PeriodicalId":6802,"journal":{"name":"2021 IEEE Globecom Workshops (GC Wkshps)","volume":"25 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Globecom Workshops (GC Wkshps)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GCWkshps52748.2021.9682129","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Auto-scaling is a lifecycle management approach that automatically scales resources (CPU, memory, etc.) based on incoming load to optimize resource utilization. Centralized orchestration, although optimal, comes at the cost of high signaling overhead. Alternatively, decentralized RL-based approaches such as Q-Learning (QL) are envisaged to be more suitable for the strict latency and overhead requirements of B5G/6G use cases, while also minimizing the number of resource allocation conflicts encountered in a distributed setting. Before QL agents can take optimal auto-scaling decisions, they need to explore, i.e., evaluate their actions based on the feedback they receive from the environment. The faster they learn, the sooner they can begin to exploit their knowledge. However, it is not clear when these agents have explored long enough to start taking management actions. This paper focuses on understanding when exploration should end so that agents may start exploiting the knowledge they have built. In our approach, we posit that the knowledge accrued by the agents in their Q-tables should indicate whether to explore or exploit. Hence, we conceive Knowledge Indicators (KIs) derived from their Q-tables. These KIs enable agents to learn autonomously, thereby enabling adjustment of the exploration parameter epsilon in the epsilon-greedy approach. Convergence results and the corresponding impact on system performance validate the proposed approach. This work has the potential to speed up the convergence of QL agents, thereby providing critical hints to operators targeting live deployments of B5G/6G decentralized network management.
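
For illustration, the following is a minimal Python sketch (not from the paper) of how a Knowledge Indicator derived from a Q-table could drive the exploration parameter epsilon in the epsilon-greedy approach. The specific KI used here (the fraction of state-action pairs already updated) and the names QLAgent, knowledge_indicator, etc. are assumptions made for illustration only; the paper's actual KIs and update rules are defined in the full text.

```python
import numpy as np

class QLAgent:
    """Toy Q-Learning agent whose epsilon is tied to a Q-table-derived KI."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, eps_min=0.05):
        self.q = np.zeros((n_states, n_actions))          # Q-table
        self.visited = np.zeros((n_states, n_actions), dtype=bool)
        self.alpha, self.gamma = alpha, gamma
        self.eps_min = eps_min

    def knowledge_indicator(self):
        # Hypothetical KI: share of (state, action) pairs evaluated at least once.
        return self.visited.mean()

    def epsilon(self):
        # More accrued knowledge -> less exploration, floored at eps_min.
        return max(self.eps_min, 1.0 - self.knowledge_indicator())

    def act(self, state, rng=np.random):
        if rng.random() < self.epsilon():
            return int(rng.randint(self.q.shape[1]))      # explore
        return int(np.argmax(self.q[state]))              # exploit

    def update(self, s, a, r, s_next):
        # Standard Q-Learning update; mark the pair as visited for the KI.
        target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.alpha * (target - self.q[s, a])
        self.visited[s, a] = True

# Example: a toy agent with 4 load states and 2 scaling actions (scale up / down).
agent = QLAgent(n_states=4, n_actions=2)
a = agent.act(state=0)
agent.update(0, a, r=1.0, s_next=1)
```

Under this sketch, epsilon decays autonomously as the agent's Q-table fills with evaluated actions, which mirrors the paper's idea that accrued knowledge, rather than a fixed schedule, should decide when exploration ends.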