Reinforcement-learning-based wireless resource allocation

Rui Wang
DOI: 10.1049/PBTE081E_CH11
Journal: Applications of Machine Learning in Wireless Communications
Published: 2019-06-19
Citations: 0

Abstract

In this chapter, we shall focus on the formulation of radio resource management (RRM) via the Markov decision process (MDP). Convex optimization has been widely used for RRM over short time durations, where the wireless channel is assumed to be quasi-static; such problems are usually referred to as deterministic optimization problems. MDP, on the other hand, is an elegant and powerful tool for the resource optimization of wireless systems over a longer timescale, where the random transitions of system and channel status are taken into account; such problems are usually referred to as stochastic optimization problems. In particular, MDP is well suited to joint optimization across the physical and medium-access control (MAC) layers. Built on the MDP framework, reinforcement learning is a practical method for solving such optimization problems without a priori knowledge of the system statistics. In this chapter, we shall first introduce some basics of stochastic approximation, which serves as one foundation of reinforcement learning, and then demonstrate MDP formulations of RRM via case studies that require knowledge of the system statistics. Finally, some reinforcement-learning approaches (e.g., Q-learning) are introduced to address the practical issue of unknown system statistics.