Distributed Reinforcement Learning for NOMA-Enabled Mobile Edge Computing

Zhong Yang, Yuanwei Liu, Yue Chen
{"title":"Distributed Reinforcement Learning for NOMA-Enabled Mobile Edge Computing","authors":"Zhong Yang, Yuanwei Liu, Yue Chen","doi":"10.1109/ICCWorkshops49005.2020.9145457","DOIUrl":null,"url":null,"abstract":"A novel non-orthogonal multiple access (NOMA) enabled cache-aided mobile edge computing (MEC) framework is proposed, for minimizing the sum energy consumption. The NOMA strategy enables mobile users to offload computation tasks to the access point (AP) simultaneously, which improves the spectrum efficiency. In this article, the considered resource allocation problem is formulated as a long-term reward maximization problem that involves a joint optimization of task offloading decision, computation resource allocation, and caching decision. To tackle this nontrivial problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy from historical experience. Moreover, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for task offloading decisions. More specifically, a BLA based action select scheme is proposed for the agents in MAQ-learning to select the optimal actions in every state. The proposed BLA based action selection scheme is instantaneously self-correcting, consequently, if the probabilities of two computing models (i.e., local computing and offloading computing) are not equal, the optimal action unveils eventually. Extensive simulations demonstrate that: 1) The proposed cache-aided NOMA MEC framework significantly outperforms the other representative benchmark schemes under various network setups. 2) The effectiveness of the proposed BAL-MAQ-learning algorithm is confirmed from the comparison with the results of conventional reinforcement learning algorithms.","PeriodicalId":254869,"journal":{"name":"2020 IEEE International Conference on Communications Workshops (ICC Workshops)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Communications Workshops (ICC Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCWorkshops49005.2020.9145457","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

A novel non-orthogonal multiple access (NOMA) enabled cache-aided mobile edge computing (MEC) framework is proposed for minimizing the sum energy consumption. The NOMA strategy enables mobile users to offload computation tasks to the access point (AP) simultaneously, which improves the spectrum efficiency. In this article, the considered resource allocation problem is formulated as a long-term reward maximization problem that involves the joint optimization of the task offloading decision, computation resource allocation, and caching decision. To tackle this nontrivial problem, a single-agent Q-learning (SAQ-learning) algorithm is invoked to learn a long-term resource allocation strategy from historical experience. Moreover, a Bayesian learning automata (BLA) based multi-agent Q-learning (MAQ-learning) algorithm is proposed for the task offloading decisions. More specifically, a BLA-based action selection scheme is proposed that enables the agents in MAQ-learning to select the optimal action in every state. The proposed BLA-based action selection scheme is instantaneously self-correcting; consequently, whenever the probabilities associated with the two computing modes (i.e., local computing and offloading computing) are unequal, the optimal action is eventually revealed. Extensive simulations demonstrate that: 1) the proposed cache-aided NOMA MEC framework significantly outperforms other representative benchmark schemes under various network setups; and 2) the effectiveness of the proposed BLA-MAQ-learning algorithm is confirmed by comparison with conventional reinforcement learning algorithms.
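The BLA-based action selection described in the abstract is, in essence, Thompson sampling over the two computing modes: each agent keeps a Beta posterior per mode (local vs. offloading), draws one sample from each posterior, plays the mode with the larger draw, and updates that mode's posterior with binary success feedback, alongside a standard tabular Q-learning update. The sketch below is a minimal illustration of that idea, not the authors' implementation; the names (`BLAAgent`, `select_action`, `update`, `q_update`), the Bernoulli reward model, and the toy single-state environment are all assumptions made for illustration.

```python
import random
from collections import defaultdict

class BLAAgent:
    """Bayesian learning automaton over the two computing modes:
    action 0 = local computing, action 1 = offloading computing.
    Each action keeps a Beta(alpha, beta) posterior over its chance
    of producing a favorable (energy-saving) outcome."""

    def __init__(self):
        self.alpha = [1.0, 1.0]  # Beta(1, 1): uniform prior for both actions
        self.beta = [1.0, 1.0]

    def select_action(self):
        # Thompson-style draw: sample each posterior, act greedily on the draws.
        draws = [random.betavariate(self.alpha[a], self.beta[a]) for a in (0, 1)]
        return 0 if draws[0] >= draws[1] else 1

    def update(self, action, success):
        # Binary feedback: success = True if the chosen mode lowered energy use.
        if success:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0

def q_update(q, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in (0, 1))
    q[(state, action)] += lr * (reward + gamma * best_next - q[(state, action)])

if __name__ == "__main__":
    # Toy stand-in for the MEC environment (hypothetical): offloading
    # yields a favorable outcome more often than local computing.
    success_prob = {0: 0.4, 1: 0.7}
    agent = BLAAgent()
    q = defaultdict(float)
    state = 0  # single dummy state for illustration
    for _ in range(2000):
        action = agent.select_action()
        success = random.random() < success_prob[action]
        agent.update(action, success)
        q_update(q, state, action, 1.0 if success else 0.0, state)
    print("posterior means:",
          [agent.alpha[a] / (agent.alpha[a] + agent.beta[a]) for a in (0, 1)])
```

Because each Beta posterior concentrates around the true success probability of its mode, the sampled draws favor the better mode increasingly often, which mirrors the self-correcting behavior the abstract attributes to the BLA scheme when the two modes' probabilities are unequal.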