Multi-Agent Reinforcement Learning-Based Joint Caching and Routing in Heterogeneous Networks

IEEE Transactions on Cognitive Communications and Networking · Impact Factor 7.4 · JCR Q1 (Telecommunications) · CAS Tier 1 (Computer Science) · Pub Date: 2024-04-19 · DOI: 10.1109/TCCN.2024.3391322
Meiyi Yang;Deyun Gao;Chuan Heng Foh;Wei Quan;Victor C. M. Leung
Volume 10, Issue 5, pp. 1959–1974 · Citations: 0 · Full text: https://ieeexplore.ieee.org/document/10505879/

Abstract

In this paper, we study the problem of minimizing the transmission cost among cooperative nodes by jointly optimizing caching and routing in a hybrid network with support for service differentiation. We show that, for a fixed caching policy, the optimal routing policy is a route-to-least-cost-cache (RLC) policy. We formulate the cooperative caching problem as a multi-agent Markov decision process (MDP) with the goal of maximizing the long-term expected caching reward; this problem is NP-complete even when users' demand is perfectly known. To solve it, we propose C-MAAC, a partially decentralized multi-agent deep reinforcement learning (MADRL)-based collaborative caching algorithm that employs an actor-critic learning model. C-MAAC follows the centralized-training, decentralized-execution paradigm, which addresses the training instability caused by all agents making decisions simultaneously. Furthermore, we develop an optimization method that serves as a benchmark for our MADRL framework under the assumption that content popularity is stationary and known a priori. Our experimental results demonstrate that, compared with the prior art, C-MAAC increases the caching reward by an average of 21.7% in dynamic environments where user request traffic changes rapidly.
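To make the route-to-least-cost-cache (RLC) rule concrete, the routing decision under a fixed caching configuration can be sketched as follows. This is a minimal illustration with hypothetical names (`rlc_route`, a flat per-node cost map), not the paper's actual cost model, which involves richer network costs:

```python
def rlc_route(content_id, caches, cost_to_node, cost_to_origin):
    """Route-to-least-cost-cache (RLC): serve a request from the
    source with the lowest transmission cost."""
    # Candidate sources: every cooperative node currently caching the content...
    candidates = {node: cost_to_node[node]
                  for node, cached in caches.items()
                  if content_id in cached}
    # ...plus the origin server, which always holds the content.
    candidates["origin"] = cost_to_origin
    # Route to the least-cost source.
    best = min(candidates, key=candidates.get)
    return best, candidates[best]
```

For example, with `caches = {"A": {1, 2}, "B": {2}}`, node costs `{"A": 3, "B": 1}`, and origin cost 10, a request for content 2 is routed to node B at cost 1, while a request for uncached content 5 falls back to the origin at cost 10.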
Source journal
IEEE Transactions on Cognitive Communications and Networking
Category: Computer Science – Artificial Intelligence
CiteScore: 15.50
Self-citation rate: 7.00%
Annual publications: 108
About the journal: The IEEE Transactions on Cognitive Communications and Networking (TCCN) aims to publish high-quality manuscripts that push the boundaries of cognitive communications and networking research. Cognitive, in this context, refers to the application of perception, learning, reasoning, memory, and adaptive approaches in communication system design. The transactions welcome submissions that explore various aspects of cognitive communications and networks, focusing on innovative and holistic approaches to complex system design. Key topics covered include architecture, protocols, cross-layer design, and cognition cycle design for cognitive networks. Additionally, research on machine learning, artificial intelligence, end-to-end and distributed intelligence, software-defined networking, cognitive radios, spectrum sharing, and security and privacy issues in cognitive networks are of interest. The publication also encourages papers addressing novel services and applications enabled by these cognitive concepts.
Latest articles from this journal:
- Intelligent Resource Adaptation for Diversified Service Requirements in Industrial IoT
- Real Field Error Correction for Coded Distributed Computing based Training
- Adaptive PCI Allocation in Heterogeneous Networks: A DRL-Driven Framework With Hash Table, FAGA, and Guiding Policies
- Generative AI on SpectrumNet: An Open Benchmark of Multiband 3D Radio Maps
- LiveStream Meta-DAMS: Multipath Scheduler Using Hybrid Meta Reinforcement Learning for Live Video Streaming