A Federated Online Restless Bandit Framework for Cooperative Resource Allocation

IEEE Transactions on Mobile Computing (IF 9.2; JCR Q1, Computer Science, Information Systems; Zone 2, Computer Science). Pub Date: 2024-09-03. DOI: 10.1109/TMC.2024.3453250

Jingwen Tong, Xinran Li, Liqun Fu, Jun Zhang, Khaled B. Letaief

Vol. 23, no. 12, pp. 15274-15288

Citations: 0

Abstract

Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of the MRPs are known a priori, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we fill this gap by investigating a cooperative resource allocation problem in which the system dynamics of the MRPs are unknown. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. By adopting the federated learning paradigm, we devise a federated online RMAB framework that mitigates communication overhead and data privacy issues. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys high communication and computation efficiency as well as a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm in the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log T})$ and better performance than the baselines. More importantly, its sample complexity decreases sublinearly with the number of agents.
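The abstract names three ingredients: federated aggregation, Thompson sampling, and the Whittle index. It does not say how they are combined. The sketch below is a minimal, hypothetical illustration of how such pieces could fit together. It is not the authors' FedTSWI algorithm.

It rests on several assumptions:

- Each arm is a fully observed two-state Markov chain whose transitions depend on the action.
- Rewards are earned only when an arm is active and lie in [0, 1].
- The criterion is discounted with a factor `GAMMA`.
- Arms are assumed to be indexable.
- Each agent keeps transition counts that a server sums into Dirichlet posteriors.

All names (`LocalAgent`, `aggregate`, `thompson_sample`, `whittle_index`, `select_arms`, `GAMMA`) are illustrative.

```python
import random

S = 2        # states per arm (hypothetical): 0 = "bad", 1 = "good"
A = 2        # actions: 0 = passive, 1 = active
GAMMA = 0.9  # discount factor (assumption; the paper's criterion may differ)


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


class LocalAgent:
    """Hypothetical agent: stores transition counts for one arm, never raw trajectories."""

    def __init__(self):
        # counts[a][s][s_next] = observed s -> s_next transitions under action a
        self.counts = [[[0.0] * S for _ in range(S)] for _ in range(A)]

    def observe(self, a, s, s_next):
        self.counts[a][s][s_next] += 1.0


def aggregate(agents, prior=1.0):
    """Hypothetical server step: Dirichlet prior plus the sum of all agents' counts."""
    alpha = [[[prior] * S for _ in range(S)] for _ in range(A)]
    for agent in agents:
        for a in range(A):
            for s in range(S):
                for y in range(S):
                    alpha[a][s][y] += agent.counts[a][s][y]
    return alpha


def thompson_sample(alpha, rng):
    """Sample one transition model: each row P[a][s] ~ Dirichlet(alpha[a][s])."""
    return [[sample_dirichlet(alpha[a][s], rng) for s in range(S)] for a in range(A)]


def q_values(P, reward, lam, s, iters=200):
    """Value iteration for one arm where staying passive earns the subsidy lam.

    reward[x] is earned only when the arm is active in state x.
    Returns (Q_passive(s), Q_active(s))."""
    def q(V, x):
        q0 = lam + GAMMA * sum(P[0][x][y] * V[y] for y in range(S))
        q1 = reward[x] + GAMMA * sum(P[1][x][y] * V[y] for y in range(S))
        return q0, q1

    V = [0.0] * S
    for _ in range(iters):
        V = [max(q(V, x)) for x in range(S)]
    return q(V, s)


def whittle_index(P, reward, s, lo=-1.0, hi=2.0, tol=1e-4):
    """Bisection for the subsidy at which passive and active are equally good in state s.

    Assumes the arm is indexable and the index lies in [lo, hi];
    hi = 2.0 exceeds the largest reward, which suffices for rewards in [0, 1]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        q_passive, q_active = q_values(P, reward, mid, s)
        if q_active > q_passive:
            lo = mid  # active still preferred: the subsidy is too small
        else:
            hi = mid
    return (lo + hi) / 2


def select_arms(posteriors, rewards, states, m, rng):
    """One round: Thompson-sample each arm's model, activate the m arms with the largest index."""
    indices = [whittle_index(thompson_sample(alpha, rng), r, s)
               for alpha, r, s in zip(posteriors, rewards, states)]
    ranked = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return sorted(ranked[:m])
```

Sharing summed counts instead of trajectories is one simple reading of the abstract's communication and privacy goals. The paper's actual aggregation rule, privacy mechanism, and index computation may differ. When the passive and active transitions coincide, the index reduces to the immediate reward, which gives a quick sanity check.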


Source journal: IEEE Transactions on Mobile Computing (Engineering & Technology: Telecommunications)

CiteScore: 12.90
Self-citation rate: 2.50%
Articles published: 403
Review time: 6.6 months

Journal description: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, including mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.