Fair Probabilistic Multi-Armed Bandit With Applications to Network Optimization

Zhiwu Guo;Chicheng Zhang;Ming Li;Marwan Krunz
{"title":"Fair Probabilistic Multi-Armed Bandit With Applications to Network Optimization","authors":"Zhiwu Guo;Chicheng Zhang;Ming Li;Marwan Krunz","doi":"10.1109/TMLCN.2024.3421170","DOIUrl":null,"url":null,"abstract":"Online learning, particularly Multi-Armed Bandit (MAB) algorithms, has been extensively adopted in various real-world networking applications. In certain applications, such as fair heterogeneous networks coexistence, multiple links (individual arms) are selected in each round, and the throughputs (rewards) of these arms depend on the chosen set of links. Additionally, ensuring fairness among individual arms is a critical objective. However, existing MAB algorithms are unsuitable for these applications due to different models and assumptions. In this paper, we introduce a new fair probabilistic MAB (FP-MAB) problem aimed at either maximizing the minimum reward for all arms or maximizing the total reward while imposing a fairness constraint that guarantees a minimum selection fraction for each arm. In FP-MAB, the learning agent probabilistically selects a meta-arm, which is associated with one or multiple individual arms in each decision round. To address the FP-MAB problem, we propose two algorithms: Fair Probabilistic Explore-Then-Commit (FP-ETC) and Fair Probabilistic Optimism In the Face of Uncertainty (FP-OFU). We also introduce a novel concept of regret in the context of the max-min fairness objective. We analyze the performance of FP-ETC and FP-OFU in terms of the upper bound of average regret and average constraint violation. Simulation results demonstrate that FP-ETC and FP-OFU achieve lower regrets (or higher objective values) under the same fairness requirements compared to existing MAB algorithms.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"2 ","pages":"994-1016"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579843","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10579843/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Online learning, particularly Multi-Armed Bandit (MAB) algorithms, has been extensively adopted in various real-world networking applications. In certain applications, such as fair heterogeneous networks coexistence, multiple links (individual arms) are selected in each round, and the throughputs (rewards) of these arms depend on the chosen set of links. Additionally, ensuring fairness among individual arms is a critical objective. However, existing MAB algorithms are unsuitable for these applications due to different models and assumptions. In this paper, we introduce a new fair probabilistic MAB (FP-MAB) problem aimed at either maximizing the minimum reward for all arms or maximizing the total reward while imposing a fairness constraint that guarantees a minimum selection fraction for each arm. In FP-MAB, the learning agent probabilistically selects a meta-arm, which is associated with one or multiple individual arms in each decision round. To address the FP-MAB problem, we propose two algorithms: Fair Probabilistic Explore-Then-Commit (FP-ETC) and Fair Probabilistic Optimism In the Face of Uncertainty (FP-OFU). We also introduce a novel concept of regret in the context of the max-min fairness objective. We analyze the performance of FP-ETC and FP-OFU in terms of the upper bound of average regret and average constraint violation. Simulation results demonstrate that FP-ETC and FP-OFU achieve lower regrets (or higher objective values) under the same fairness requirements compared to existing MAB algorithms.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
公平概率多臂匪徒与网络优化应用
在线学习,尤其是多臂匪徒(MAB)算法,已被广泛应用于现实世界的各种网络应用中。在某些应用中,例如公平异构网络共存,每轮都会选择多个链路(单臂),而这些单臂的吞吐量(奖励)取决于所选的链路集。此外,确保各个臂之间的公平性也是一个关键目标。然而,由于模型和假设不同,现有的 MAB 算法并不适合这些应用。在本文中,我们引入了一个新的公平概率 MAB(FP-MAB)问题,旨在最大化所有臂的最小奖励,或最大化总奖励,同时施加公平约束,保证每个臂的最小选择分数。在 FP-MAB 中,学习代理以概率方式选择元臂,元臂在每轮决策中与一个或多个单臂相关联。为解决 FP-MAB 问题,我们提出了两种算法:公平概率探索-然后承诺(FP-ETC)和公平概率不确定性乐观(FP-OFU)。我们还在最大最小公平目标的背景下引入了一个新的遗憾概念。我们从平均遗憾上限和平均违反约束上限的角度分析了 FP-ETC 和 FP-OFU 的性能。仿真结果表明,与现有的 MAB 算法相比,在相同的公平性要求下,FP-ETC 和 FP-OFU 能获得更低的遗憾值(或更高的目标值)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Asynchronous Real-Time Federated Learning for Anomaly Detection in Microservice Cloud Applications Conditional Denoising Diffusion Probabilistic Models for Data Reconstruction Enhancement in Wireless Communications Multi-Agent Reinforcement Learning With Action Masking for UAV-Enabled Mobile Communications Online Learning for Intelligent Thermal Management of Interference-Coupled and Passively Cooled Base Stations Robust and Lightweight Modeling of IoT Network Behaviors From Raw Traffic Packets
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1