{"title":"Fair Probabilistic Multi-Armed Bandit With Applications to Network Optimization","authors":"Zhiwu Guo;Chicheng Zhang;Ming Li;Marwan Krunz","doi":"10.1109/TMLCN.2024.3421170","DOIUrl":null,"url":null,"abstract":"Online learning, particularly Multi-Armed Bandit (MAB) algorithms, has been extensively adopted in various real-world networking applications. In certain applications, such as fair heterogeneous networks coexistence, multiple links (individual arms) are selected in each round, and the throughputs (rewards) of these arms depend on the chosen set of links. Additionally, ensuring fairness among individual arms is a critical objective. However, existing MAB algorithms are unsuitable for these applications due to different models and assumptions. In this paper, we introduce a new fair probabilistic MAB (FP-MAB) problem aimed at either maximizing the minimum reward for all arms or maximizing the total reward while imposing a fairness constraint that guarantees a minimum selection fraction for each arm. In FP-MAB, the learning agent probabilistically selects a meta-arm, which is associated with one or multiple individual arms in each decision round. To address the FP-MAB problem, we propose two algorithms: Fair Probabilistic Explore-Then-Commit (FP-ETC) and Fair Probabilistic Optimism In the Face of Uncertainty (FP-OFU). We also introduce a novel concept of regret in the context of the max-min fairness objective. We analyze the performance of FP-ETC and FP-OFU in terms of the upper bound of average regret and average constraint violation. Simulation results demonstrate that FP-ETC and FP-OFU achieve lower regrets (or higher objective values) under the same fairness requirements compared to existing MAB algorithms.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"2 ","pages":"994-1016"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579843","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10579843/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Online learning, particularly Multi-Armed Bandit (MAB) algorithms, has been extensively adopted in various real-world networking applications. In certain applications, such as fair coexistence of heterogeneous networks, multiple links (individual arms) are selected in each round, and the throughputs (rewards) of these arms depend on the chosen set of links. Additionally, ensuring fairness among individual arms is a critical objective. However, existing MAB algorithms are unsuitable for these applications due to their different models and assumptions. In this paper, we introduce a new fair probabilistic MAB (FP-MAB) problem aimed at either maximizing the minimum reward across all arms or maximizing the total reward subject to a fairness constraint that guarantees a minimum selection fraction for each arm. In FP-MAB, the learning agent probabilistically selects a meta-arm, which is associated with one or more individual arms, in each decision round. To address the FP-MAB problem, we propose two algorithms: Fair Probabilistic Explore-Then-Commit (FP-ETC) and Fair Probabilistic Optimism In the Face of Uncertainty (FP-OFU). We also introduce a novel notion of regret for the max-min fairness objective. We analyze the performance of FP-ETC and FP-OFU in terms of upper bounds on the average regret and the average constraint violation. Simulation results demonstrate that FP-ETC and FP-OFU achieve lower regret (or higher objective values) under the same fairness requirements compared to existing MAB algorithms.
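The abstract describes FP-ETC only at a high level. As a rough illustration of the explore-then-commit idea combined with a per-arm minimum-selection-fraction constraint, the following Python sketch explores all arms uniformly, then commits to a probabilistic policy that gives every arm its minimum fraction and places the leftover probability mass on the empirically best arm. This is a simplified single-arm-per-round toy under assumptions of our own, not the paper's actual FP-ETC (which selects meta-arms whose rewards depend on the chosen set of links); the function name `fp_etc_sketch` and all parameters are hypothetical.

```python
import numpy as np


def fp_etc_sketch(reward_fn, n_arms, horizon, explore_rounds, min_fraction, rng=None):
    """Toy Explore-Then-Commit with a per-arm minimum selection fraction.

    Hypothetical simplification: one individual arm is pulled per round, and the
    commit-phase distribution is (min_fraction for every arm) + (remaining mass
    on the empirically best arm).
    """
    assert min_fraction * n_arms <= 1.0, "minimum fractions must be feasible"
    rng = np.random.default_rng() if rng is None else rng

    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    rewards = []

    # Exploration phase: pull every arm the same number of times.
    for _ in range(explore_rounds):
        for a in range(n_arms):
            r = reward_fn(a)
            counts[a] += 1
            sums[a] += r
            rewards.append(r)

    # Commit phase: fair probabilistic policy over arms.
    means = sums / counts
    probs = np.full(n_arms, min_fraction)
    probs[np.argmax(means)] += 1.0 - min_fraction * n_arms  # leftover mass to best arm
    for _ in range(horizon - explore_rounds * n_arms):
        a = rng.choice(n_arms, p=probs)
        rewards.append(reward_fn(a))

    return np.array(rewards), probs


# Example usage with Bernoulli rewards (illustrative values only).
if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.8]
    rewards, policy = fp_etc_sketch(
        reward_fn=lambda a: np.random.binomial(1, true_means[a]),
        n_arms=3, horizon=10_000, explore_rounds=50, min_fraction=0.1)
    print("commit-phase policy:", policy, "average reward:", rewards.mean())
```

In this sketch the fairness constraint is enforced by construction (every arm is selected with probability at least `min_fraction` during the commit phase), whereas the regret depends on how accurately the exploration phase identifies the best arm; the paper's algorithms balance these two aspects over meta-arms rather than individual arms.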