Achieving Fairness in the Stochastic Multi-armed Bandit Problem

Vishakha Patil, Ganesh Ghalme, V. Nair, Y. Narahari
{"title":"随机多臂盗匪问题公平性的实现","authors":"Vishakha Patil, Ganesh Ghalme, V. Nair, Y. Narahari","doi":"10.1609/AAAI.V34I04.5986","DOIUrl":null,"url":null,"abstract":"We study an interesting variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called $r$-Regret, that takes into account the above fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is characterizing a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and the learning algorithm used as a black-box. We provide a fairness guarantee for this class that holds uniformly over time irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves $O(\\ln T)$ $r$-Regret. Finally, we evaluate the cost of fairness in terms of the conventional notion of regret.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"15 1","pages":"174:1-174:31"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"84","resultStr":"{\"title\":\"Achieving Fairness in the Stochastic Multi-armed Bandit Problem\",\"authors\":\"Vishakha Patil, Ganesh Ghalme, V. Nair, Y. Narahari\",\"doi\":\"10.1609/AAAI.V34I04.5986\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study an interesting variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called $r$-Regret, that takes into account the above fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is characterizing a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and the learning algorithm used as a black-box. We provide a fairness guarantee for this class that holds uniformly over time irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves $O(\\\\ln T)$ $r$-Regret. Finally, we evaluate the cost of fairness in terms of the conventional notion of regret.\",\"PeriodicalId\":14794,\"journal\":{\"name\":\"J. Mach. Learn. Res.\",\"volume\":\"15 1\",\"pages\":\"174:1-174:31\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"84\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"J. Mach. Learn. Res.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1609/AAAI.V34I04.5986\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Mach. Learn. 
Res.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/AAAI.V34I04.5986","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 84

Abstract

We study an interesting variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called $r$-Regret, that takes into account the above fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is characterizing a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and the learning algorithm used as a black-box. We provide a fairness guarantee for this class that holds uniformly over time irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves $O(\ln T)$ $r$-Regret. Finally, we evaluate the cost of fairness in terms of the conventional notion of regret.
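The meta-algorithm described in the abstract can be pictured as a quota-enforcing wrapper around a standard bandit learner. Below is a minimal, illustrative sketch in Python, reconstructed only from the abstract: UCB1 plays the role of the black-box learning algorithm, `r` holds the guaranteed pull fractions, and `alpha` stands in for the unfairness tolerance. The class names, the exact deficit rule, and the toy parameters are assumptions made for illustration, not the authors' published pseudocode.

```python
# Hypothetical sketch of a fairness-aware wrapper around UCB1, based only on
# the abstract: each arm i must receive at least a fraction r[i] of the pulls,
# up to an unfairness tolerance alpha. When no arm risks falling behind its
# quota, the black-box learner (here, UCB1) chooses the arm.
import math
import random


class UCB1:
    """Standard UCB1 learner, used here as the black-box learning algorithm."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.t = 0

    def select(self):
        # Pull each arm once before applying the UCB index.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        return max(
            range(len(self.counts)),
            key=lambda i: self.means[i]
            + math.sqrt(2.0 * math.log(self.t) / self.counts[i]),
        )

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class FairBandit:
    """Fairness-aware meta-algorithm: enforce quotas first, then learn."""

    def __init__(self, r, alpha, learner):
        self.r = r              # guaranteed pull fractions, sum(r) <= 1
        self.alpha = alpha      # unfairness tolerance (allowed quota slack)
        self.learner = learner  # any black-box bandit learner
        self.pulls = [0] * len(r)
        self.t = 0

    def select(self):
        self.t += 1
        # Deficit of arm i at round t: how far its pull count lags r[i] * t.
        deficits = [self.r[i] * self.t - self.pulls[i] for i in range(len(self.r))]
        worst = max(range(len(self.r)), key=lambda i: deficits[i])
        if deficits[worst] > self.alpha:
            return worst                  # enforce the fairness quota
        return self.learner.select()      # otherwise defer to the black box

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.learner.update(arm, reward)


if __name__ == "__main__":
    # Toy run with Bernoulli arms; the means and fairness vector are made up.
    means = [0.9, 0.6, 0.3]
    r = [0.1, 0.1, 0.2]          # minimum pull fraction guaranteed to each arm
    algo = FairBandit(r, alpha=1.0, learner=UCB1(len(means)))
    T = 10000
    for _ in range(T):
        arm = algo.select()
        algo.update(arm, 1.0 if random.random() < means[arm] else 0.0)
    print("pull fractions:", [c / T for c in algo.pulls])
```

In the toy run, each arm's empirical pull fraction should stay at or above its entry in `r` (up to the tolerance), while the surplus rounds go to the empirically best arm, which is the intuition behind trading conventional regret for the fairness guarantee.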