Adversarial Bandits with Knapsacks

Journal of the ACM. Published 2022-11-17. DOI: https://dl.acm.org/doi/10.1145/3557045
Journal metrics: IF 2.5; Zone 2 (Computer Science); JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Nicole Immorlica, Karthik Sankararaman, Robert Schapire, Aleksandrs Slivkins

Abstract

We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the “classic” adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm’s reward.
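To make the model and the objective concrete, here is a minimal Python sketch of the adversarial BwK protocol. It rests on assumptions that are not taken from the paper: outcomes are fixed up front as a table `outcomes[t][a] = (reward, consumption_vector)`, every resource has the same budget, the process halts as soon as the next pull would overdraw some resource, and the benchmark is the best fixed arm in hindsight, a simpler stand-in for the paper's best fixed distribution over actions. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# outcomes[t][a] = (reward, consumption vector) observed if arm a is pulled in round t.
Outcome = Tuple[float, Tuple[float, ...]]


def run_policy(outcomes: Sequence[Sequence[Outcome]], arms: Sequence[int], budget: float) -> float:
    """Total reward of pulling arms[t] in round t.

    Assumed stopping rule: halt as soon as the next pull would overdraw any resource.
    """
    remaining: List[float] = [budget] * len(outcomes[0][0][1])
    total = 0.0
    for t, a in enumerate(arms):
        reward, cons = outcomes[t][a]
        if any(r - c < 0 for r, c in zip(remaining, cons)):
            break
        remaining = [r - c for r, c in zip(remaining, cons)]
        total += reward
    return total


def best_fixed_arm(outcomes: Sequence[Sequence[Outcome]], budget: float) -> float:
    """Simplified hindsight benchmark: the best single arm played in every round."""
    horizon, n_arms = len(outcomes), len(outcomes[0])
    return max(run_policy(outcomes, [a] * horizon, budget) for a in range(n_arms))


def competitive_ratio(benchmark: float, alg_reward: float) -> float:
    """Benchmark reward divided by the algorithm's reward (smaller is better)."""
    return float("inf") if alg_reward == 0 else benchmark / alg_reward


# Toy instance: T = 4 rounds, one resource with budget B = 1.
# Arm 0 pays 1 but consumes 1 unit; arm 1 pays 0.5 and consumes nothing.
outcomes = [[(1.0, (1.0,)), (0.5, (0.0,))] for _ in range(4)]
greedy = run_policy(outcomes, [0, 0, 0, 0], budget=1.0)  # budget is gone after round 0
bench = best_fixed_arm(outcomes, budget=1.0)
print(greedy, bench, competitive_ratio(bench, greedy))  # 1.0 2.0 2.0
```

In this toy instance the greedy policy that always pulls the high-reward arm 0 spends its entire budget in round 0, while the cheap arm 1 keeps earning 0.5 per round, so the greedy policy's competitive ratio is 2.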

We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter.
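The repeated-game view can be illustrated on the stochastic version with known mean outcomes. The sketch below shows the general technique only and is not the paper's algorithm. It assumes a Lagrangian payoff L(a, i) = r(a) + 1 - (T/B) c_i(a), a "null arm" with zero reward and zero consumption, and a dummy "time" resource consumed at rate B/T in every round; these are conventions adopted for the example. Two Hedge (multiplicative-weights) learners then play the resulting zero-sum game with full information: the arm player maximizes and the resource player minimizes. Their no-regret guarantees imply that the worst-case Lagrangian value of the time-averaged arm distribution approaches the game's minimax value, which for this instance equals the optimum of the linear relaxation. The function names, `rounds`, and `eta` are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def hedge_probs(cum: Sequence[float], eta: float) -> List[float]:
    """Hedge / multiplicative weights: p_k proportional to exp(eta * cum_k)."""
    m = max(cum)  # subtract the max for numerical stability
    w = [math.exp(eta * (c - m)) for c in cum]
    s = sum(w)
    return [x / s for x in w]


def lagrange_matrix(rewards, costs, ratio):
    """Assumed Lagrangian payoff L[a][i] = r(a) + 1 - ratio * c_i(a), with ratio = T / B."""
    return [[r + 1.0 - ratio * c for c in row] for r, row in zip(rewards, costs)]


def self_play(M, rounds: int = 20000, eta: float = 0.01) -> Tuple[List[float], float]:
    """Both players run Hedge on expected payoffs.

    Returns the time-averaged arm distribution and its worst-case payoff over resources.
    """
    n_rows, n_cols = len(M), len(M[0])
    cum_row = [0.0] * n_rows  # arm player's cumulative payoffs (maximizer)
    cum_col = [0.0] * n_cols  # resource player's cumulative negated payoffs (minimizer)
    avg_row = [0.0] * n_rows
    for _ in range(rounds):
        p, q = hedge_probs(cum_row, eta), hedge_probs(cum_col, eta)
        for a in range(n_rows):
            avg_row[a] += p[a] / rounds
            cum_row[a] += sum(M[a][i] * q[i] for i in range(n_cols))
        for i in range(n_cols):
            cum_col[i] -= sum(p[a] * M[a][i] for a in range(n_rows))
    # Worst-case (over resources) Lagrangian payoff of the averaged arm distribution.
    value = min(sum(avg_row[a] * M[a][i] for a in range(n_rows)) for i in range(n_cols))
    return avg_row, value


T, B = 100, 50
rewards = [1.0, 0.3, 0.0]                           # arm A, arm B, null arm
costs = [[1.0, B / T], [0.0, B / T], [0.0, B / T]]  # columns: real resource, dummy "time" resource
M = lagrange_matrix(rewards, costs, T / B)
dist, value = self_play(M)
print([round(x, 2) for x in dist], round(value, 3))
```

For this instance the linear relaxation is solved by mixing arm A and arm B half and half, with value 0.5 · 1 + 0.5 · 0.3 = 0.65. The printed value is at most 0.65 and close to it, and the gap shrinks as `rounds` grows and `eta` shrinks.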

Our algorithm is the first “black-box reduction” from bandits to BwK: it takes an arbitrary bandit algorithm and uses it as a subroutine. We use this reduction to derive several extensions.
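The black-box idea can be sketched in the same spirit. The reduction talks to the bandit algorithm only through a `select()`/`update(arm, reward)` interface, so any object implementing it can be plugged in. This is a hedged illustration under our own assumptions, not the paper's exact algorithm or parameters. The interface and every name below are hypothetical. Rewards and per-round consumptions are assumed to lie in [0, 1]. The bandit is fed the Lagrangian payoff against the dual player's current mixture, rescaled to [0, 1]. The dual player runs Hedge over the resources plus a dummy time resource, and the process halts before any resource would be overdrawn. EXP3 is shown as one possible black box.

```python
import math
import random
from typing import Protocol, Sequence, Tuple


class BanditAlgorithm(Protocol):
    """Hypothetical interface: anything with these two methods can serve as the black box."""
    def select(self) -> int: ...
    def update(self, arm: int, reward: float) -> None: ...


class Exp3:
    """Standard EXP3 for rewards in [0, 1], used here as one example black box."""
    def __init__(self, n_arms: int, gamma: float, rng: random.Random):
        self.n, self.gamma, self.rng = n_arms, gamma, rng
        self.log_w = [0.0] * n_arms
        self.probs = [1.0 / n_arms] * n_arms

    def select(self) -> int:
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        s = sum(w)
        self.probs = [(1 - self.gamma) * x / s + self.gamma / self.n for x in w]
        return self.rng.choices(range(self.n), weights=self.probs)[0]

    def update(self, arm: int, reward: float) -> None:
        # Importance-weighted estimate: only the pulled arm's weight changes.
        self.log_w[arm] += self.gamma * (reward / self.probs[arm]) / self.n


class AlwaysArm:
    """Trivial black box that ignores feedback; shows that any select/update object plugs in."""
    def __init__(self, arm: int):
        self.arm = arm

    def select(self) -> int:
        return self.arm

    def update(self, arm: int, reward: float) -> None:
        pass


Outcome = Tuple[float, Tuple[float, ...]]


def lagrange_bwk(outcomes: Sequence[Sequence[Outcome]], budget: float,
                 bandit: BanditAlgorithm, eta: float = 0.05) -> float:
    horizon = len(outcomes)
    d = len(outcomes[0][0][1])
    ratio = horizon / budget
    log_v = [0.0] * (d + 1)  # dual Hedge weights; index d is the dummy "time" resource
    remaining = [budget] * d
    total = 0.0
    for t in range(horizon):
        m = max(log_v)
        v = [math.exp(x - m) for x in log_v]
        s = sum(v)
        lam = [x / s for x in v]
        arm = bandit.select()
        reward, cons = outcomes[t][arm]
        if any(r - c < 0 for r, c in zip(remaining, cons)):
            break  # assumed stopping rule: halt before overdrawing a resource
        remaining = [r - c for r, c in zip(remaining, cons)]
        total += reward
        cons_ext = list(cons) + [budget / horizon]
        lagr = reward + 1.0 - ratio * sum(l * c for l, c in zip(lam, cons_ext))
        bandit.update(arm, (lagr - (1.0 - ratio)) / (1.0 + ratio))  # rescale [1 - T/B, 2] to [0, 1]
        for i in range(d + 1):
            # The dual player minimizes the Lagrangian, so it shifts weight toward heavily consumed resources.
            log_v[i] += eta * ratio * cons_ext[i]
    return total


toy = [[(1.0, (1.0,)), (0.5, (0.0,))] for _ in range(4)]
print(lagrange_bwk(toy, 1.0, AlwaysArm(1)), lagrange_bwk(toy, 1.0, AlwaysArm(0)))  # 2.0 1.0

rng = random.Random(0)
data = [[(rng.random(), (rng.random(),)) for _ in range(3)] for _ in range(2000)]
print(lagrange_bwk(data, 500.0, Exp3(3, gamma=0.1, rng=rng)))
```

Swapping `Exp3` for any other object with the same two methods changes nothing else in `lagrange_bwk`. For example, the trivial `AlwaysArm(1)` replays the fixed cheap arm from a toy instance and earns 2.0.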
