Learning To Maximize Welfare with a Reusable Resource
Matthew Faw, O. Papadigenopoulos, C. Caramanis, S. Shakkottai
Proceedings of the ACM on Measurement and Analysis of Computing Systems
DOI: 10.1145/3530893 | Published: 2022-05-26
Citations: 1
Abstract
Considerable work has focused on optimal stopping problems in which random IID offers arrive sequentially for a single available resource controlled by the decision-maker. After observing the realization of each offer, the decision-maker either irrevocably rejects it or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who knows all the realizations a priori. This policy has a particularly simple characterization as a thresholding rule which depends on the reward distribution and the blocking period d, and it arises naturally from an LP relaxation of the prophet's optimal solution. Moreover, it provides the key to extending the approach to the case of unknown distributions: here, we construct a dynamic threshold rule using the reward samples collected while the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and we prove a complementary minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
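To make the mechanics concrete, below is a minimal simulation sketch of a fixed-threshold policy of the kind the abstract describes. Everything in it is illustrative: the function names (run_threshold_policy, quantile_threshold) are hypothetical, the (1 - 1/(d+1))-quantile threshold is an assumed stand-in for the paper's LP-derived threshold, and the convention that an acceptance blocks the resource for the next d rounds is one plausible reading of the delay period d.

```python
import random

def run_threshold_policy(rewards, tau, d):
    """Accept any offer >= tau whenever the resource is free; an
    acceptance blocks the resource for the next d rounds.
    (Blocking convention is an assumption, not taken from the paper.)"""
    total = 0.0
    free_at = 0  # first round at which the resource is available again
    for t, r in enumerate(rewards):
        if t >= free_at and r >= tau:
            total += r                # accept: collect the reward
            free_at = t + d + 1      # blocked for rounds t+1, ..., t+d
    return total

def quantile_threshold(samples, d):
    """Illustrative threshold: the (1 - 1/(d+1))-quantile of observed
    samples, so that roughly one offer in d+1 clears the bar. This is a
    hypothetical choice for demonstration, not the paper's exact rule."""
    s = sorted(samples)
    idx = min(int((1 - 1 / (d + 1)) * len(s)), len(s) - 1)
    return s[idx]

if __name__ == "__main__":
    random.seed(0)
    d = 4
    offers = [random.expovariate(1.0) for _ in range(10_000)]
    tau = quantile_threshold(offers[:1_000], d)  # calibrate on a prefix
    total = run_threshold_policy(offers, tau, d)
    print(f"threshold = {tau:.3f}, collected reward = {total:.1f}")
```

Calibrating the threshold from observed samples, as the last lines do, loosely mirrors the paper's dynamic threshold rule for the unknown-distribution case, though the actual rule and its regret analysis are more delicate.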