A New Framework: Short-Term and Long-Term Returns in Stochastic Multi-Armed Bandit

Abdalaziz Sawwan, Jie Wu
IEEE INFOCOM 2023 - IEEE Conference on Computer Communications
DOI: 10.1109/INFOCOM53939.2023.10228899 · Published: 2023-05-17 · Cited by: 1

Abstract

The stochastic Multi-Armed Bandit (MAB) problem has recently been studied widely due to its vast range of applications. The classic model considers the reward of a pulled arm to be observed after a time delay sampled from a random distribution assigned to each arm. In this paper, we propose an extended framework in which pulling an arm yields both an instant (short-term) reward and a delayed (long-term) reward at the same time. The reward-value distributions of the short-term and long-term rewards are related by a previously known relationship, while the time-delay distribution of an arm is independent of its reward distributions. We devise three UCB-based algorithms for this new model, two of which achieve near-optimal regret, and provide the corresponding regret analysis for each. Additionally, the time-delay distributions are allowed to yield infinite delays, which corresponds to the case where an arm gives only a short-term reward. Finally, we evaluate our algorithms and compare this paradigm with previously known models on both a synthetic data set and a real data set that reflects one of the potential applications of this model.
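To make the reward model concrete, the following is a minimal illustrative sketch of the setting the abstract describes: each pull yields an instant short-term reward plus a long-term reward that arrives after a random delay (possibly never). This is not the paper's algorithm; it simply applies a plain UCB1 index to whatever rewards have been observed so far, and all class names, parameters, and distributions here are hypothetical choices for illustration.

```python
import math
import random
import heapq

class Arm:
    """Hypothetical arm with a short-term and a delayed long-term reward."""
    def __init__(self, short_mean, long_mean, delay_mean, p_infinite):
        self.short_mean = short_mean    # mean of the instant reward
        self.long_mean = long_mean      # mean of the delayed reward
        self.delay_mean = delay_mean    # mean of the (exponential) delay
        self.p_infinite = p_infinite    # chance the delayed reward never arrives

def simulate(arms, horizon, seed=0):
    rng = random.Random(seed)
    pulls = [0] * len(arms)
    observed_sum = [0.0] * len(arms)    # rewards observed so far, per arm
    pending = []                        # min-heap of (arrival_time, arm, reward)
    for t in range(1, horizon + 1):
        # Deliver any delayed rewards whose arrival time has passed.
        while pending and pending[0][0] <= t:
            _, i, r = heapq.heappop(pending)
            observed_sum[i] += r
        # Plain UCB1 index over observed rewards (a sketch, not the
        # paper's near-optimal-regret algorithms).
        def index(i):
            if pulls[i] == 0:
                return float('inf')
            return observed_sum[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])
        i = max(range(len(arms)), key=index)
        a = arms[i]
        pulls[i] += 1
        observed_sum[i] += rng.gauss(a.short_mean, 0.1)  # instant reward
        # With probability p_infinite the delay is infinite: the arm
        # effectively gives only a short-term reward on this pull.
        if rng.random() > a.p_infinite:
            delay = rng.expovariate(1.0 / a.delay_mean)
            heapq.heappush(pending, (t + delay, i, rng.gauss(a.long_mean, 0.1)))
    return pulls

pulls = simulate([Arm(0.3, 0.3, 5.0, 0.1), Arm(0.5, 0.6, 8.0, 0.2)], horizon=2000)
```

Since arm 1 has a higher combined short- plus long-term mean, a delay-aware index should concentrate pulls on it as delayed rewards arrive and its observed mean rises.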