A New Framework: Short-Term and Long-Term Returns in Stochastic Multi-Armed Bandit

Abdalaziz Sawwan, Jie Wu
IEEE INFOCOM 2023 - IEEE Conference on Computer Communications
DOI: 10.1109/INFOCOM53939.2023.10228899 · Published: 2023-05-17 · Cited by: 1

Abstract

The stochastic Multi-Armed Bandit (MAB) problem has recently been studied widely due to its vast range of applications. The classic model considers the reward of a pulled arm to be observed after a time delay sampled from a random distribution assigned to each arm. In this paper, we propose an extended framework in which pulling an arm yields both an instant (short-term) reward and a delayed (long-term) reward at the same time. The reward-value distributions of the short-term and long-term rewards are related by a previously known relationship, while the time-delay distribution of an arm is independent of its reward distributions. We devise three UCB-based algorithms for this new model, two of which achieve near-optimal regret, and provide the corresponding regret analysis for each. Additionally, the time-delay distributions are allowed to yield infinite delays, which corresponds to the case where an arm gives only a short-term reward. Finally, we evaluate our algorithms and compare this paradigm with previously known models on both a synthetic data set and a real data set that reflects one of the potential applications of this model.
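To make the reward model concrete, the following is a minimal illustrative sketch of the setting the abstract describes: each pull yields an instant short-term reward plus a long-term reward that arrives after a random delay (possibly never). This is not the paper's algorithm; it simply applies a plain UCB1 index to whatever rewards have been observed so far, and all class names, parameters, and distributions here are hypothetical choices for illustration.

```python
import math
import random
import heapq

class Arm:
    """Hypothetical arm with a short-term and a delayed long-term reward."""
    def __init__(self, short_mean, long_mean, delay_mean, p_infinite):
        self.short_mean = short_mean    # mean of the instant reward
        self.long_mean = long_mean      # mean of the delayed reward
        self.delay_mean = delay_mean    # mean of the (exponential) delay
        self.p_infinite = p_infinite    # chance the delayed reward never arrives

def simulate(arms, horizon, seed=0):
    rng = random.Random(seed)
    pulls = [0] * len(arms)
    observed_sum = [0.0] * len(arms)    # rewards observed so far, per arm
    pending = []                        # min-heap of (arrival_time, arm, reward)
    for t in range(1, horizon + 1):
        # Deliver any delayed rewards whose arrival time has passed.
        while pending and pending[0][0] <= t:
            _, i, r = heapq.heappop(pending)
            observed_sum[i] += r
        # Plain UCB1 index over observed rewards (a sketch, not the
        # paper's near-optimal-regret algorithms).
        def index(i):
            if pulls[i] == 0:
                return float('inf')
            return observed_sum[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i])
        i = max(range(len(arms)), key=index)
        a = arms[i]
        pulls[i] += 1
        observed_sum[i] += rng.gauss(a.short_mean, 0.1)  # instant reward
        # With probability p_infinite the delay is infinite: the arm
        # effectively gives only a short-term reward on this pull.
        if rng.random() > a.p_infinite:
            delay = rng.expovariate(1.0 / a.delay_mean)
            heapq.heappush(pending, (t + delay, i, rng.gauss(a.long_mean, 0.1)))
    return pulls

pulls = simulate([Arm(0.3, 0.3, 5.0, 0.1), Arm(0.5, 0.6, 8.0, 0.2)], horizon=2000)
```

Since arm 1 has a higher combined short- plus long-term mean, a delay-aware index should concentrate pulls on it as delayed rewards arrive and its observed mean rises.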