Restless bandits that hide their hand and recommendation systems

R. Meshram, Aditya Gopalan, D. Manjunath
2017 9th International Conference on Communication Systems and Networks (COMSNETS). DOI: 10.1109/COMSNETS.2017.7945378 (https://doi.org/10.1109/COMSNETS.2017.7945378)
Citations: 8

Abstract

We consider a restless multi-armed bandit (RMAB) in which each arm can be in one of two states, say 0 or 1. Playing an arm brings it to state 0 with probability one; not playing it induces state transitions with arm-dependent probabilities. Playing an arm generates a unit reward with a probability that depends on the state of the arm. The belief about the state of the arm can be calculated using a Bayesian update after every play. This RMAB is designed for use in recommendation systems, which in turn can be used in applications such as creating playlists or placing advertisements. In this paper we analyse the RMAB by first showing that it is Whittle-indexable and then obtaining a closed-form expression for the Whittle index of each arm, calculated from the belief about its state and the parameters that describe the arm. For an RMAB to be useful in practice, we need to be able to learn the parameters of the arms. We present an algorithm, derived from the Thompson sampling scheme, that learns the parameters of the arms, and we evaluate its performance numerically.
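The belief dynamics described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `next_belief`, `expected_reward`, `p01`, `p10`, `rho0` and `rho1` are hypothetical, the numeric values are made up for illustration, and the sketch assumes that the reset to state 0 happens at the end of the slot in which the arm is played. The closed-form Whittle index and the Thompson sampling learner are not reproduced here, since the abstract does not state them.

```python
def next_belief(belief, p01, p10, played):
    """Return P(state = 1) for one arm at the next time step.

    belief : current probability that the arm is in state 1
    p01    : assumed probability of moving 0 -> 1 when the arm is not played
    p10    : assumed probability of moving 1 -> 0 when the arm is not played
    played : whether the arm is played in the current step
    """
    if played:
        # Playing brings the arm to state 0 with probability one,
        # so the belief of being in state 1 drops to zero.
        return 0.0
    # Not played: push the belief through the two-state Markov chain.
    return belief * (1.0 - p10) + (1.0 - belief) * p01


def expected_reward(belief, rho0, rho1):
    """Probability of a unit reward if the arm is played at this belief.

    rho0, rho1 : assumed reward probabilities in state 0 and state 1
    """
    return (1.0 - belief) * rho0 + belief * rho1


if __name__ == "__main__":
    # Illustrative parameters only; they do not come from the paper.
    b = 0.0  # the arm was just played, so it is known to be in state 0
    for step in range(1, 4):
        b = next_belief(b, p01=0.3, p10=0.1, played=False)
        print(step, round(b, 4), round(expected_reward(b, rho0=0.2, rho1=0.8), 4))
```

Under these assumptions, an arm that is left unplayed drifts from belief 0 toward the stationary probability of state 1, and an index policy would rank arms using a function of this belief and the arm parameters.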