Invariant Lipschitz Bandits: A Side Observation Approach

Nam-Phuong Tran, The-Anh Ta, Long Tran-Thanh
DOI: 10.48550/arXiv.2212.07524
Published: 2022-12-14, in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD Proceedings
Citations: 0

Abstract

Symmetry arises in many optimization and decision-making problems, and has attracted considerable attention from the optimization community: by exploiting such symmetries, the search for optimal solutions can be sped up significantly. Despite their success in (offline) optimization, symmetries have not been well examined in online optimization settings, especially in the bandit literature. As such, in this paper we study the invariant Lipschitz bandit setting, a subclass of Lipschitz bandits in which the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named UniformMesh-N, which naturally integrates side observations via group orbits into the UniformMesh algorithm (Kleinberg, 2005), which uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound that depends on the cardinality of the group, given that the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will ignite further investigation of symmetry in bandit theory and sequential decision-making theory in general.
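The abstract only sketches the idea, so as an illustration (not the authors' implementation), here is a minimal UCB-style sketch of how orbit-based side observations can be grafted onto a uniform mesh: because the reward function is invariant under the group, a sample drawn at one arm is an unbiased observation for every arm in its orbit. All names (`uniform_mesh_n`, `orbit`, the toy reward) are hypothetical, and the confidence bonus is the standard UCB1 term rather than anything taken from the paper.

```python
import math
import random

def uniform_mesh_n(reward, mesh_size, orbit, horizon, noise=0.1, seed=0):
    """UCB over a uniform mesh of [0, 1). After each pull, the noisy
    sample is credited to every arm in the group orbit of the pulled
    arm: invariance of the reward makes it an unbiased observation
    for all of them (the "side observations")."""
    rng = random.Random(seed)
    arms = [i / mesh_size for i in range(mesh_size)]
    counts = [0] * mesh_size   # observations accumulated per mesh arm
    sums = [0.0] * mesh_size   # running sums of observed rewards
    pulls = []
    for t in range(1, horizon + 1):
        # Standard UCB index; unobserved arms get priority.
        def index(i):
            if counts[i] == 0:
                return float("inf")
            return sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        a = max(range(mesh_size), key=index)
        r = reward(arms[a]) + rng.gauss(0.0, noise)
        pulls.append(a)
        for j in orbit(a):     # share the sample along the whole orbit
            counts[j] += 1
            sums[j] += r
    return pulls, counts

# Toy instance: the reward has period 1/2, hence it is invariant under
# the order-2 group generated by x -> x + 1/2 (mod 1).
M = 16

def reward(x):
    return math.sin(4.0 * math.pi * x)

def orbit(i):
    return {i, (i + M // 2) % M}   # orbit of mesh index i under the group

pulls, counts = uniform_mesh_n(reward, M, orbit, horizon=2000)
```

Each round yields one pull but two observations (the orbit has size two here), which is the mechanism behind the group-size-dependent improvement in the regret bound.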