Kernel-based methods for bandit convex optimization

Sébastien Bubeck, Ronen Eldan, Y. Lee
Venue: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing
DOI: 10.1145/3055399.3055403 (https://doi.org/10.1145/3055399.3055403)
Listed publication date: 2016-07-11
Citations: 147

Abstract

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5}\sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n)\,T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11}\sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)}\sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5}\sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n\sqrt{T})$, and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^3/\epsilon^2$.
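The link between the conjectured regret and the conjectured query complexity can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the standard online-to-batch argument (average regret after T queries bounds the optimization error), and it drops all constants and logarithmic factors. The function names `regret_bound` and `queries_for_accuracy` are hypothetical helpers introduced here.

```python
import math


def regret_bound(n: int, T: int, dim_exponent: float) -> float:
    """Return n**dim_exponent * sqrt(T).

    Assumption: constants and log factors hidden by the O-tilde
    notation are ignored, so only the scaling is meaningful.
    """
    return n ** dim_exponent * math.sqrt(T)


def queries_for_accuracy(n: int, eps: float, dim_exponent: float = 1.5) -> int:
    """Smallest T whose average regret n**dim_exponent / sqrt(T) is <= eps.

    Solving n**p / sqrt(T) <= eps for T gives T >= n**(2p) / eps**2.
    With p = 1.5 this is the n**3 / eps**2 scaling in the abstract.
    """
    return math.ceil(n ** (2 * dim_exponent) / eps ** 2)


if __name__ == "__main__":
    n, T = 10, 10**6
    # Dimension exponents quoted in the abstract.
    exponents = [
        ("earlier result of the first two authors", 11.0),
        ("basic version of this algorithm", 9.5),
        ("conjectured variant", 1.5),
    ]
    for label, p in exponents:
        print(f"{label}: n^{p} * sqrt(T) = {regret_bound(n, T, p):.3e}")
    print("queries for eps = 0.1:", queries_for_accuracy(n, 0.1))
```

Under these assumptions, dividing a regret of order $n^{1.5}\sqrt{T}$ by $T$ and setting the result equal to $\epsilon$ yields $T$ of order $n^3/\epsilon^2$, which matches the query complexity stated in the last sentence of the abstract.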