{"title":"Kernel-based methods for bandit convex optimization","authors":"Sébastien Bubeck, Ronen Eldan, Y. Lee","doi":"10.1145/3055399.3055403","DOIUrl":null,"url":null,"abstract":"We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n) √T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n9.5 #8730;T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step at the cost of an additional poly(n) To(1) factor in the regret. These results improve upon the Õ(n11 #8730;T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)poly(n) #8730;T-regret and log(T)poly(n)-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve Õ(n1.5 #8730;T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n #8730;T) and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order n3 / ϵ2.","PeriodicalId":20615,"journal":{"name":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"147","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3055399.3055403","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 147
Kernel-based methods for bandit convex optimization
We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^{9.5}√T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T)) time per step at the cost of an additional poly(n) T^{o(1)} factor in the regret. These results improve upon the Õ(n^{11}√T)-regret and exp(poly(T))-time result of the first two authors, and the log(T)^{poly(n)}√T-regret and log(T)^{poly(n)}-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve Õ(n^{1.5}√T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), achieved with linear functions). For the simpler setting of zeroth-order stochastic convex optimization, this corresponds to the conjecture that the optimal query complexity is of order n^3/ϵ^2.
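A minimal illustrative sketch, not the paper's algorithm: below is a discretized (grid-based) exponential-weights learner with one-point bandit feedback and a learning rate that increases over the horizon, as a toy analogue of the annealed exponential-weights scheme the abstract mentions. The grid, the particular loss sequence (adversary_loss), and the exact eta schedule are assumptions made only for this demo; the paper's algorithm instead works on the continuous domain via kernel smoothing.

```python
# Toy analogue of exponential weights with an increasing learning rate under
# bandit feedback, on a discretized 1-D convex domain. NOT the kernel-based
# algorithm of Bubeck, Eldan, and Lee; the grid, loss sequence, and eta
# schedule below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

T = 2000                              # horizon
grid = np.linspace(-1.0, 1.0, 101)    # discretized convex domain [-1, 1]
K = len(grid)
log_w = np.zeros(K)                   # log-weights over grid points


def adversary_loss(t, x):
    """A simple (oblivious) convex loss sequence taking values in [0, 1]."""
    center = 0.5 * np.sin(0.01 * t)
    return min(1.0, (x - center) ** 2)


total_loss = 0.0
base_eta = np.sqrt(np.log(K) / (K * T))
for t in range(1, T + 1):
    eta = base_eta * (1.0 + t / T)    # learning rate grows with t ("annealing")

    # Sampling distribution from the current weights (softmax, stabilized).
    p = np.exp(log_w - log_w.max())
    p /= p.sum()

    # Play one point; observe only its loss (bandit feedback).
    i = rng.choice(K, p=p)
    loss = adversary_loss(t, grid[i])
    total_loss += loss

    # Importance-weighted loss estimate, then exponential-weights update.
    est = np.zeros(K)
    est[i] = loss / p[i]
    log_w -= eta * est

print(f"average loss over T={T} rounds: {total_loss / T:.3f}")
```

This sketch only conveys the exponential-weights-with-bandit-feedback mechanics; the n^{9.5}√T analysis in the paper relies on the kernel construction and the Bernoulli-convolution generalization, neither of which appears here.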