Title: Pure Exploration of Continuum-Armed Bandits under Concavity and Quadratic Growth Conditions
Author: Xiaotian Yu
Venue: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)
Published: 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201299 (https://doi.org/10.1109/CCAI57533.2023.10201299)
Citations: 0
Abstract
The traditional setting for pure exploration of multi-armed bandits is to identify an optimal arm in a decision set containing a finite number of stochastic slot machines. The finite-arm setting restricts classic bandit algorithms, because in many practical applications the decision set for optimal selection can be continuous and infinite, e.g., determining the optimal parameter in communication networks. In this paper, to generalize bandits to a wider range of real scenarios, we focus on the problem of pure exploration of Continuum-Armed Bandits (CAB), where the decision set is a compact and continuous set. Compared to the traditional setting of pure exploration, identifying the optimal arm in CAB raises new challenges, the most notorious of which is the infinite number of arms. By fully exploiting the structural information of payoffs, we address these challenges. In particular, we derive an upper bound on the sample complexity for pure exploration of CAB with concave structures via a gradient methodology. More importantly, we develop a warm-restart algorithm for the setting where a quadratic growth condition is further satisfied, and derive an improved upper bound on the sample complexity. Finally, we conduct experiments with real-world oracles to demonstrate the superiority of our warm-restart algorithm.
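To make the abstract's high-level idea concrete, here is a minimal sketch of pure exploration over a compact, continuous decision set via projected stochastic gradient ascent with warm restarts. Everything below is an illustrative assumption, not the paper's actual algorithm: the concave payoff, its noisy gradient oracle, the step-size schedule, and the restart scheme are all hypothetical stand-ins for the structures the abstract names (concavity, gradient methodology, warm restarts).

```python
import numpy as np

def noisy_gradient(x, rng, sigma=0.1):
    # Hypothetical stochastic oracle: gradient of the concave payoff
    # f(x) = -(x - 0.3)**2 corrupted by Gaussian noise, standing in
    # for noisy bandit feedback on a continuum of arms.
    return -2.0 * (x - 0.3) + rng.normal(0.0, sigma)

def warm_restart_sga(lo=0.0, hi=1.0, epochs=5, steps_per_epoch=200, seed=0):
    """Projected stochastic gradient ascent with warm restarts (a sketch).

    Each epoch restarts from the previous epoch's final iterate (the
    "warm" start) with a halved step size; the schedule here is an
    assumption chosen only to illustrate the idea.
    """
    rng = np.random.default_rng(seed)
    x = (lo + hi) / 2.0              # start in the middle of the decision set
    eta = 0.1                        # initial step size (assumed)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x += eta * noisy_gradient(x, rng)
            x = min(max(x, lo), hi)  # project back onto the compact set [lo, hi]
        eta *= 0.5                   # shrink the step size at each restart
    return x

x_hat = warm_restart_sga()           # should land near the maximizer 0.3
```

Under concavity, each restart inherits a better starting point, and the quadratic growth condition (intuitively, the payoff falling off at least quadratically away from the optimum) is what lets shrinking step sizes pay off; the paper's sample-complexity bounds formalize this, which the toy schedule above does not attempt.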