{"title":"线性匪徒的修正元汤普森抽样及其贝叶斯后悔分析","authors":"Hao Li, Dong Liang, Zheng Xie","doi":"arxiv-2409.06329","DOIUrl":null,"url":null,"abstract":"Meta-learning is characterized by its ability to learn how to learn, enabling\nthe adaptation of learning strategies across different tasks. Recent research\nintroduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown\nprior distribution sampled from a meta-prior by interacting with bandit\ninstances drawn from it. However, its analysis was limited to Gaussian bandit.\nThe contextual multi-armed bandit framework is an extension of the Gaussian\nBandit, which challenges agent to utilize context vectors to predict the most\nvaluable arms, optimally balancing exploration and exploitation to minimize\nregret over time. This paper introduces Meta-TSLB algorithm, a modified Meta-TS\nfor linear contextual bandits. We theoretically analyze Meta-TSLB and derive an\n$ O\\left( \\left( m+\\log \\left( m \\right) \\right) \\sqrt{n\\log \\left( n \\right)}\n\\right)$ bound on its Bayes regret, in which $m$ represents the number of\nbandit instances, and $n$ the number of rounds of Thompson Sampling.\nAdditionally, our work complements the analysis of Meta-TS for linear\ncontextual bandits. The performance of Meta-TSLB is evaluated experimentally\nunder different settings, and we experimente and analyze the generalization\ncapability of Meta-TSLB, showcasing its potential to adapt to unseen instances.","PeriodicalId":501340,"journal":{"name":"arXiv - STAT - Machine Learning","volume":"16 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis\",\"authors\":\"Hao Li, Dong Liang, Zheng Xie\",\"doi\":\"arxiv-2409.06329\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Meta-learning is characterized by its ability to learn how to learn, enabling\\nthe adaptation of learning strategies across different tasks. Recent research\\nintroduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown\\nprior distribution sampled from a meta-prior by interacting with bandit\\ninstances drawn from it. However, its analysis was limited to Gaussian bandit.\\nThe contextual multi-armed bandit framework is an extension of the Gaussian\\nBandit, which challenges agent to utilize context vectors to predict the most\\nvaluable arms, optimally balancing exploration and exploitation to minimize\\nregret over time. This paper introduces Meta-TSLB algorithm, a modified Meta-TS\\nfor linear contextual bandits. We theoretically analyze Meta-TSLB and derive an\\n$ O\\\\left( \\\\left( m+\\\\log \\\\left( m \\\\right) \\\\right) \\\\sqrt{n\\\\log \\\\left( n \\\\right)}\\n\\\\right)$ bound on its Bayes regret, in which $m$ represents the number of\\nbandit instances, and $n$ the number of rounds of Thompson Sampling.\\nAdditionally, our work complements the analysis of Meta-TS for linear\\ncontextual bandits. 
The performance of Meta-TSLB is evaluated experimentally\\nunder different settings, and we experimente and analyze the generalization\\ncapability of Meta-TSLB, showcasing its potential to adapt to unseen instances.\",\"PeriodicalId\":501340,\"journal\":{\"name\":\"arXiv - STAT - Machine Learning\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.06329\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06329","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis
Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior distribution sampled from a meta-prior by interacting with bandit instances drawn from that prior. However, its analysis was limited to Gaussian bandits.
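To make the interaction protocol concrete, below is a minimal sketch of a Meta-TS-style loop on Gaussian bandits, the setting covered by the prior analysis. All names, the Gaussian meta-prior N(0, I), and the simple per-arm conjugate meta-update are illustrative assumptions rather than the paper's exact algorithm: an unknown prior mean mu_* is drawn from a known meta-prior, each bandit instance is drawn from that unknown prior, and the agent refines its belief about mu_* across instances while running Thompson Sampling within each one.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m, n = 5, 10, 200                    # arms, bandit instances, rounds per instance
sigma0, sigma = 1.0, 1.0                # instance-prior and reward-noise std (assumed)

mu_star = rng.normal(0.0, 1.0, size=K)  # unknown prior mean, drawn from the meta-prior

# Agent's meta-posterior over mu_*, initialised to the known meta-prior N(0, I).
post_mean, post_var = np.zeros(K), np.ones(K)

for task in range(m):
    theta = rng.normal(mu_star, sigma0)              # arm means of this bandit instance
    # Within-instance Thompson Sampling, using the meta-posterior as the prior on theta.
    prior_mean, prior_var = post_mean, post_var + sigma0**2
    s, cnt = np.zeros(K), np.zeros(K)                # per-arm reward sums and pull counts
    for t in range(n):
        var = 1.0 / (1.0 / prior_var + cnt / sigma**2)
        mean = var * (prior_mean / prior_var + s / sigma**2)
        a = int(np.argmax(rng.normal(mean, np.sqrt(var))))  # sample, then act greedily
        r = rng.normal(theta[a], sigma)
        s[a] += r
        cnt[a] += 1
    # Crude meta-update (an assumption, not the paper's rule): treat each pulled arm's
    # empirical mean as a noisy observation of mu_* and do a conjugate Gaussian update.
    pulled = cnt > 0
    est = np.where(pulled, s / np.maximum(cnt, 1), 0.0)
    obs_var = sigma0**2 + sigma**2 / np.maximum(cnt, 1)
    new_var = np.where(pulled, 1.0 / (1.0 / post_var + 1.0 / obs_var), post_var)
    post_mean = np.where(pulled, new_var * (post_mean / post_var + est / obs_var), post_mean)
    post_var = new_var

print("estimated prior mean:", np.round(post_mean, 2))
print("true prior mean:     ", np.round(mu_star, 2))
```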
The contextual multi-armed bandit framework extends the Gaussian bandit by challenging the agent to use context vectors to predict the most valuable arms, optimally balancing exploration and exploitation to minimize regret over time. This paper introduces the Meta-TSLB algorithm, a modified Meta-TS for linear contextual bandits. We analyze Meta-TSLB theoretically and derive an $O\left( \left( m+\log m \right) \sqrt{n\log n} \right)$ bound on its Bayes regret, where $m$ is the number of bandit instances and $n$ is the number of rounds of Thompson Sampling.
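For readers unfamiliar with the linear contextual setting, the following is a minimal sketch of Thompson Sampling on a single linear contextual bandit instance, the within-instance routine that an algorithm such as Meta-TSLB builds on. The simulated environment, hyperparameters, and the fixed Gaussian prior N(mu0, Sigma0) are illustrative assumptions; in the meta-learning setting this prior is precisely what would be learned across instances.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, n, sigma = 4, 10, 500, 0.5          # context dim, arms per round, rounds, noise std

theta_true = rng.normal(size=d)           # unknown parameter of this bandit instance
mu0, Sigma0 = np.zeros(d), np.eye(d)      # Gaussian prior; meta-learned in the meta setting

# Posterior over theta kept in information form: precision matrix Lam and vector b.
Lam = np.linalg.inv(Sigma0)
b = Lam @ mu0

regret = 0.0
for t in range(n):
    X = rng.normal(size=(K, d))                        # context vectors offered this round
    cov = np.linalg.inv(Lam)
    theta_s = rng.multivariate_normal(cov @ b, cov)    # posterior sample of theta
    a = int(np.argmax(X @ theta_s))                    # play the arm with highest sampled value
    r = X[a] @ theta_true + sigma * rng.normal()
    # Bayesian linear-regression update with the observed (context, reward) pair.
    Lam += np.outer(X[a], X[a]) / sigma**2
    b += X[a] * r / sigma**2
    regret += np.max(X @ theta_true) - X[a] @ theta_true

print(f"cumulative regret over {n} rounds: {regret:.2f}")
```

In the bound above, $n$ corresponds to the number of such rounds played on each instance, while $m$ counts how many instances the prior is meta-learned over.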
Additionally, our work complements the analysis of Meta-TS for linear contextual bandits. The performance of Meta-TSLB is evaluated experimentally under different settings, and we experimentally analyze its generalization capability, showcasing its potential to adapt to unseen instances.