V. Farias, C. Moallemi, Benjamin Van Roy, T. Weissman
A universal scheme for learning
Proceedings. International Symposium on Information Theory, 2005. ISIT 2005.
Published 2005-10-31
DOI: 10.1109/ISIT.2005.1523523 (https://doi.org/10.1109/ISIT.2005.1523523)
Citation count: 0
Abstract
We consider the problem of optimally controlling a Kth-order Markov process so as to minimize long-term average cost, a framework with many applications in communications and beyond. Specifically, we wish to do so without knowledge of either the transition kernel or the order K. We develop and analyze two algorithms, based on the Lempel-Ziv scheme for data compression, that maintain probability estimates along variable-length contexts. We establish that eventually, with probability 1, the optimal action is taken at each context. Further, for the second algorithm, we establish almost sure asymptotic optimality.
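
To make "probability estimates along variable-length contexts" concrete, the following is a minimal sketch of Lempel-Ziv (LZ78-style) incremental parsing that keeps next-symbol counts at every node of a context tree. It is not the algorithm from the paper: it ignores actions and costs entirely and only models an observed symbol sequence. The names `ContextNode`, `lz_context_counts`, and `estimate` are hypothetical, and the add-one smoothing is an illustrative assumption rather than the estimator the authors use.

```python
class ContextNode:
    """One node of the context tree: next-symbol counts plus children."""

    def __init__(self):
        self.counts = {}    # symbol -> times it followed this context
        self.children = {}  # symbol -> ContextNode for the longer context


def lz_context_counts(sequence):
    """Parse `sequence` LZ78-style while counting next symbols per context.

    Each phrase starts at the root (the empty context). Every symbol is
    counted at the current node. If a child for that symbol exists we
    descend into it (the context grows by one symbol); otherwise we create
    the child, which ends the phrase, and restart from the root.
    """
    root = ContextNode()
    node = root
    phrases = 0
    for symbol in sequence:
        node.counts[symbol] = node.counts.get(symbol, 0) + 1
        child = node.children.get(symbol)
        if child is None:
            node.children[symbol] = ContextNode()
            node = root
            phrases += 1
        else:
            node = child
    return root, phrases


def estimate(node, symbol, alphabet_size):
    """Add-one smoothed estimate of P(symbol | context). Assumed estimator."""
    total = sum(node.counts.values())
    return (node.counts.get(symbol, 0) + 1) / (total + alphabet_size)


if __name__ == "__main__":
    root, phrases = lz_context_counts("abababab")
    a_node = root.children["a"]
    print(phrases, root.counts, a_node.counts, estimate(a_node, "b", 2))
```

Because contexts only lengthen as phrases repeat, the tree needs no advance knowledge of the order K; a control variant would additionally have to index counts by action and choose actions from the estimates, which this sketch leaves out.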