Importance sampling for model-based reinforcement learning
Orhan Sonmez, A. Cemgil
2012 20th Signal Processing and Communications Applications Conference (SIU), 2012-04-18
DOI: 10.1109/SIU.2012.6204703
Citations: 1
Abstract
Most state-of-the-art reinforcement learning algorithms are based on Bellman equations and use fixed-point iteration methods to converge to suboptimal solutions. However, some recent approaches use appropriate graphical models to transform the reinforcement learning problem into an equivalent likelihood maximization problem, which allows probabilistic inference methods to be applied. Here, we propose an expectation-maximization (EM) method that employs importance sampling in its E-step to estimate the likelihood and then determine the optimal policy.
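The general idea of EM with an importance-sampled E-step can be illustrated with a small sketch. This is not the authors' implementation: the 4-state chain MDP, the terminal 0/1 reward, the uniform proposal policy, and every name below (`build_model`, `e_step`, `em_policy_search`, and so on) are hypothetical choices made for illustration. The reward is read as the probability of a binary "success" event R, so the likelihood of a policy is p(R = 1; π). Trajectories are simulated from the known transition model under the proposal; because the proposal uses the same model, transition probabilities cancel in the importance weight, leaving only per-step policy ratios. The M-step sets π(a|s) proportional to the weighted state-action counts, so the unknown normalizing constant of the posterior cancels.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 4-state chain, actions
# 0 = left and 1 = right, horizon 5, and a 10% chance that an action "slips".
N_STATES, N_ACTIONS, HORIZON, SLIP = 4, 2, 5, 0.1
# Terminal reward in [0, 1], read as p(R = 1 | final state).
REWARD = np.array([0.0, 0.0, 0.0, 1.0])


def build_model(slip=SLIP):
    """Known transition model P[s, a, s'] (the 'model' in model-based RL)."""
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    for s in range(N_STATES):
        left, right = max(s - 1, 0), min(s + 1, N_STATES - 1)
        P[s, 0, left] += 1 - slip
        P[s, 0, right] += slip
        P[s, 1, right] += 1 - slip
        P[s, 1, left] += slip
    return P


def sample_trajectories(P, proposal, n, rng):
    """Roll out n trajectories from state 0 under `proposal` (vectorized)."""
    s = np.zeros(n, dtype=int)
    states = np.zeros((n, HORIZON), dtype=int)
    actions = np.zeros((n, HORIZON), dtype=int)
    for t in range(HORIZON):
        # Inverse-CDF sampling of one action and one next state per trajectory.
        a = (rng.random(n)[:, None] > proposal[s].cumsum(axis=1)).sum(axis=1)
        a = np.minimum(a, N_ACTIONS - 1)
        states[:, t], actions[:, t] = s, a
        nxt = (rng.random(n)[:, None] > P[s, a].cumsum(axis=1)).sum(axis=1)
        s = np.minimum(nxt, N_STATES - 1)
    return states, actions, s


def e_step(pi, proposal, states, actions, finals):
    """Importance weights w_i = r(tau_i) * p(tau_i; pi) / q(tau_i).

    Transitions cancel in the ratio, so only policy ratios remain.
    The mean weight is an unbiased estimate of the likelihood p(R = 1; pi).
    """
    ratio = pi[states, actions] / proposal[states, actions]  # shape (n, T)
    w = REWARD[finals] * ratio.prod(axis=1)
    return w, w.mean()


def m_step(pi, w, states, actions):
    """pi_new(a|s) proportional to sum_i w_i * #{t : s_t = s, a_t = a}."""
    counts = np.zeros_like(pi)
    np.add.at(counts, (states.ravel(), actions.ravel()), np.repeat(w, HORIZON))
    totals = counts.sum(axis=1, keepdims=True)
    # Keep the old policy in states that no weighted sample visited.
    safe = np.where(totals > 0, totals, 1.0)
    return np.where(totals > 0, counts / safe, pi)


def exact_likelihood(pi, P):
    """Exact p(R = 1; pi) via a forward pass over the state distribution."""
    d = np.zeros(N_STATES)
    d[0] = 1.0
    for _ in range(HORIZON):
        d = np.einsum("s,sa,sat->t", d, pi, P)
    return float(d @ REWARD)


def em_policy_search(n_iters=20, n_samples=2000, seed=0):
    """EM over a tabular policy; returns the policy and per-iteration IS estimates."""
    rng = np.random.default_rng(seed)
    P = build_model()
    proposal = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)  # uniform behaviour policy
    pi = proposal.copy()
    history = []
    for _ in range(n_iters):
        states, actions, finals = sample_trajectories(P, proposal, n_samples, rng)
        w, likelihood = e_step(pi, proposal, states, actions, finals)
        history.append(likelihood)
        pi = m_step(pi, w, states, actions)
    return pi, history


if __name__ == "__main__":
    pi, history = em_policy_search()
    print("IS likelihood estimates:", np.round(history, 3))
    print("exact likelihood of final policy:", exact_likelihood(pi, build_model()))
```

A fixed uniform proposal keeps the sketch simple, but as π becomes nearly deterministic only a small fraction of samples carry most of the weight; a practical variant might adapt the proposal toward the current policy.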