{"title":"Improving Reinforcement Learning Pre-Training with Variational Dropout","authors":"Tom Blau, Lionel Ott, F. Ramos","doi":"10.1109/IROS.2018.8594341","DOIUrl":null,"url":null,"abstract":"Reinforcement learning has been very successful at learning control policies for robotic agents in order to perform various tasks, such as driving around a track, navigating a maze, and bipedal locomotion. One significant drawback of reinforcement learning methods is that they require a large number of data points in order to learn good policies, a trait known as poor data efficiency or poor sample efficiency. One approach for improving sample efficiency is supervised pre-training of policies to directly clone the behavior of an expert, but this suffers from poor generalization far from the training data. We propose to improve this by using Gaussian dropout networks with a regularization term based on variational inference in the pre-training step. We show that this initializes policy parameters to significantly better values than standard supervised learning or random initialization, thus greatly reducing sample complexity compared with state-of-the-art methods, and enabling an RL algorithm to learn optimal policies for high-dimensional continuous control problems in a practical time frame.","PeriodicalId":6640,"journal":{"name":"2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","volume":"115 1","pages":"4115-4122"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IROS.2018.8594341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Reinforcement learning has been very successful at learning control policies for robotic agents in order to perform various tasks, such as driving around a track, navigating a maze, and bipedal locomotion. One significant drawback of reinforcement learning methods is that they require a large number of data points in order to learn good policies, a trait known as poor data efficiency or poor sample efficiency. One approach for improving sample efficiency is supervised pre-training of policies to directly clone the behavior of an expert, but this suffers from poor generalization far from the training data. We propose to improve this by using Gaussian dropout networks with a regularization term based on variational inference in the pre-training step. We show that this initializes policy parameters to significantly better values than standard supervised learning or random initialization, thus greatly reducing sample complexity compared with state-of-the-art methods, and enabling an RL algorithm to learn optimal policies for high-dimensional continuous control problems in a practical time frame.