{"title":"Uncertainty Estimation based Intrinsic Reward For Efficient Reinforcement Learning","authors":"Chao Chen, Tianjiao Wan, Peichang Shi, Bo Ding, Zijian Gao, Dawei Feng","doi":"10.1109/JCC56315.2022.00008","DOIUrl":null,"url":null,"abstract":"For reinforcement learning, the extrinsic reward is a core factor for the learning process which however can be very sparse or completely missing. In response, researchers have proposed the idea of intrinsic reward, such as encouraging the agent to visit novel states through prediction error. However, the deep prediction model can provide over-confident and miscalibrated predictions. To mitigate the impact of inaccurate prediction, previous research applied deep ensembles and achieved superior results, despite the increased computation and storage space. In this paper, inspired by the uncertainty estimation, we leverage Monte Carlo Dropout to generate intrinsic reward from the perspective of uncertainty estimation with the goal to decrease the demands for computing resources while retaining superior performance. Utilizing the simple yet effective approach, we conduct extensive experiments across a variety of benchmark environments. The experimental results suggest that our method provides a competitive performance in final score and is faster in running speed, while requiring much fewer computing resources and storage space.","PeriodicalId":239996,"journal":{"name":"2022 IEEE International Conference on Joint Cloud Computing (JCC)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Joint Cloud Computing (JCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/JCC56315.2022.00008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
In reinforcement learning, the extrinsic reward is a core driver of the learning process, yet it can be very sparse or entirely absent. In response, researchers have proposed intrinsic rewards, for example encouraging the agent to visit novel states by rewarding prediction error. However, deep prediction models can produce over-confident and miscalibrated predictions. To mitigate the impact of such inaccurate predictions, previous work applied deep ensembles and achieved superior results, at the cost of increased computation and storage. In this paper, inspired by uncertainty estimation, we leverage Monte Carlo Dropout to generate an intrinsic reward from estimated predictive uncertainty, with the goal of reducing the demand for computing resources while retaining strong performance. Using this simple yet effective approach, we conduct extensive experiments across a variety of benchmark environments. The experimental results suggest that our method achieves competitive final scores and runs faster, while requiring far less computation and storage.
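The abstract only sketches the idea at a high level; as a rough illustration, the snippet below shows one plausible way to turn Monte Carlo Dropout uncertainty into an intrinsic reward. It is not the authors' implementation: the predictor architecture, dropout rate, number of stochastic passes, and the use of a forward-dynamics-style target are all assumptions made for the sake of the example.

```python
# Minimal sketch (not the paper's code): intrinsic reward from
# Monte Carlo Dropout uncertainty over a state-action predictor.
# All module names, dimensions, and hyperparameters are illustrative.
import torch
import torch.nn as nn


class DropoutPredictor(nn.Module):
    """Predicts a next-state embedding; dropout layers provide stochasticity."""

    def __init__(self, state_dim, action_dim, embed_dim=128, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(256, embed_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


@torch.no_grad()
def mc_dropout_intrinsic_reward(predictor, state, action, n_samples=8):
    """Intrinsic reward = predictive variance over n_samples dropout passes."""
    predictor.train()  # keep dropout active at inference for MC sampling
    preds = torch.stack([predictor(state, action) for _ in range(n_samples)])
    # Variance across dropout samples, averaged over embedding dimensions:
    # high variance signals high uncertainty, i.e. a novel state-action pair.
    return preds.var(dim=0).mean(dim=-1)


if __name__ == "__main__":
    # Toy usage with assumed shapes: a batch of 32 transitions.
    predictor = DropoutPredictor(state_dim=4, action_dim=2)
    s = torch.randn(32, 4)
    a = torch.randn(32, 2)
    r_int = mc_dropout_intrinsic_reward(predictor, s, a)
    print(r_int.shape)  # torch.Size([32])
```

Compared with a deep ensemble, which keeps K separate predictor networks, this sampling scheme reuses a single network with dropout, which is what would account for the reduced computation and storage the abstract claims.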