Uncertainty Aware Model Integration on Reinforcement Learning

Takashi Nagata, Jinwei Xing, Tsutomu Kumazawa, E. Neftci
{"title":"Uncertainty Aware Model Integration on Reinforcement Learning","authors":"Takashi Nagata, Jinwei Xing, Tsutomu Kumazawa, E. Neftci","doi":"10.1109/IJCNN55064.2022.9892778","DOIUrl":null,"url":null,"abstract":"Model-based reinforcement learning is an effective approach to reducing sample complexity by adding more data from the model. Dyna is a well-known architecture that contains model-based reinforcement learning and integrates learning from interactions with an environment and a model of the environment. Although the model can greatly help to speed up the agent's learning, acquiring an accurate model is a hard problem in spite of the recent great success of function approximation using neural networks. A wrong model causes degradation of the agent's performance and raises another question: to which extent should an agent rely on the model to update its policy? In this paper, we propose to use the confidence of the model simulations to the integrated learning process so that the agent avoids updating its policy based on uncertain simulations by the model. To obtain confidence, we apply the Monte Carlo dropout technique to the state transition model. We show that this approach contributes to improving early-stage training, thus helping speed up the agent to reach reasonable performance. 
We conduct experiments on simulated robotic locomotion tasks to demonstrate the effectiveness of our approach.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892778","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Model-based reinforcement learning is an effective approach to reducing sample complexity by generating additional training data from a learned model. Dyna is a well-known architecture that integrates learning from interactions with the environment and learning from a model of the environment. Although the model can greatly speed up the agent's learning, acquiring an accurate model remains hard despite the recent success of function approximation with neural networks. A wrong model degrades the agent's performance and raises a further question: to what extent should an agent rely on the model when updating its policy? In this paper, we propose to incorporate the confidence of the model's simulations into the integrated learning process, so that the agent avoids updating its policy based on uncertain model simulations. To obtain this confidence, we apply the Monte Carlo dropout technique to the state-transition model. We show that this approach improves early-stage training, helping the agent reach reasonable performance sooner. We conduct experiments on simulated robotic locomotion tasks to demonstrate the effectiveness of our approach.
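The abstract's core mechanism — keeping dropout active at prediction time, sampling several stochastic forward passes through the transition model, and gating Dyna-style simulated updates by the resulting predictive variance — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy network, its weights, the `THRESHOLD` cutoff, and the function names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "transition model": a fixed random two-layer network whose hidden
# units are dropped stochastically on EVERY forward pass. Monte Carlo
# dropout keeps dropout active at prediction time, unlike standard
# inference, so repeated passes give different predictions.
W1 = rng.normal(size=(4, 32))   # input: state (3 dims) + action (1 dim)
W2 = rng.normal(size=(32, 3))   # output: predicted next state
P_DROP = 0.2

def predict_next_state(state_action):
    h = np.maximum(W1.T @ state_action, 0.0)      # ReLU hidden layer
    mask = rng.random(h.shape) >= P_DROP          # dropout stays ON
    h = h * mask / (1.0 - P_DROP)                 # inverted-dropout scaling
    return W2.T @ h

def mc_dropout_confidence(state_action, n_samples=50):
    """Run several stochastic passes; low predictive variance ~ high confidence."""
    samples = np.stack([predict_next_state(state_action)
                        for _ in range(n_samples)])
    mean = samples.mean(axis=0)                   # averaged next-state prediction
    uncertainty = samples.var(axis=0).mean()      # scalar predictive variance
    return mean, uncertainty

# Gate a Dyna-style simulated update by the model's confidence.
sa = np.array([0.1, -0.2, 0.3, 1.0])              # example (state, action) pair
next_state, sigma2 = mc_dropout_confidence(sa)
THRESHOLD = 5.0  # illustrative cutoff; the paper's weighting scheme may differ
if sigma2 < THRESHOLD:
    pass  # use (s, a, next_state) for a simulated policy update
```

A hard threshold is only one way to use the confidence signal; a soft alternative is to weight each simulated update by a decreasing function of the variance, so uncertain rollouts contribute less rather than being discarded outright.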