Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

2020 20th International Conference on Control, Automation and Systems (ICCAS), pp. 1212-1219. Pub Date: 2020-10-13. DOI: 10.23919/ICCAS50221.2020.9268413

Yao Mu, Baiyu Peng, Ziqing Gu, S. Li, Chang Liu, Bingbing Nie, Jianfeng Zheng, Bo Zhang


Citations: 8

Abstract

Reinforcement learning has the potential to control stochastic nonlinear systems optimally. We propose a mixed reinforcement learning (mixed RL) algorithm that simultaneously uses dual representations of the environmental dynamics to search for the optimal policy. The dual representation consists of an empirical dynamic model and a set of state-action data. The former can embed the designer's knowledge and reduce the difficulty of learning, while the latter can be used to compensate for model inaccuracy because it reflects the real system dynamics accurately. Such a design can improve both learning accuracy and training speed. In the mixed RL framework, the additive uncertainty of the stochastic model is compensated using explored state-action data via an iterative Bayesian estimator (IBE). The optimal policy is then computed iteratively by alternating between policy evaluation (PEV) and policy improvement (PIM). The effectiveness of mixed RL is demonstrated on a typical optimal control problem for stochastic non-affine nonlinear systems, namely a double lane change task with an automated vehicle.
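
The abstract does not give the form of the iterative Bayesian estimator, so the following is only a minimal sketch of the general idea, not the authors' IBE. It assumes a scalar system, a hypothetical empirical model `empirical_model`, and an unknown constant additive offset observed under Gaussian noise of known variance. Under those assumptions, each state-action sample yields a residual between the observed next state and the model prediction, and a conjugate Gaussian update refines the belief over the offset one sample at a time. All function names, coefficients, and noise levels are illustrative.

```python
import random


def empirical_model(x, u):
    """Hypothetical designer-supplied model (assumption): scalar linear dynamics."""
    return 0.9 * x + 0.5 * u


def bayes_update(prior_mean, prior_var, residual, noise_var):
    """One conjugate Gaussian update of the belief over the additive offset."""
    gain = prior_var / (prior_var + noise_var)
    post_mean = prior_mean + gain * (residual - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


def estimate_offset(data, prior_mean=0.0, prior_var=1.0, noise_var=0.04):
    """Process (x, u, x_next) samples one by one, refining the offset belief."""
    mean, var = prior_mean, prior_var
    for x, u, x_next in data:
        # Residual: what the real system did minus what the model predicted.
        residual = x_next - empirical_model(x, u)
        mean, var = bayes_update(mean, var, residual, noise_var)
    return mean, var


def simulate_data(n, true_offset=0.3, noise_std=0.2, seed=0):
    """Generate samples from a 'real' system: the model plus offset plus noise."""
    rng = random.Random(seed)
    data, x = [], 0.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        x_next = empirical_model(x, u) + true_offset + rng.gauss(0.0, noise_std)
        data.append((x, u, x_next))
        x = x_next
    return data


if __name__ == "__main__":
    mean, var = estimate_offset(simulate_data(500))
    print(f"estimated offset mean={mean:.3f}, variance={var:.2e}")
```

In a mixed scheme of this kind, the corrected prediction `empirical_model(x, u) + mean` could then stand in for the raw model during policy evaluation and policy improvement; how the paper actually couples the estimator with PEV and PIM is not stated in the abstract.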


