Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments

Yao Mu, Baiyu Peng, Ziqing Gu, S. Li, Chang Liu, Bingbing Nie, Jianfeng Zheng, Bo Zhang
{"title":"随机环境下高效策略优化的混合强化学习","authors":"Yao Mu, Baiyu Peng, Ziqing Gu, S. Li, Chang Liu, Bingbing Nie, Jianfeng Zheng, Bo Zhang","doi":"10.23919/ICCAS50221.2020.9268413","DOIUrl":null,"url":null,"abstract":"Reinforcement learning has the potential to control stochastic nonlinear systems in optimal manners successfully. We propose a mixed reinforcement learning (mixed RL) algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy. The dual representation includes an empirical dynamic model and a set of state-action data. The former can embed the designer’s knowledge and reduce the difficulty of learning, and the latter can be used to compensate the model inaccuracy since it reflects the real system dynamics accurately. Such a design has the capability of improving both learning accuracy and training speed. In the mixed RL framework, the additive uncertainty of stochastic model is compensated by using explored state-action data via iterative Bayesian estimator (IBE). The optimal policy is then computed in an iterative way by alternating between policy evaluation (PEV) and policy improvement (PIM). The effectiveness of mixed RL is demonstrated by a typical optimal control problem of stochastic non-affine nonlinear systems (i.e., double lane change task with an automated vehicle).","PeriodicalId":6732,"journal":{"name":"2020 20th International Conference on Control, Automation and Systems (ICCAS)","volume":"59 1","pages":"1212-1219"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments\",\"authors\":\"Yao Mu, Baiyu Peng, Ziqing Gu, S. Li, Chang Liu, Bingbing Nie, Jianfeng Zheng, Bo Zhang\",\"doi\":\"10.23919/ICCAS50221.2020.9268413\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning has the potential to control stochastic nonlinear systems in optimal manners successfully. We propose a mixed reinforcement learning (mixed RL) algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy. The dual representation includes an empirical dynamic model and a set of state-action data. The former can embed the designer’s knowledge and reduce the difficulty of learning, and the latter can be used to compensate the model inaccuracy since it reflects the real system dynamics accurately. Such a design has the capability of improving both learning accuracy and training speed. In the mixed RL framework, the additive uncertainty of stochastic model is compensated by using explored state-action data via iterative Bayesian estimator (IBE). The optimal policy is then computed in an iterative way by alternating between policy evaluation (PEV) and policy improvement (PIM). 
The effectiveness of mixed RL is demonstrated by a typical optimal control problem of stochastic non-affine nonlinear systems (i.e., double lane change task with an automated vehicle).\",\"PeriodicalId\":6732,\"journal\":{\"name\":\"2020 20th International Conference on Control, Automation and Systems (ICCAS)\",\"volume\":\"59 1\",\"pages\":\"1212-1219\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 20th International Conference on Control, Automation and Systems (ICCAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/ICCAS50221.2020.9268413\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 20th International Conference on Control, Automation and Systems (ICCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/ICCAS50221.2020.9268413","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

Reinforcement learning has the potential to control stochastic nonlinear systems optimally. We propose a mixed reinforcement learning (mixed RL) algorithm that simultaneously uses dual representations of the environmental dynamics to search for the optimal policy. The dual representation consists of an empirical dynamic model and a set of state-action data. The former embeds the designer's knowledge and reduces the difficulty of learning, while the latter accurately reflects the real system dynamics and can therefore compensate for the model's inaccuracy. Such a design improves both learning accuracy and training speed. In the mixed RL framework, the additive uncertainty of the stochastic model is compensated using explored state-action data via an iterative Bayesian estimator (IBE). The optimal policy is then computed iteratively by alternating between policy evaluation (PEV) and policy improvement (PIM). The effectiveness of mixed RL is demonstrated on a typical optimal control problem for stochastic non-affine nonlinear systems: a double-lane-change task with an automated vehicle.
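To make the loop described in the abstract concrete, below is a minimal, self-contained Python sketch on a hypothetical 1-D linear-quadratic problem. The system, cost, Gaussian prior, and closed-form PEV/PIM updates are all illustrative assumptions, not the paper's implementation; only the overall structure (explore with the current policy, compensate the additive uncertainty with a Bayesian update, then alternate PEV and PIM on the compensated model) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D stochastic system:  x' = a*x + b*u + d + w,  w ~ N(0, sigma^2).
# The empirical model embeds the designer's knowledge of (a, b) but is
# unaware of the additive bias d; explored data is used to estimate d.
a, b, d_true, sigma = 0.9, 0.5, 0.3, 0.05
q, r, gamma = 1.0, 0.1, 0.95      # stage cost q*x^2 + r*u^2, discount

d_mean, d_var = 0.0, 1.0          # Gaussian prior over the additive bias
k = 0.0                           # feedback gain: u = -k*x - d_mean/b

for it in range(20):
    # Explore: roll out the current policy and record (x, u, x') data.
    xs = rng.normal(0.0, 1.0, size=100)
    us = -k * xs - d_mean / b
    xps = a * xs + b * us + d_true + rng.normal(0.0, sigma, size=100)

    # IBE-style step: conjugate-Gaussian update of the bias from the
    # residuals between observed transitions and the model's prediction.
    res = xps - (a * xs + b * us)
    prior_prec = 1.0 / d_var
    post_prec = prior_prec + len(res) / sigma**2
    d_mean = (prior_prec * d_mean + res.sum() / sigma**2) / post_prec
    d_var = 1.0 / post_prec

    # PEV: with the estimated bias cancelled by the feedforward term,
    # the closed loop is x' = (a - b*k)*x, so V(x) = P*x^2 solves the
    # 1-D discounted Bellman equation in closed form.
    P = (q + r * k**2) / (1.0 - gamma * (a - b * k) ** 2)

    # PIM: greedy gain against V under the compensated model.
    k = gamma * a * b * P / (r + gamma * b**2 * P)

print(f"estimated bias d = {d_mean:.3f} (true {d_true}), gain k = {k:.3f}")
```

The feedforward term -d_mean/b is one simple way a policy can consume the Bayesian bias estimate in this toy setting; the paper's compensation of the stochastic model's additive uncertainty is more general.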