An Actor-Critic Reinforcement Learning Control Approach for Discrete-Time Linear System with Uncertainty

Hsin-Chang Chen, Yu-Chen Lin, Yu-Heng Chang
{"title":"An Actor-Critic Reinforcement Learning Control Approach for Discrete-Time Linear System with Uncertainty","authors":"Hsin-Chang Chen, Yu‐Chen Lin, Yu-Heng Chang","doi":"10.1109/CACS.2018.8606740","DOIUrl":null,"url":null,"abstract":"This paper is concerned with an adaptive optimal controller based an actor-critic architecture for solving discrete-time linear system with uncertainty. The actor-critic reinforcement learning progress is similar to the produce of dopamine in human brain and the mechanism which acts on the motoneuron, which dopamine enhances specific actions by reinforce the synaptic contact of the frontal lobe. As same as artificial intelligence (AI), it means the reward signal of dopamine in the neural network can be used to adjust weights in artificial neural which makes the system find the right way to solve the work. The actor-critic scheme is applied to solve the dynamic programming equation problem, using actor and critic neural networks (NNs) for solving optimal controller and optimal value function, respectively. The weights of actor and critic NNs are updated using policy gradient and recursive least squares temporal-difference learning (RLS-TD) scheme at each sampling instant. Finally, time and frequency domain simulations performed using a typical quarter-car suspension systems that an active suspension systems with the proposed control strategy is able to improve ride comfort significantly, compared with the conventional passive suspension systems.","PeriodicalId":282633,"journal":{"name":"2018 International Automatic Control Conference (CACS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Automatic Control Conference (CACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CACS.2018.8606740","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

This paper presents an adaptive optimal controller based on an actor-critic architecture for discrete-time linear systems with uncertainty. The actor-critic reinforcement learning process resembles the production of dopamine in the human brain and its action on motoneurons: dopamine reinforces specific actions by strengthening synaptic connections in the frontal lobe. Analogously, in artificial intelligence (AI), a dopamine-like reward signal can be used to adjust the weights of an artificial neural network so that the system learns the right actions for the task. The actor-critic scheme is applied to solve the dynamic programming equation, with actor and critic neural networks (NNs) approximating the optimal controller and the optimal value function, respectively. The weights of the actor and critic NNs are updated at each sampling instant using a policy gradient and a recursive least-squares temporal-difference learning (RLS-TD) scheme, respectively. Finally, time- and frequency-domain simulations on a typical quarter-car suspension model show that an active suspension with the proposed control strategy significantly improves ride comfort compared with a conventional passive suspension.
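
The update rules named in the abstract can be made concrete with a short sketch. The following Python fragment is a minimal illustration, not the paper's implementation: it assumes a generic stable discrete-time linear plant x_{k+1} = A x_k + B u_k (a stand-in for the quarter-car model), a quadratic reward r_k = -(x_k' Q x_k + u_k' R u_k), a critic linear in a quadratic state basis and updated by RLS-TD, and a linear actor with Gaussian exploration updated by a policy gradient that uses the TD error as the advantage. All matrices, step sizes, and basis functions are illustrative assumptions.

    # Minimal actor-critic sketch with an RLS-TD critic; all numerical
    # values are hypothetical, not taken from the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical second-order plant (stand-in for a quarter-car model).
    A = np.array([[1.0, 0.05], [-0.05, 0.95]])
    B = np.array([[0.0], [0.05]])
    Q = np.eye(2)
    R = np.array([[0.1]])
    gamma = 0.95          # discount factor
    sigma = 0.1           # exploration noise std
    alpha_actor = 1e-3    # policy-gradient step size

    def phi(x):
        """Quadratic basis for the critic: V(x) ~ w^T phi(x)."""
        return np.array([x[0]**2, x[0]*x[1], x[1]**2])

    n_feat = 3
    w = np.zeros(n_feat)            # critic weights
    P = np.eye(n_feat) * 100.0      # RLS covariance (large initial value)
    Wa = np.zeros((1, 2))           # linear actor: u = Wa @ x + noise

    x = np.array([1.0, 0.0])
    for k in range(2000):
        noise = sigma * rng.standard_normal(1)
        u = Wa @ x + noise
        x_next = A @ x + B @ u
        r = -(x @ Q @ x + u @ R @ u)       # reward = negative quadratic cost

        # --- Critic: recursive least-squares TD (RLS-TD) update ---
        # Bellman relation: V(x_k) = r_k + gamma * V(x_{k+1}), so the
        # residual is r - (phi(x) - gamma*phi(x'))^T w.
        f, f_next = phi(x), phi(x_next)
        dphi = f - gamma * f_next
        denom = 1.0 + dphi @ P @ f
        K = (P @ f) / denom                # RLS gain
        delta = r - dphi @ w               # TD error
        w = w + K * delta
        P = P - np.outer(K, dphi @ P)

        # --- Actor: policy gradient with TD error as advantage ---
        # grad of log N(u; Wa x, sigma^2) w.r.t. Wa is (noise/sigma^2) x^T
        Wa = Wa + alpha_actor * delta * np.outer(noise / sigma**2, x)

        x = x_next
        if np.linalg.norm(x) > 1e3:        # crude divergence guard
            x = np.array([1.0, 0.0])

    print("critic weights:", w)
    print("actor gain:", Wa)

Using the TD error as the advantage estimate keeps both updates to a single rank-one correction per step, matching the per-sampling-instant (online) flavor of the scheme described in the abstract.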