Online critic-identifier-actor algorithm for optimal control of nonlinear systems
H. Lin, Qinglai Wei, Derong Liu
2015 Sixth International Conference on Intelligent Control and Information Processing (ICICIP), published 2015-11-01
DOI: 10.1109/ICICIP.2015.7388204 (https://doi.org/10.1109/ICICIP.2015.7388204)
Citation count: 1
Abstract
In this paper, a novel critic-identifier-actor optimal control scheme is designed for discrete-time affine nonlinear systems with uncertainties. A neural identifier is established to learn the unknown control coefficient matrix of the affine nonlinear system. It works together with an actor-critic scheme to solve the optimal control problem online and forward in time, without value or policy iterations. A critic network learns an approximate value function at each step, and an actor network attempts to improve the current policy based on that approximate value function. The weights of all neural networks (NNs) are updated at each sampling instant. Lyapunov theory is used to prove the stability of the closed-loop system. A simulation example is provided to illustrate the effectiveness of the developed method.
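The abstract gives no equations, so the following is only a minimal sketch of how an online critic-identifier-actor loop of this kind can be organized; it is not the authors' algorithm. It assumes a scalar plant x_{k+1} = f(x_k) + g u_k with a known drift f and an unknown constant g, a quadratic stage cost, single-weight approximators standing in for the neural networks, normalized-gradient update laws, a small probing signal, and an input saturation. The function names, learning rates, and plant values are all illustrative assumptions.

```python
import math


def f(x):
    """Known drift term of the illustrative plant (an assumption, not from the paper)."""
    return 0.8 * math.sin(x)


def run_sketch(steps=200, g_true=0.5, q=1.0, rho=1.0, u_max=2.0,
               lr_id=0.5, lr_critic=0.1, lr_actor=0.1):
    """Run one online pass; every weight is updated once per sampling instant."""
    x = 1.0        # initial state
    g_hat = 0.1    # identifier estimate of the unknown control coefficient
    w_c = 0.0      # critic weight: V_hat(x) = w_c * x**2
    w_a = -0.1     # actor weight:  u_hat(x) = w_a * x
    history = []
    for k in range(steps):
        # Actor output plus a small probing signal, saturated at +/- u_max.
        probe = 0.2 * math.sin(0.7 * k)
        u = max(-u_max, min(u_max, w_a * x + probe))

        # Plant step: x_{k+1} = f(x_k) + g * u_k (g_true is hidden from the learner).
        x_next = f(x) + g_true * u
        cost = q * x * x + rho * u * u

        # Identifier: normalized gradient step on the one-step prediction error.
        pred_err = x_next - (f(x) + g_hat * u)
        g_hat += lr_id * pred_err * u / (1.0 + u * u)

        # Critic: normalized gradient step on the temporal-difference residual
        # cost + V_hat(x_next) - V_hat(x).
        phi = x_next * x_next - x * x
        td = cost + w_c * phi
        w_c -= lr_critic * td * phi / (1.0 + phi * phi)

        # Actor: move toward the control minimizing rho*u^2 + V_hat(f(x) + g_hat*u).
        w_pos = max(w_c, 0.0)  # clip so the quadratic in u stays convex
        u_target = -w_pos * g_hat * f(x) / (rho + w_pos * g_hat * g_hat)
        w_a -= lr_actor * (w_a * x - u_target) * x / (1.0 + x * x)

        history.append((x, u, g_hat, w_c, w_a))
        x = x_next
    return history


if __name__ == "__main__":
    x, u, g_hat, w_c, w_a = run_sketch()[-1]
    print(f"g_hat={g_hat:.3f} w_c={w_c:.3f} w_a={w_a:.3f}")
```

Nothing in the loop iterates to convergence at a fixed time step: the identifier, critic, and actor each take a single update per sample, which mirrors the "online and forward in time, without value or policy iterations" structure described above. The saturation and probing signal are conveniences of this sketch that keep the toy simulation bounded and excited; the paper's stability argument rests on Lyapunov theory instead.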