Adaptive optimal controllers based on Generalized Policy Iteration in a continuous-time framework
D. Vrabie, K. Vamvoudakis, F. Lewis
2009 17th Mediterranean Conference on Control and Automation, 2009-06-24
DOI: 10.1109/MED.2009.5164743
In this paper we present two adaptive algorithms that solve the continuous-time optimal control problem for nonlinear, time-invariant systems that are affine in the inputs. Both algorithms are based on the Generalized Policy Iteration technique and adapt two neural network structures: the Actor, which provides the control signal, and the Critic, which evaluates the control performance. Despite these similarities, the two algorithms differ in how the adaptation takes place, in the knowledge of the system dynamics they require, and in the formulation of the persistence of excitation requirement. The main difference is that the first algorithm adapts the actor and critic sequentially, i.e. while one is trained the other is kept constant, whereas the second trains the two neural networks synchronously in continuous time. The two algorithms are described in detail and a proof of convergence is provided. Simulation results from applying the two algorithms to find the optimal state feedback controller of a nonlinear system are also presented.
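The sequential scheme, in which the critic is updated while the actor is held fixed and vice versa, can be illustrated on a far simpler problem than the one treated in the paper. The sketch below is not the authors' neural-network algorithm: it assumes a scalar linear system dx/dt = a x + b u with cost integrand q x^2 + r u^2, a known model, and exact policy evaluation. All names (`evaluate_policy`, `improve_policy`, `policy_iteration`, `riccati_solution`) are hypothetical and chosen for illustration only.

```python
import math


def evaluate_policy(a, b, q, r, k):
    """Critic step: value coefficient p of V(x) = p x^2 for the fixed policy u = -k x.

    Along closed-loop trajectories dV/dt = -(q + r k^2) x^2,
    which gives 2 (a - b k) p + q + r k^2 = 0.
    """
    closed_loop = a - b * k
    if closed_loop >= 0:
        raise ValueError("policy u = -k x does not stabilize the system")
    return -(q + r * k * k) / (2.0 * closed_loop)


def improve_policy(b, r, p):
    """Actor step: gain that minimizes the Hamiltonian for the fixed critic p."""
    return b * p / r


def policy_iteration(a, b, q, r, k0, tol=1e-12, max_iters=50):
    """Alternate critic and actor updates, holding one fixed while the other changes."""
    k = k0
    p_prev = None
    p = None
    for _ in range(max_iters):
        p = evaluate_policy(a, b, q, r, k)   # actor fixed, critic updated
        k = improve_policy(b, r, p)          # critic fixed, actor updated
        if p_prev is not None and abs(p - p_prev) < tol:
            break
        p_prev = p
    return p, k


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2 a p - b^2 p^2 / r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Assumed example values; k0 = 0 is admissible because a < 0.
    p, k = policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0, k0=0.0)
    print("value coefficient:", p)
    print("feedback gain:", k)
    print("Riccati reference:", riccati_solution(-1.0, 1.0, 1.0, 1.0))
```

In this special case the iteration must start from a stabilizing gain, and the critic coefficient approaches the Riccati solution. The synchronous variant described in the abstract, where both structures adapt at the same time in continuous time, is not covered by this sketch.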