Bahare Kiumarsi-Khomartash, F. Lewis, M. Naghibi-Sistani, A. Karimpour
{"title":"基于强化学习的线性离散系统最优跟踪控制","authors":"Bahare Kiumarsi-Khomartash, F. Lewis, M. Naghibi-Sistani, A. Karimpour","doi":"10.1109/CDC.2013.6760476","DOIUrl":null,"url":null,"abstract":"This paper presents an online solution to the infinite-horizon linear quadratic tracker (LQT) using reinforcement learning. It is first assumed that the value function for the LQT is quadratic in terms of the reference trajectory and the state of the system. Then, using the quadratic form of the value function, an augmented algebraic Riccati equation (ARE) is derived to solve the LQT. Using this formulation, both feedback and feedforward parts of the optimal control solution are obtained simultaneously by solving the augmented ARE. To find the solution to the augmented ARE online, policy iteration as a class of reinforcement learning algorithms, is employed. This algorithm is implemented on an actor-critic structure by using two neural networks and it does not need the knowledge of the drift system dynamics or the command generator dynamics. A simulation example shows that the proposed algorithm works for a system with partially unknown dynamics.","PeriodicalId":415568,"journal":{"name":"52nd IEEE Conference on Decision and Control","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"Optimal tracking control for linear discrete-time systems using reinforcement learning\",\"authors\":\"Bahare Kiumarsi-Khomartash, F. Lewis, M. Naghibi-Sistani, A. Karimpour\",\"doi\":\"10.1109/CDC.2013.6760476\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents an online solution to the infinite-horizon linear quadratic tracker (LQT) using reinforcement learning. It is first assumed that the value function for the LQT is quadratic in terms of the reference trajectory and the state of the system. 
Then, using the quadratic form of the value function, an augmented algebraic Riccati equation (ARE) is derived to solve the LQT. Using this formulation, both feedback and feedforward parts of the optimal control solution are obtained simultaneously by solving the augmented ARE. To find the solution to the augmented ARE online, policy iteration as a class of reinforcement learning algorithms, is employed. This algorithm is implemented on an actor-critic structure by using two neural networks and it does not need the knowledge of the drift system dynamics or the command generator dynamics. A simulation example shows that the proposed algorithm works for a system with partially unknown dynamics.\",\"PeriodicalId\":415568,\"journal\":{\"name\":\"52nd IEEE Conference on Decision and Control\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"52nd IEEE Conference on Decision and Control\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CDC.2013.6760476\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"52nd IEEE Conference on Decision and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CDC.2013.6760476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Optimal tracking control for linear discrete-time systems using reinforcement learning
This paper presents an online solution to the infinite-horizon linear quadratic tracker (LQT) using reinforcement learning. It is first assumed that the value function for the LQT is quadratic in the reference trajectory and the state of the system. Then, using the quadratic form of the value function, an augmented algebraic Riccati equation (ARE) is derived to solve the LQT. Using this formulation, both the feedback and feedforward parts of the optimal control solution are obtained simultaneously by solving the augmented ARE. To find the solution to the augmented ARE online, policy iteration, a class of reinforcement learning algorithms, is employed. This algorithm is implemented on an actor-critic structure using two neural networks and does not require knowledge of the drift system dynamics or the command-generator dynamics. A simulation example shows that the proposed algorithm works for a system with partially unknown dynamics.
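The scheme the abstract describes can be illustrated offline: augment the system state with the reference-generator state, then run policy iteration (policy evaluation via a Lyapunov equation, followed by policy improvement) until it converges to the solution of the discounted augmented ARE, which yields the feedback and feedforward gains together. The sketch below is a minimal, model-based version of that iteration; the example system, weights, and discount factor are illustrative assumptions, not taken from the paper (the paper's contribution is the model-free, actor-critic implementation of this same iteration).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical example system (not from the paper): x+ = A x + B u, y = C x,
# with the reference generated by a command generator r+ = F r.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])                    # constant-reference command generator

# Augmented state X = [x; r]: drift T and input matrix B1.
T = np.block([[A, np.zeros((2, 1))],
              [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])

Qe = np.array([[10.0]])                  # weight on tracking error e = C x - r
R = np.array([[1.0]])
C1 = np.hstack([C, -np.eye(1)])          # e = C1 X
Q1 = C1.T @ Qe @ C1                      # augmented state weight
gamma = 0.9                              # discount factor (illustrative)

# Policy iteration. K = 0 is admissible here because sqrt(gamma)*T is Schur.
K = np.zeros((1, 3))
for _ in range(50):
    # Policy evaluation: solve P = Q1 + K'RK + gamma*(T - B1 K)' P (T - B1 K),
    # a discrete Lyapunov equation in the scaled closed-loop matrix.
    Acl = T - B1 @ K
    P = solve_discrete_lyapunov(np.sqrt(gamma) * Acl.T, Q1 + K.T @ R @ K)
    # Policy improvement: minimize u'Ru + gamma*(TX + B1 u)' P (TX + B1 u).
    K_new = gamma * np.linalg.solve(R + gamma * B1.T @ P @ B1, B1.T @ P @ T)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

# The converged K acts on [x; r], so its first block is the state feedback
# and its last block is the feedforward term -- obtained simultaneously.
```

At convergence, P satisfies the discounted augmented ARE, so the single gain K contains both the feedback (on x) and feedforward (on r) parts of the optimal tracking controller.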