Online Optimal Control of Discrete-Time Systems Based on Globalized Dual Heuristic Programming with Eligibility Traces
J. Ye, Yougang Bian, Biao Xu, Z. Qin, Manjiang Hu
2021 3rd International Conference on Industrial Artificial Intelligence (IAI), published 2021-11-08
DOI: 10.1109/IAI53119.2021.9619346 (https://doi.org/10.1109/IAI53119.2021.9619346)
Citations: 1
Abstract
This paper presents an online adaptive dynamic programming (ADP) scheme that incorporates eligibility traces for solving optimal control problems of discrete-time systems. In contrast to forward-view learning, which requires storing additional vectors for each update, the backward-view learning of the proposed scheme uses data collected online together with previous gradient information to update the neural network (NN) parameters at each step, which reduces the computational burden. To approximate the cost function more accurately and thereby obtain a better policy improvement direction during exploration, the proposed algorithm adds an independent costate network to the traditional heuristic dynamic programming (HDP) framework to approximate the costate function. Using the costate as supplementary information when estimating the cost function greatly improves the estimation accuracy. Finally, two numerical examples are presented, and the simulation results demonstrate the effectiveness and computational efficiency of the presented method.
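The backward-view update mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a scalar linear system under a fixed linear feedback policy, a one-parameter quadratic critic in place of a neural network, and no costate network. All names and numerical values (`run_td_lambda`, `exact_cost_weight`, `k_gain`, the system coefficients, the step size) are hypothetical choices made for illustration. The point is only that each step needs the current transition and a single trace vector of decayed past gradients, with no stored trajectory.

```python
import numpy as np


def features(x):
    # Quadratic feature: for this scalar linear-quadratic setup the cost of a
    # fixed linear policy has the form p * x**2, so one weight is enough.
    return np.array([x * x])


def run_td_lambda(a=0.9, b=0.5, k_gain=0.4, q=1.0, r=0.1,
                  gamma=1.0, lam=0.7, alpha=0.5,
                  episodes=300, steps=30, seed=0):
    """Backward-view TD(lambda) evaluation of the policy u = -k_gain * x.

    Assumed system: x_next = a * x + b * u, stage cost q * x**2 + r * u**2.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(1)                      # critic weight, V(x) ~ w[0] * x**2
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)       # random initial state
        e = np.zeros_like(w)             # eligibility trace, reset per episode
        for _ in range(steps):
            u = -k_gain * x
            cost = q * x * x + r * u * u
            x_next = a * x + b * u
            # Temporal-difference error built from the current transition only.
            delta = cost + gamma * (w @ features(x_next)) - w @ features(x)
            # Accumulating trace: decayed sum of past critic gradients.
            e = gamma * lam * e + features(x)
            # One parameter update per step, no stored trajectory needed.
            w = w + alpha * delta * e
            x = x_next
    return float(w[0])


def exact_cost_weight(a=0.9, b=0.5, k_gain=0.4, q=1.0, r=0.1, gamma=1.0):
    # Closed-form cost weight of the same policy, used as a reference value.
    a_cl = a - b * k_gain
    return (q + r * k_gain ** 2) / (1.0 - gamma * a_cl ** 2)


if __name__ == "__main__":
    print("learned weight:", run_td_lambda())
    print("exact weight:  ", exact_cost_weight())
```

Because the quadratic feature can represent the true cost exactly in this toy setting, the learned weight can be checked against the closed-form value. With a neural network critic the trace would instead accumulate the gradient of the network output with respect to its parameters.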