{"title":"具有近似误差的基于值迭代的q学习理论分析","authors":"Zhantao Liang, Mingming Ha, Derong Liu","doi":"10.1109/ICIST55546.2022.9926794","DOIUrl":null,"url":null,"abstract":"In this paper, the value-iteration-based Q-Iearning algorithm with approximation errors is analyzed theoretically. First, based on an upper bound of the approximation errors caused by the Q-function approximator, we get the lower and upper bound functions of the iterative Q-function, which proves that the limit of the approximate Q-function sequence is bounded. Then, we develop a stability condition for the termination of the iterative algorithm, for ensuring that the current control policy derived from the resulting approximate Q-function is stabilizing. Also, we establish an upper bound function of the approximation errors, which is caused by the policy function approximator, to guarantee that the approximate control policy is stabilizing. Finally, the numerical results verifies the theoretical results with a simulation example.","PeriodicalId":211213,"journal":{"name":"2022 12th International Conference on Information Science and Technology (ICIST)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Theoretical Analysis of Value-Iteration-Based Q-Learning with Approximation Errors\",\"authors\":\"Zhantao Liang, Mingming Ha, Derong Liu\",\"doi\":\"10.1109/ICIST55546.2022.9926794\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, the value-iteration-based Q-Iearning algorithm with approximation errors is analyzed theoretically. First, based on an upper bound of the approximation errors caused by the Q-function approximator, we get the lower and upper bound functions of the iterative Q-function, which proves that the limit of the approximate Q-function sequence is bounded. Then, we develop a stability condition for the termination of the iterative algorithm, for ensuring that the current control policy derived from the resulting approximate Q-function is stabilizing. Also, we establish an upper bound function of the approximation errors, which is caused by the policy function approximator, to guarantee that the approximate control policy is stabilizing. 
Finally, the numerical results verifies the theoretical results with a simulation example.\",\"PeriodicalId\":211213,\"journal\":{\"name\":\"2022 12th International Conference on Information Science and Technology (ICIST)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 12th International Conference on Information Science and Technology (ICIST)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIST55546.2022.9926794\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 12th International Conference on Information Science and Technology (ICIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIST55546.2022.9926794","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Theoretical Analysis of Value-Iteration-Based Q-Learning with Approximation Errors
In this paper, the value-iteration-based Q-learning algorithm with approximation errors is analyzed theoretically. First, based on an upper bound on the approximation errors introduced by the Q-function approximator, we derive lower and upper bound functions for the iterative Q-function, which shows that the limit of the approximate Q-function sequence is bounded. Then, we develop a stability condition for terminating the iterative algorithm, ensuring that the control policy derived from the resulting approximate Q-function is stabilizing. We also establish an upper bound function on the approximation errors introduced by the policy function approximator, which guarantees that the approximate control policy is stabilizing. Finally, the theoretical results are verified numerically with a simulation example.
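To make the setting concrete, the following is a minimal sketch (not the authors' implementation) of value-iteration-based Q-learning for a discretized deterministic system x_{k+1} = F(x, u), with a termination check in the spirit of the paper's idea of stopping once successive approximate Q-functions are close relative to an assumed approximation-error level. The dynamics F, utility U, grids, and the error bound eps_bar below are illustrative assumptions, not quantities from the paper.

```python
# Sketch of value-iteration-based Q-learning on a grid, under assumed
# dynamics and utility. All numerical choices here are hypothetical.
import numpy as np

xs = np.linspace(-1.0, 1.0, 41)          # state grid (assumed)
us = np.linspace(-1.0, 1.0, 21)          # control grid (assumed)

def F(x, u):
    return 0.9 * x + 0.5 * u             # assumed linear dynamics

def U(x, u):
    return x**2 + u**2                   # assumed quadratic utility

def nearest(grid, v):
    return int(np.argmin(np.abs(grid - v)))

eps_bar = 1e-3                           # assumed approximation-error level
Q = np.zeros((len(xs), len(us)))         # Q_0 = 0 initialization
for k in range(500):
    Q_next = np.empty_like(Q)
    for i, x in enumerate(xs):
        for j, u in enumerate(us):
            # Value-iteration update:
            # Q_{k+1}(x,u) = U(x,u) + min_{u'} Q_k(F(x,u), u')
            xn = F(x, u)
            Q_next[i, j] = U(x, u) + Q[nearest(xs, xn), :].min()
    # Termination heuristic: stop once the change between successive
    # iterates falls below the assumed approximation-error level.
    if np.max(np.abs(Q_next - Q)) <= eps_bar:
        Q = Q_next
        break
    Q = Q_next

# Greedy control policy derived from the resulting approximate Q-function.
policy = us[np.argmin(Q, axis=1)]
print("iterations:", k + 1, "control at x=1:", policy[nearest(xs, 1.0)])
```

In the paper's setting the iterates are carried by function approximators rather than a table, so the per-iteration error enters through the approximator; the grid interpolation above merely stands in for that error source.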