The Research of Quadrotor Flight Control Based on Reinforcement Learning and ADP

2022 8th Annual International Conference on Network and Information Systems for Computers (ICNISC) Pub Date : 2022-09-01 DOI:10.1109/ICNISC57059.2022.00061

Xueyuan Li, Wentao Xie, Wentao Zhan


Citations: 1


The Research of Quadrotor Flight Control Based on Reinforcement Learning and ADP

This paper studies the application of lookup-table reinforcement learning to continuous-state-space control of a quadrotor simulator and designs an attitude controller for the simulator based on Q-learning. Q-learning converges with difficulty and learns inefficiently when faced with large-scale, continuous-space decision optimization. To address these defects, kernel-based approximate dynamic programming is introduced, Kernel-based Least-Squares Policy Iteration (KLSPI) is proposed, and a controller for the quadrotor simulator is designed based on this algorithm. Experiments show that the reinforcement learning control method offers fast convergence, small steady-state error, strong adaptability, and good control performance. When dealing with continuous state spaces, Least-Squares Policy Iteration converges to better policies with less training data than the traditional approach of first discretizing the state space.
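As a rough illustration of the lookup-table approach the abstract refers to, the sketch below discretizes one continuous state variable into bins and applies a single tabular Q-learning update. It is a minimal sketch, not the authors' controller: the angle range, bin count, action set, reward, and the helper names `discretize` and `q_update` are all assumptions chosen for illustration.

```python
def discretize(x, low, high, n_bins):
    """Map a continuous value x to a bin index in 0..n_bins-1 (clamped to [low, high])."""
    x = min(max(x, low), high)
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: move Q[s][a] toward r + gamma * max_a' Q[s_next][a']."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q


# Hypothetical setup: one attitude angle in [-0.5, 0.5] rad and 3 discrete torque actions.
n_bins, n_actions = 11, 3
Q = [[0.0] * n_actions for _ in range(n_bins)]

s = discretize(0.12, -0.5, 0.5, n_bins)       # bin of the current angle
s_next = discretize(0.08, -0.5, 0.5, n_bins)  # bin of the angle after acting
reward = -abs(0.08)                           # assumed reward: penalize angle error
Q = q_update(Q, s, a=0, r=reward, s_next=s_next)
```

The table grows with the number of bins per state dimension, which is one way to see why the abstract turns to kernel-based function approximation for large or continuous state spaces.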

