{"title":"基于自适应学习曲线调度的课程强化学习的固定翼飞机自动驾驶控制器","authors":"Lun Li;Xuebo Zhang;Chenxu Qian;Runhua Wang;Minghui Zhao","doi":"10.1109/TETCI.2024.3360322","DOIUrl":null,"url":null,"abstract":"In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, which is difficult to be handled using traditional control methods. In this method, a sigmoid-like learning curve is elegantly introduced to generate goals (the desired heading, altitude, and velocity) from easy to hard for autopilot. The shape of the learning curve can be intelligently adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is solved by designing an adaptive reward function. Furthermore, the control inputs can avoid large oscillations by filtering the outputs from PPO with a first-order filter to ensure the smoothness. A series of simulation results show that the proposed method can not only observably improve the success rate and stability of training but also has superior performance in settling time and robustness compared with the traditional PID control and a state-of-the-art (SOTA) method. In the end, the applications of the controller, including the navigation task, pursuit-evasion, and dogfighting, are demonstrated to prove its feasibility to multiple tasks.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3000,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Autopilot Controller of Fixed-Wing Planes Based on Curriculum Reinforcement Learning Scheduled by Adaptive Learning Curve\",\"authors\":\"Lun Li;Xuebo Zhang;Chenxu Qian;Runhua Wang;Minghui Zhao\",\"doi\":\"10.1109/TETCI.2024.3360322\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, which is difficult to be handled using traditional control methods. In this method, a sigmoid-like learning curve is elegantly introduced to generate goals (the desired heading, altitude, and velocity) from easy to hard for autopilot. The shape of the learning curve can be intelligently adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is solved by designing an adaptive reward function. Furthermore, the control inputs can avoid large oscillations by filtering the outputs from PPO with a first-order filter to ensure the smoothness. A series of simulation results show that the proposed method can not only observably improve the success rate and stability of training but also has superior performance in settling time and robustness compared with the traditional PID control and a state-of-the-art (SOTA) method. In the end, the applications of the controller, including the navigation task, pursuit-evasion, and dogfighting, are demonstrated to prove its feasibility to multiple tasks.\",\"PeriodicalId\":13135,\"journal\":{\"name\":\"IEEE Transactions on Emerging Topics in Computational Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2024-02-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Emerging Topics in Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10438517/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10438517/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Autopilot Controller of Fixed-Wing Planes Based on Curriculum Reinforcement Learning Scheduled by Adaptive Learning Curve
In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, which is difficult to be handled using traditional control methods. In this method, a sigmoid-like learning curve is elegantly introduced to generate goals (the desired heading, altitude, and velocity) from easy to hard for autopilot. The shape of the learning curve can be intelligently adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is solved by designing an adaptive reward function. Furthermore, the control inputs can avoid large oscillations by filtering the outputs from PPO with a first-order filter to ensure the smoothness. A series of simulation results show that the proposed method can not only observably improve the success rate and stability of training but also has superior performance in settling time and robustness compared with the traditional PID control and a state-of-the-art (SOTA) method. In the end, the applications of the controller, including the navigation task, pursuit-evasion, and dogfighting, are demonstrated to prove its feasibility to multiple tasks.
期刊介绍:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronics only publication. TETCI publishes six issues per year.
Authors are encouraged to submit manuscripts in any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few such illustrative examples are glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, computational intelligence for the IoT and Smart-X technologies.