基于自适应学习曲线调度的课程强化学习的固定翼飞机自动驾驶控制器

IF 5.3 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE IEEE Transactions on Emerging Topics in Computational Intelligence Pub Date : 2024-02-16 DOI:10.1109/TETCI.2024.3360322

Lun Li;Xuebo Zhang;Chenxu Qian;Runhua Wang;Minghui Zhao

{"title":"基于自适应学习曲线调度的课程强化学习的固定翼飞机自动驾驶控制器","authors":"Lun Li;Xuebo Zhang;Chenxu Qian;Runhua Wang;Minghui Zhao","doi":"10.1109/TETCI.2024.3360322","DOIUrl":null,"url":null,"abstract":"In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, which is difficult to be handled using traditional control methods. In this method, a sigmoid-like learning curve is elegantly introduced to generate goals (the desired heading, altitude, and velocity) from easy to hard for autopilot. The shape of the learning curve can be intelligently adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is solved by designing an adaptive reward function. Furthermore, the control inputs can avoid large oscillations by filtering the outputs from PPO with a first-order filter to ensure the smoothness. A series of simulation results show that the proposed method can not only observably improve the success rate and stability of training but also has superior performance in settling time and robustness compared with the traditional PID control and a state-of-the-art (SOTA) method. In the end, the applications of the controller, including the navigation task, pursuit-evasion, and dogfighting, are demonstrated to prove its feasibility to multiple tasks.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":5.3000,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Autopilot Controller of Fixed-Wing Planes Based on Curriculum Reinforcement Learning Scheduled by Adaptive Learning Curve\",\"authors\":\"Lun Li;Xuebo Zhang;Chenxu Qian;Runhua Wang;Minghui Zhao\",\"doi\":\"10.1109/TETCI.2024.3360322\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, which is difficult to be handled using traditional control methods. In this method, a sigmoid-like learning curve is elegantly introduced to generate goals (the desired heading, altitude, and velocity) from easy to hard for autopilot. The shape of the learning curve can be intelligently adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is solved by designing an adaptive reward function. Furthermore, the control inputs can avoid large oscillations by filtering the outputs from PPO with a first-order filter to ensure the smoothness. A series of simulation results show that the proposed method can not only observably improve the success rate and stability of training but also has superior performance in settling time and robustness compared with the traditional PID control and a state-of-the-art (SOTA) method. In the end, the applications of the controller, including the navigation task, pursuit-evasion, and dogfighting, are demonstrated to prove its feasibility to multiple tasks.\",\"PeriodicalId\":13135,\"journal\":{\"name\":\"IEEE Transactions on Emerging Topics in Computational Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2024-02-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Emerging Topics in Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10438517/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10438517/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

本文提出了一种新颖的课程强化学习方法，可为具有未知动态模型的 6 自由度（6-DOF）飞机自动生成高性能自动驾驶控制器，而传统的控制方法很难处理这一问题。在这种方法中，优雅地引入了一条类似于西格玛的学习曲线，为自动驾驶仪生成由易到难的目标（所需航向、高度和速度）。学习曲线的形状可以智能调整，以适应近端策略优化（PPO）的训练过程。此外，通过设计自适应奖励函数，解决了自动驾驶训练中多个目标之间的冲突。此外，通过对 PPO 的输出进行一阶滤波以确保平滑性，从而避免了控制输入的大幅振荡。一系列仿真结果表明，与传统的 PID 控制和最先进的（SOTA）方法相比，所提出的方法不仅能明显提高训练的成功率和稳定性，而且在平稳时间和鲁棒性方面也有更出色的表现。最后，还演示了该控制器的应用，包括导航任务、追击-规避和斗犬，以证明其在多种任务中的可行性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Autopilot Controller of Fixed-Wing Planes Based on Curriculum Reinforcement Learning Scheduled by Adaptive Learning Curve

In this paper, we present a novel curriculum reinforcement learning method that can automatically generate a high-performance autopilot controller for a 6-degree-of-freedom (6-DOF) aircraft with an unknown dynamic model, which is difficult to be handled using traditional control methods. In this method, a sigmoid-like learning curve is elegantly introduced to generate goals (the desired heading, altitude, and velocity) from easy to hard for autopilot. The shape of the learning curve can be intelligently adjusted to adapt to the training process of Proximal Policy Optimization (PPO). In addition, the conflict between multiple goals in autopilot training is solved by designing an adaptive reward function. Furthermore, the control inputs can avoid large oscillations by filtering the outputs from PPO with a first-order filter to ensure the smoothness. A series of simulation results show that the proposed method can not only observably improve the success rate and stability of training but also has superior performance in settling time and robustness compared with the traditional PID control and a state-of-the-art (SOTA) method. In the end, the applications of the controller, including the navigation task, pursuit-evasion, and dogfighting, are demonstrated to prove its feasibility to multiple tasks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Emerging Topics in Computational Intelligence Mathematics-Control and Optimization

CiteScore

10.30

自引率

7.50%

发文量

147

期刊介绍： The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronics only publication. TETCI publishes six issues per year. Authors are encouraged to submit manuscripts in any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few such illustrative examples are glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, computational intelligence for the IoT and Smart-X technologies.

期刊最新文献

Table of Contents IEEE Computational Intelligence Society Information IEEE Transactions on Emerging Topics in Computational Intelligence Information for Authors IEEE Transactions on Emerging Topics in Computational Intelligence Publication Information A Novel Multi-Source Information Fusion Method Based on Dependency Interval