Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions

Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu
{"title":"Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions","authors":"Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu","doi":"10.1109/ROBIO58561.2023.10354940","DOIUrl":null,"url":null,"abstract":"We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because altitude decreases can result in fatal accidents. Stall recovery policies perform appropriate control sequences to save aircrafts from such lethal situations. Learning stall recovery policies using reinforcement learning methods is desirable because such policies can be learned automatically. However, stall recovery training is challenging since the interplay between an aircraft and its environment is very complicated. In this work, the proposed stall recovery learning approach yields better performance than other methods. We successfully apply smooth reward functions to the learning process because reward functions are critical for the convergence of policy learning. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies. The comparison results show that our method provides better results than previous algorithms.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"64 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO58561.2023.10354940","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because the resulting loss of altitude can lead to fatal accidents. Stall recovery policies execute appropriate control sequences to recover an aircraft from such lethal situations. Learning stall recovery policies with reinforcement learning is desirable because such policies can be acquired automatically. However, training for stall recovery is challenging because the interplay between an aircraft and its environment is highly complex. In this work, the proposed stall recovery learning approach yields better performance than other methods. Because reward functions are critical for the convergence of policy learning, we apply smooth reward functions to the learning process. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies, and comparison results show that our method outperforms previous algorithms.
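Since the abstract attributes the learning performance to smooth reward functions combined with reward scaling, a minimal Python sketch of that idea is given below. The function, variable names, target values, and constants are illustrative assumptions and are not taken from the paper; the sketch only shows how bounded, differentiable shaping terms and a single scale factor can be combined.

```python
import numpy as np

def smooth_stall_recovery_reward(alpha_deg, altitude_loss_m, pitch_rate_dps,
                                 alpha_target_deg=5.0, reward_scale=5.0):
    """Hypothetical smooth shaping reward for stall recovery (illustration only).

    Bounded, differentiable terms (Gaussian, tanh) avoid the sharp jumps of
    sparse or piecewise rewards; a single multiplicative factor stands in for
    the reward scaling mentioned in the abstract. The actual reward terms and
    constants used by the authors are not reproduced here.
    """
    # Encourage reducing the angle of attack toward a safe target value.
    r_alpha = np.exp(-((alpha_deg - alpha_target_deg) / 10.0) ** 2)
    # Penalize accumulated altitude loss smoothly and boundedly.
    r_altitude = -np.tanh(altitude_loss_m / 200.0)
    # Mildly penalize aggressive pitch rates to favor gentle recovery inputs.
    r_rate = -0.1 * np.tanh(abs(pitch_rate_dps) / 30.0)
    # Reward scaling: one multiplicative factor applied to the shaped sum.
    return reward_scale * (r_alpha + r_altitude + r_rate)

# Example: a deep-stall state (high angle of attack, large altitude loss)
# yields a low reward; a recovered state near the target yields a high one.
print(smooth_stall_recovery_reward(25.0, 300.0, 20.0))
print(smooth_stall_recovery_reward(5.5, 10.0, 2.0))
```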