Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions

Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu
{"title":"Learning Stall Recovery Policies using a Soft Actor-Critic Algorithm with Smooth Reward Functions","authors":"Junqiu Wang, Jianmei Tan, Peng Lin, Chenguang Xing, Bo Liu","doi":"10.1109/ROBIO58561.2023.10354940","DOIUrl":null,"url":null,"abstract":"We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because altitude decreases can result in fatal accidents. Stall recovery policies perform appropriate control sequences to save aircrafts from such lethal situations. Learning stall recovery policies using reinforcement learning methods is desirable because such policies can be learned automatically. However, stall recovery training is challenging since the interplay between an aircraft and its environment is very complicated. In this work, the proposed stall recovery learning approach yields better performance than other methods. We successfully apply smooth reward functions to the learning process because reward functions are critical for the convergence of policy learning. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies. The comparison results show that our method provides better results than previous algorithms.","PeriodicalId":505134,"journal":{"name":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","volume":"64 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Conference on Robotics and Biomimetics (ROBIO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBIO58561.2023.10354940","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We propose an effective stall recovery learning approach based on a soft actor-critic algorithm with smooth reward functions. Stalling is extremely dangerous for aircraft and unmanned aerial vehicles (UAVs) because the resulting loss of altitude can lead to fatal accidents. Stall recovery policies execute appropriate control sequences to recover an aircraft from such lethal situations. Learning stall recovery policies with reinforcement learning is desirable because such policies can be acquired automatically. However, training for stall recovery is challenging because the interplay between an aircraft and its environment is highly complex. In this work, the proposed stall recovery learning approach yields better performance than other methods. Because reward functions are critical for the convergence of policy learning, we apply smooth reward functions to the learning process. We achieve good performance by applying reward scaling to the soft actor-critic algorithm with automatic entropy learning. Experimental results demonstrate that stalls can be successfully recovered using the learned policies, and comparison results show that our method outperforms previous algorithms.
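Since the abstract attributes the learning performance to smooth reward functions combined with reward scaling, a minimal Python sketch of that idea is given below. The function, variable names, target values, and constants are illustrative assumptions and are not taken from the paper; the sketch only shows how bounded, differentiable shaping terms and a single scale factor can be combined.

```python
import numpy as np

def smooth_stall_recovery_reward(alpha_deg, altitude_loss_m, pitch_rate_dps,
                                 alpha_target_deg=5.0, reward_scale=5.0):
    """Hypothetical smooth shaping reward for stall recovery (illustration only).

    Bounded, differentiable terms (Gaussian, tanh) avoid the sharp jumps of
    sparse or piecewise rewards; a single multiplicative factor stands in for
    the reward scaling mentioned in the abstract. The actual reward terms and
    constants used by the authors are not reproduced here.
    """
    # Encourage reducing the angle of attack toward a safe target value.
    r_alpha = np.exp(-((alpha_deg - alpha_target_deg) / 10.0) ** 2)
    # Penalize accumulated altitude loss smoothly and boundedly.
    r_altitude = -np.tanh(altitude_loss_m / 200.0)
    # Mildly penalize aggressive pitch rates to favor gentle recovery inputs.
    r_rate = -0.1 * np.tanh(abs(pitch_rate_dps) / 30.0)
    # Reward scaling: one multiplicative factor applied to the shaped sum.
    return reward_scale * (r_alpha + r_altitude + r_rate)

# Example: a deep-stall state (high angle of attack, large altitude loss)
# yields a low reward; a recovered state near the target yields a high one.
print(smooth_stall_recovery_reward(25.0, 300.0, 20.0))
print(smooth_stall_recovery_reward(5.5, 10.0, 2.0))
```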