A deep reinforcement learning-based approach to onboard trajectory generation for hypersonic vehicles

C. Bao, X. Zhou, P. Wang, R. He, G. Tang
Journal: The Aeronautical Journal (1968)
Published: 2023-02-08
DOI: 10.1017/aer.2023.4 (https://doi.org/10.1017/aer.2023.4)
Citations: 1

Abstract

An onboard three-dimensional (3D) trajectory generation approach based on a reinforcement learning (RL) algorithm and a deep neural network (DNN) is proposed for hypersonic vehicles in the glide phase. Multiple trajectory samples are generated offline using convex optimisation. Deep learning (DL) is used to pre-train the DNN, which initialises the actor network and accelerates the RL process. Based on the offline deep policy deterministic actor-critic algorithm, a flight target-oriented reward function with path constraints is designed. The actor network is optimised by end-to-end RL using policy gradients from the critic network until the reward function converges to its maximum. The trained actor network then serves as the onboard trajectory generator, computing optimal control values online from the real-time motion states. Simulation results show that the single-step online planning time meets the real-time requirements of onboard trajectory generation. A significant improvement in the terminal accuracy of the online trajectory and better generalisation under biased initial states are observed for hypersonic vehicles in the glide phase.
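
The abstract does not give the form of the reward function, so the following Python sketch is only an illustration of what a flight target-oriented reward with path constraints could look like. Everything in it is an assumption made for the example: the names `PathLimits`, `constraint_penalty` and `step_reward`, the choice of heating rate, dynamic pressure and load factor as the constrained quantities, the numerical limits, and the weights. It is not the reward used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathLimits:
    """Hypothetical path-constraint limits (illustrative values only)."""
    max_heating_rate: float = 1.0e6      # W/m^2
    max_dynamic_pressure: float = 1.0e5  # Pa
    max_load_factor: float = 3.0         # g


def constraint_penalty(value: float, limit: float) -> float:
    """Return 0 inside the limit, else the relative amount of violation."""
    return max(0.0, value / limit - 1.0)


def step_reward(
    dist_to_target: float,
    prev_dist_to_target: float,
    range_scale: float,
    heating_rate: float,
    dynamic_pressure: float,
    load_factor: float,
    limits: PathLimits = PathLimits(),
    w_progress: float = 1.0,   # assumed weight on progress towards the target
    w_path: float = 10.0,      # assumed weight on path-constraint violations
) -> float:
    """Reward progress towards the target and penalise constraint violations."""
    # Target-oriented term: normalised reduction in distance during this step.
    progress = (prev_dist_to_target - dist_to_target) / range_scale

    # Path-constraint term: sum of relative violations of each limit.
    violation = (
        constraint_penalty(heating_rate, limits.max_heating_rate)
        + constraint_penalty(dynamic_pressure, limits.max_dynamic_pressure)
        + constraint_penalty(load_factor, limits.max_load_factor)
    )
    return w_progress * progress - w_path * violation


if __name__ == "__main__":
    # A step that closes 10 km of a 100 km range without violating any limit.
    print(step_reward(90e3, 100e3, 100e3, 0.5e6, 0.5e5, 1.0))
    # The same step with the heating rate 20% above its assumed limit.
    print(step_reward(90e3, 100e3, 100e3, 1.2e6, 0.5e5, 1.0))
```

In an actor-critic setting, a scalar like this would be returned by the simulated environment at each step and used to train the critic, while the actor maps the current motion state to a control value. The linear penalty is one simple choice; a terminal-accuracy bonus could be added in the same style.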