Rocket Self-learning Control based on Lightweight Neural Network Architecture Search
Zhaolei Wang, Kunfeng Lu, Chunmei Yu, Na Yao, Ludi Wang, Jikang Zhao
2022 IEEE International Conference on Unmanned Systems (ICUS), 2022-10-28
DOI: 10.1109/ICUS55513.2022.9986957
Abstract
Aiming at the problem that the traditional control law design process is complex and relies heavily on accurate mathematical models, this paper uses Deep Deterministic Policy Gradient (DDPG) reinforcement learning to realize self-learning of a continuous-motion control law. However, since the performance of the DDPG algorithm depends heavily on its hyper-parameters, there is no clear design basis for the Actor-Critic neural network architecture. Because reinforcement learning requires a large amount of computation, repetitive manual trial and error over hyper-parameters greatly reduces design efficiency and increases labor costs. By converting the network architecture design problem into a graph topology generation problem, this paper presents an automatic search and optimization framework for the deep reinforcement learning neural network structure, which innovatively combines a graph topology generation algorithm based on an LSTM recurrent neural network, a weight-sharing lightweight training and evaluation mechanism for the deep reinforcement network parameters, and a policy-gradient learning algorithm for the graph topology generator parameters. Thus, the neural network hyper-parameters in the DDPG algorithm are optimized automatically, and the control law is obtained by self-learning training. Finally, taking rocket vertical recovery control as an example, the effectiveness of the proposed method is verified.
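To make the search loop described above concrete, the following is a minimal, illustrative sketch (not the authors' code) of the generator-side mechanics: an LSTM generator samples the actor network's hyper-parameters as a sequence of discrete choices and is updated with REINFORCE. The candidate sets (WIDTHS, ACTS), the network shapes, and the placeholder reward are all assumptions for illustration; in the paper the reward would come from a weight-sharing lightweight evaluation of the resulting DDPG controller on the rocket vertical-recovery task.

```python
# Hedged sketch of an LSTM architecture generator trained by policy gradient.
# WIDTHS, ACTS, Generator, build_actor, and the reward are illustrative only.
import torch
import torch.nn as nn

WIDTHS = [32, 64, 128, 256]        # candidate hidden-layer widths (assumed)
ACTS = [nn.ReLU, nn.Tanh]          # candidate activations (assumed)
N_CHOICES = 4                      # 2 layers x (width, activation)

class Generator(nn.Module):
    """LSTM that emits one discrete architecture choice per time step."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.embed = nn.Embedding(len(WIDTHS) + len(ACTS), hidden)
        self.head_w = nn.Linear(hidden, len(WIDTHS))   # width logits
        self.head_a = nn.Linear(hidden, len(ACTS))     # activation logits
        self.hidden = hidden

    def sample(self):
        h = c = torch.zeros(1, self.hidden)
        x = torch.zeros(1, self.hidden)                # start token
        choices, log_probs = [], []
        for step in range(N_CHOICES):
            h, c = self.lstm(x, (h, c))
            head = self.head_w if step % 2 == 0 else self.head_a
            dist = torch.distributions.Categorical(logits=head(h))
            idx = dist.sample()
            choices.append(idx.item())
            log_probs.append(dist.log_prob(idx))
            offset = 0 if step % 2 == 0 else len(WIDTHS)
            x = self.embed(idx + offset)               # feed choice back in
        return choices, torch.stack(log_probs).sum()

def build_actor(choices, obs_dim=6, act_dim=1):
    """Assemble a DDPG actor MLP from the sampled (width, activation) pairs."""
    w1, a1, w2, a2 = choices
    return nn.Sequential(
        nn.Linear(obs_dim, WIDTHS[w1]), ACTS[a1](),
        nn.Linear(WIDTHS[w1], WIDTHS[w2]), ACTS[a2](),
        nn.Linear(WIDTHS[w2], act_dim), nn.Tanh())     # bounded control output

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=3e-4)
baseline = 0.0
for it in range(5):                                    # toy iteration count
    choices, log_prob = gen.sample()
    actor = build_actor(choices)
    # Placeholder reward: the paper would instead evaluate the DDPG
    # controller's return after a short weight-sharing training rollout.
    reward = -sum(p.numel() for p in actor.parameters()) * 1e-5
    baseline = 0.9 * baseline + 0.1 * reward           # moving-average baseline
    loss = -(reward - baseline) * log_prob             # REINFORCE update
    opt.zero_grad(); loss.backward(); opt.step()
    print(it, choices, round(reward, 4))
```

Feeding each sampled choice back into the LSTM as the next input is what lets the generator condition later decisions (e.g., the second layer's width) on earlier ones, which is the essence of treating architecture design as sequential graph topology generation; the critic network's hyper-parameters would be sampled the same way.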