Rocket Self-learning Control based on Lightweight Neural Network Architecture Search
Zhaolei Wang, Kunfeng Lu, Chunmei Yu, Na Yao, Ludi Wang, Jikang Zhao
2022 IEEE International Conference on Unmanned Systems (ICUS), 2022-10-28
DOI: 10.1109/ICUS55513.2022.9986957
Abstract
Aiming at the problem that the traditional control law design process is complex and relies heavily on accurate mathematical models, this paper uses Deep Deterministic Policy Gradient (DDPG) reinforcement learning to realize self-learning of a continuous-motion control law. However, since the performance of the DDPG algorithm depends heavily on its hyper-parameters, there is no clear design basis for the Actor-Critic neural network architecture. Because reinforcement learning requires a large amount of computation, repetitive manual trial and error over hyper-parameters greatly reduces design efficiency and increases labor costs. By converting the network architecture design problem into a graph topology generation problem, this paper presents an automatic search and optimization framework for the deep reinforcement learning neural network structure, which innovatively combines a graph topology generation algorithm based on an LSTM recurrent neural network, a weight-sharing lightweight training and evaluation mechanism for the deep reinforcement network parameters, and a policy-gradient learning algorithm for the graph topology generator parameters. Thus, the neural network hyper-parameters in the DDPG algorithm are optimized automatically, and the control law is obtained by self-learning training. Finally, taking rocket vertical recovery control as an example, the effectiveness of the proposed method is verified.
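To make the search loop described above concrete, the following is a minimal, illustrative sketch (not the authors' code) of the generator-side mechanics: an LSTM generator samples the actor network's hyper-parameters as a sequence of discrete choices and is updated with REINFORCE. The candidate sets (WIDTHS, ACTS), the network shapes, and the placeholder reward are all assumptions for illustration; in the paper the reward would come from a weight-sharing lightweight evaluation of the resulting DDPG controller on the rocket vertical-recovery task.

```python
# Hedged sketch of an LSTM architecture generator trained by policy gradient.
# WIDTHS, ACTS, Generator, build_actor, and the reward are illustrative only.
import torch
import torch.nn as nn

WIDTHS = [32, 64, 128, 256]        # candidate hidden-layer widths (assumed)
ACTS = [nn.ReLU, nn.Tanh]          # candidate activations (assumed)
N_CHOICES = 4                      # 2 layers x (width, activation)

class Generator(nn.Module):
    """LSTM that emits one discrete architecture choice per time step."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.embed = nn.Embedding(len(WIDTHS) + len(ACTS), hidden)
        self.head_w = nn.Linear(hidden, len(WIDTHS))   # width logits
        self.head_a = nn.Linear(hidden, len(ACTS))     # activation logits
        self.hidden = hidden

    def sample(self):
        h = c = torch.zeros(1, self.hidden)
        x = torch.zeros(1, self.hidden)                # start token
        choices, log_probs = [], []
        for step in range(N_CHOICES):
            h, c = self.lstm(x, (h, c))
            head = self.head_w if step % 2 == 0 else self.head_a
            dist = torch.distributions.Categorical(logits=head(h))
            idx = dist.sample()
            choices.append(idx.item())
            log_probs.append(dist.log_prob(idx))
            offset = 0 if step % 2 == 0 else len(WIDTHS)
            x = self.embed(idx + offset)               # feed choice back in
        return choices, torch.stack(log_probs).sum()

def build_actor(choices, obs_dim=6, act_dim=1):
    """Assemble a DDPG actor MLP from the sampled (width, activation) pairs."""
    w1, a1, w2, a2 = choices
    return nn.Sequential(
        nn.Linear(obs_dim, WIDTHS[w1]), ACTS[a1](),
        nn.Linear(WIDTHS[w1], WIDTHS[w2]), ACTS[a2](),
        nn.Linear(WIDTHS[w2], act_dim), nn.Tanh())     # bounded control output

gen = Generator()
opt = torch.optim.Adam(gen.parameters(), lr=3e-4)
baseline = 0.0
for it in range(5):                                    # toy iteration count
    choices, log_prob = gen.sample()
    actor = build_actor(choices)
    # Placeholder reward: the paper would instead evaluate the DDPG
    # controller's return after a short weight-sharing training rollout.
    reward = -sum(p.numel() for p in actor.parameters()) * 1e-5
    baseline = 0.9 * baseline + 0.1 * reward           # moving-average baseline
    loss = -(reward - baseline) * log_prob             # REINFORCE update
    opt.zero_grad(); loss.backward(); opt.step()
    print(it, choices, round(reward, 4))
```

Feeding each sampled choice back into the LSTM as the next input is what lets the generator condition later decisions (e.g., the second layer's width) on earlier ones, which is the essence of treating architecture design as sequential graph topology generation; the critic network's hyper-parameters would be sampled the same way.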