Reinforcement Learning-Based 3-D Sliding Mode Interception Guidance via Proximal Policy Optimization

IEEE Journal on Miniaturization for Air and Space Systems (IF 2.1) | Pub Date: 2023-10-17 | DOI: 10.1109/JMASS.2023.3325054

Jianguo Guo;Mengxuan Li;Zongyi Guo;Zhiyong She

Volume 4, Issue 4, pp. 423-430 | Journal Article | https://ieeexplore.ieee.org/document/10287104/

Citations: 0

Abstract


Reinforcement Learning-Based 3-D Sliding Mode Interception Guidance via Proximal Policy Optimization

This article proposes a novel 3-D sliding mode interception guidance law for maneuvering targets that uses reinforcement learning (RL) to improve guidance accuracy and reduce chattering. The problem of intercepting a maneuvering target is formulated as a Markov decision process whose reward function evaluates the miss distance and the chattering of the line-of-sight angular rate. Importantly, this yields a reward function design framework suitable for general RL-based guidance problems. The proximal policy optimization algorithm, which shows satisfactory training performance, is then used to learn an action policy that maps the observed engagement states to the sliding mode interception guidance. Finally, numerical simulations and comparisons demonstrate the effectiveness of the proposed guidance law.
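The abstract names proximal policy optimization (PPO) but does not give its training objective. As a rough illustration only, the sketch below computes the standard clipped surrogate objective that PPO maximizes, using plain Python. The function name `ppo_clip_objective`, the argument names, and the default `clip_eps=0.2` are assumptions chosen for this example and are not taken from the paper.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of PPO (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    updated policy and the data-collecting policy (illustrative inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range; 0.2 is a common default, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio of 2 with a positive advantage: the gain is capped by the clip.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which is the usual explanation for PPO's stable training behavior.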


Source journal