Computational Intelligence Interception Guidance Law Using Online Off-Policy Integral Reinforcement Learning
Authors: Qi Wang, Zhizhong Liao
Journal: Journal of Systems Engineering and Electronics (journal article, published 2024-06-06)
DOI: https://doi.org/10.23919/jsee.2024.000067
The missile interception problem can be regarded as a two-player zero-sum differential game, whose solution depends on solving the Hamilton-Jacobi-Isaacs (HJI) equation. It has been proved impossible to obtain a closed-form solution because of the nonlinearity of the HJI equation, and many iterative algorithms have been proposed to solve it. The simultaneous policy updating algorithm (SPUA) is an effective algorithm for solving the HJI equation, but it is an on-policy integral reinforcement learning (IRL) method. Implementing SPUA online requires the disturbance signals to be adjustable, which is unrealistic. In this paper, an off-policy IRL algorithm based on SPUA is proposed that requires no knowledge of the system dynamics. A neural-network-based online adaptive critic implementation scheme of the off-policy IRL algorithm is then presented. Based on the online off-policy IRL method, a computational intelligence interception guidance (CIIG) law is developed for intercepting highly maneuvering targets. Because the method is model-free, interception can be achieved using only system data measured online. The effectiveness of the CIIG law is verified through two missile-target engagement scenarios.
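To illustrate what updating both players' policies at once means, the following minimal sketch applies the idea to a scalar linear-quadratic zero-sum game. It is not the off-policy, model-free algorithm of the paper: it is model-based and assumes known dynamics x' = a*x + b*u + d*w with cost integrand q*x^2 + r*u^2 - gamma^2*w^2 and value function V(x) = p*x^2. In this special case the HJI equation reduces to the scalar game Riccati equation 2*a*p + q - (b^2/r - d^2/gamma^2)*p^2 = 0. All symbols, function names, and numeric values are assumptions chosen for illustration.

```python
import math


def gare_residual(p, a, b, d, q, r, gamma):
    """Residual of the scalar game Riccati equation (zero at the solution)."""
    s = b**2 / r - d**2 / gamma**2
    return 2 * a * p + q - s * p**2


def simultaneous_policy_update(a, b, d, q, r, gamma, p0=0.0,
                               tol=1e-12, max_iter=50):
    """Iterate on p, updating the control and disturbance gains together.

    Hypothetical helper for illustration only; it uses the model (a, b, d).
    """
    p = p0
    for _ in range(max_iter):
        # Policy improvement for both players from the current value p.
        k_u = b * p / r           # control policy:     u = -k_u * x
        k_w = d * p / gamma**2    # disturbance policy: w =  k_w * x
        a_cl = a - b * k_u + d * k_w  # closed-loop dynamics coefficient
        if a_cl >= 0:
            raise ValueError("closed loop is not stable; choose another p0")
        # Policy evaluation: solve the scalar Lyapunov equation
        #   2*a_cl*p_next + q + r*k_u**2 - gamma**2*k_w**2 = 0
        p_next = -(q + r * k_u**2 - gamma**2 * k_w**2) / (2 * a_cl)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)
    p_star = simultaneous_policy_update(**params)
    print("p* =", p_star, "residual =", gare_residual(p_star, **params))
```

Algebraically, each step of this scalar iteration is a Newton step on the Riccati residual. The sketch shows only the fixed-point structure; an off-policy IRL scheme of the kind described in the abstract would instead estimate the same quantities from measured trajectory data, without using a, b, or d.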