Cong Huang, Chaozhe Wang, S. Chai, Qiqi Tong, Yong Li
Proceedings of the 14th International Conference on Computer Modeling and Simulation, published 2022-06-24. DOI: 10.1145/3547578.3547602 (https://doi.org/10.1145/3547578.3547602)
Research on Evasion Policy of UCAV Against Infrared Air-to-Air Missile Based on Soft Actor Critic Algorithm
This paper addresses the problem of an unmanned combat aerial vehicle (UCAV) evading infrared air-to-air missiles in close air combat. Building on an infrared offensive and defensive confrontation simulation system, it applies the soft actor critic (SAC) algorithm to train an agent that learns both an escape maneuver policy and a decoy launching policy. The input state consists of the three-dimensional coordinates of the missile in the UCAV body coordinate system and the number of remaining decoys. The output actions are the joystick and throttle stick control strokes and the decoy launching pulse. The reward combines a dense term, composed of relative situational parameters and flight parameters, with a sparse term determined by the result of the decoy interference and the result of the engagement. SAC is modified to handle an action space that mixes continuous and discrete actions, yielding end-to-end escape maneuver and decoy launching policies that map state input directly to control output. In simulation, the escape rates of the UCAV with and without decoys are compared: the escape maneuver policy alone reaches an escape rate of 59.0%, and adding the decoy launching policy raises it by 6.7 percentage points to 65.7%.
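The state and action definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two joystick axes (pitch and roll), the [-1, 1] action range, the tanh-squashed Gaussian for the continuous part, the Bernoulli draw for the decoy pulse, and every name below (`HybridAction`, `make_state`, `sample_action`) are assumptions made for illustration. The `means`, `log_stds`, and `decoy_logit` arguments stand in for the outputs of a policy network that the abstract does not describe.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class HybridAction:
    stick_pitch: float   # continuous, assumed range [-1, 1]
    stick_roll: float    # continuous, assumed range [-1, 1]
    throttle: float      # continuous, assumed range [-1, 1]
    launch_decoy: bool   # discrete decoy launching pulse


def make_state(missile_xyz_body, decoys_left):
    """Build the state: missile position in the UCAV body frame plus decoy count."""
    x, y, z = missile_xyz_body
    return [x, y, z, float(decoys_left)]


def sample_action(means, log_stds, decoy_logit, decoys_left, rng):
    """Sample one mixed continuous/discrete action (hypothetical policy head).

    Continuous part: Gaussian sample squashed by tanh, as in standard SAC.
    Discrete part: Bernoulli draw on the launch logit, masked when no decoys remain.
    """
    squashed = []
    for mu, log_std in zip(means, log_stds):
        u = rng.gauss(mu, math.exp(log_std))   # unbounded Gaussian sample
        squashed.append(math.tanh(u))          # squash into [-1, 1]
    p_launch = 1.0 / (1.0 + math.exp(-decoy_logit))  # sigmoid of the logit
    launch = decoys_left > 0 and rng.random() < p_launch
    return HybridAction(squashed[0], squashed[1], squashed[2], launch)


rng = random.Random(0)
state = make_state((1200.0, -300.0, 50.0), decoys_left=4)
action = sample_action([0.0, 0.0, 0.0], [-1.0, -1.0, -1.0], 0.0, 4, rng)
```

Masking the launch when no decoys remain is one simple way to keep the discrete branch consistent with the decoy count in the state; the paper may handle this differently.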