{"title":"二对二超视距空战的混合奖励多智能体近端策略优化方法","authors":"Haojie Peng;Weihua Li;Sifan Dai;Ruihai Chen","doi":"10.1109/ICJECE.2024.3451965","DOIUrl":null,"url":null,"abstract":"With recent advances in airborne weapons, modern air combats tend to be accomplished in the beyond-visual-range (BVR) phase. Multiaircraft cooperation is also required to adapt to the complexities of modern air combats. The scale of the traditional rule-based expert system will become incredible in this case. In view of this, a mixed-reward multiagent proximal policy optimization (MRMAPPO) method is proposed in this article that is used to help train cooperative BVR air combat tactics via adversarial self-play. First, a two-on-two BVR air combat simulation platform is established, and the combat game is modeled as a Markov game. Second, centralized training with decentralized execution architecture is established. Multiple actors are involved in the architecture, each corresponding to a policy that generates a specified kind of command, e.g., the maneuvering and firing command. Moreover, in order to accelerate training as well as enhance the stability of the training process, four optimization mechanisms are introduced. The experimental section discusses how the effectiveness of the MRMAPPO is verified with comparative and ablation experiments, along with several air combat tactics that emerge in the training process.","PeriodicalId":100619,"journal":{"name":"IEEE Canadian Journal of Electrical and Computer Engineering","volume":"47 4","pages":"206-217"},"PeriodicalIF":2.1000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Mixed-Reward Multiagent Proximal Policy Optimization Method for Two-on-Two Beyond-Visual-Range Air Combat\",\"authors\":\"Haojie Peng;Weihua Li;Sifan Dai;Ruihai Chen\",\"doi\":\"10.1109/ICJECE.2024.3451965\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With recent advances in airborne weapons, modern air combats tend to be accomplished in the beyond-visual-range (BVR) phase. Multiaircraft cooperation is also required to adapt to the complexities of modern air combats. The scale of the traditional rule-based expert system will become incredible in this case. In view of this, a mixed-reward multiagent proximal policy optimization (MRMAPPO) method is proposed in this article that is used to help train cooperative BVR air combat tactics via adversarial self-play. First, a two-on-two BVR air combat simulation platform is established, and the combat game is modeled as a Markov game. Second, centralized training with decentralized execution architecture is established. Multiple actors are involved in the architecture, each corresponding to a policy that generates a specified kind of command, e.g., the maneuvering and firing command. Moreover, in order to accelerate training as well as enhance the stability of the training process, four optimization mechanisms are introduced. 
The experimental section discusses how the effectiveness of the MRMAPPO is verified with comparative and ablation experiments, along with several air combat tactics that emerge in the training process.\",\"PeriodicalId\":100619,\"journal\":{\"name\":\"IEEE Canadian Journal of Electrical and Computer Engineering\",\"volume\":\"47 4\",\"pages\":\"206-217\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-09-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Canadian Journal of Electrical and Computer Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10688404/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Canadian Journal of Electrical and Computer Engineering","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10688404/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
With recent advances in airborne weapons, modern air combat increasingly takes place in the beyond-visual-range (BVR) phase, and multiaircraft cooperation is required to cope with its complexity. In such scenarios, the scale of a traditional rule-based expert system becomes prohibitively large. In view of this, a mixed-reward multiagent proximal policy optimization (MRMAPPO) method is proposed in this article to train cooperative BVR air combat tactics via adversarial self-play. First, a two-on-two BVR air combat simulation platform is established, and the engagement is modeled as a Markov game. Second, a centralized training with decentralized execution architecture is established, in which multiple actors are involved, each corresponding to a policy that generates a specific kind of command, e.g., a maneuvering or firing command. Moreover, four optimization mechanisms are introduced to accelerate training and enhance the stability of the training process. The experimental section verifies the effectiveness of MRMAPPO through comparative and ablation experiments and discusses several air combat tactics that emerge during training.
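To make the architecture described in the abstract more concrete, the following is a minimal sketch (not the authors' implementation) of a centralized-training, decentralized-execution setup with per-command actor heads and a mixed reward. The observation sizes, maneuver library, network widths, and the reward mixing weight `alpha` are illustrative assumptions, not values from the paper.

```python
# Minimal CTDE sketch: each agent has separate actor heads for maneuvering and
# firing commands; a centralized critic sees the joint observation of both
# friendly aircraft and is used only during training. Dimensions are assumed.
import torch
import torch.nn as nn

OBS_DIM = 20          # assumed per-aircraft observation size
N_AGENTS = 2          # two friendly aircraft (two-on-two scenario)
N_MANEUVERS = 7       # assumed size of a discrete maneuver library
N_FIRE = 2            # fire / hold fire


class ActorHeads(nn.Module):
    """Decentralized actor: one policy head per command type (maneuver, fire)."""

    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.maneuver_head = nn.Linear(64, N_MANEUVERS)
        self.fire_head = nn.Linear(64, N_FIRE)

    def forward(self, obs):
        h = self.trunk(obs)
        return (torch.distributions.Categorical(logits=self.maneuver_head(h)),
                torch.distributions.Categorical(logits=self.fire_head(h)))


class CentralCritic(nn.Module):
    """Centralized critic: value of the joint observation of all friendly agents."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM * N_AGENTS, 64), nn.Tanh(),
                                 nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)


def mixed_reward(individual_r, team_r, alpha=0.5):
    """Illustrative mixed reward: blend per-agent shaping with the shared team outcome."""
    return alpha * individual_r + (1.0 - alpha) * team_r


if __name__ == "__main__":
    actors = [ActorHeads() for _ in range(N_AGENTS)]
    critic = CentralCritic()
    obs = torch.randn(N_AGENTS, OBS_DIM)               # one observation per friendly aircraft
    for i, actor in enumerate(actors):                  # decentralized execution
        maneuver_dist, fire_dist = actor(obs[i])
        print(i, maneuver_dist.sample().item(), fire_dist.sample().item())
    print("joint value:", critic(obs.reshape(-1)).item())  # centralized evaluation
```

In a full PPO training loop, the sampled actions and the centralized value estimate would feed the clipped surrogate objective per actor head, with the mixed reward replacing a purely individual or purely shared signal; that loop is omitted here for brevity.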