Cooperative Advantage Actor–Critic Reinforcement Learning for Multiagent Pursuit-Evasion Games on Communication Graphs

IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 6509-6523. Pub Date: 2024-07-23. DOI: 10.1109/TAI.2024.3432511

Yizhen Meng; Chun Liu; Qiang Wang; Longyu Tan

https://ieeexplore.ieee.org/document/10606954/

Citations: 0

Abstract

This article investigates the distributed optimal strategy problem in multiagent pursuit-evasion (MPE) games, striving for Nash equilibrium through the optimization of individual benefit matrices based on observations. To this end, a novel collaborative control scheme for MPE games using communication graphs is proposed. This scheme employs cooperative advantage actor–critic (A2C) reinforcement learning to facilitate collaborative capture by pursuers in a distributed manner while maintaining bounded system signals. The strategy orchestrates the actions of pursuers through adaptive neural network learning, ensuring proximity-based collaboration for effective captures. Meanwhile, evaders aim to evade collectively by converging toward each other. Through extensive simulations involving five pursuers and two evaders, the efficacy of the proposed approach is demonstrated, and pursuers seamlessly organize into pursuit units and capture evaders, validating the collaborative capture objective. This article represents a promising step toward effective and cooperative control strategies in MPE game scenarios.
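For orientation, the advantage signal at the core of a generic A2C update can be sketched as below. This is the textbook one-step form, not the cooperative scheme proposed in the article. The neighbor averaging over a communication graph, the function names, the three-pursuer graph, and the discount value are all illustrative assumptions.

```python
def one_step_advantage(reward, value, next_value, gamma=0.99, done=False):
    """Textbook one-step advantage: A = r + gamma * V(s') - V(s)."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def neighbor_mean(values, adjacency, agent):
    """Average a per-agent quantity over `agent` and its graph neighbors.

    adjacency[i][j] == 1 means agent i receives information from agent j.
    This averaging rule is a hypothetical stand-in for cooperation,
    not the article's method.
    """
    members = [agent] + [
        j for j, linked in enumerate(adjacency[agent]) if linked and j != agent
    ]
    return sum(values[j] for j in members) / len(members)


# Hypothetical line graph over three pursuers: 0 - 1 - 2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

# Made-up rewards and critic values, one transition per pursuer.
advantages = [
    one_step_advantage(reward=1.0, value=0.5, next_value=1.0, gamma=0.9),
    one_step_advantage(reward=0.0, value=0.2, next_value=0.0, gamma=0.9, done=True),
    one_step_advantage(reward=-1.0, value=0.0, next_value=2.0, gamma=0.9),
]

# Each pursuer blends its own advantage with those of its neighbors.
shared = [neighbor_mean(advantages, adjacency, i) for i in range(3)]
```

In this sketch a positive advantage means the action did better than the critic expected, so an actor update would raise its probability. The averaging only uses information available from graph neighbors, which is one simple way to keep the computation distributed.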


