An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm

Applied Sciences (Q1, Mathematics) · Pub Date: 2024-09-18 · DOI: 10.3390/app14188383
Shaochun Qu, Ruiqi Guo, Zijian Cao, Jiawei Liu, Baolong Su, Minghao Liu
{"title":"An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm","authors":"Shaochun Qu, Ruiqi Guo, Zijian Cao, Jiawei Liu, Baolong Su, Minghao Liu","doi":"10.3390/app14188383","DOIUrl":null,"url":null,"abstract":"Due to the advantages of a centralized critic to estimate the Q-function value and decentralized actors to optimize the agents’ policies, counterfactual multi-agent (COMA) stands out in most multi-agent reinforcement learning (MARL) algorithms. The sharing of policy parameters can improve sampling efficiency and learning effectiveness, but it may lead to a lack of policy diversity. Hence, to balance parameter sharing and diversity among agents in COMA has been a persistent research topic. In this paper, an effective training method for a COMA policy network based on a differential evolution (DE) algorithm is proposed, named DE-COMA. DE-COMA introduces individuals in a population as computational units to construct the policy network with operations such as mutation, crossover, and selection. The average return of DE-COMA is set as the fitness function, and the best individual of policy network will be chosen for the next generation. By maintaining better parameter sharing to enhance parameter diversity, multi-agent strategies will become more exploratory. To validate the effectiveness of DE-COMA, experiments were conducted in the StarCraft II environment with 2s_vs_1sc, 2s3z, 3m, and 8m battle scenarios. Experimental results demonstrate that DE-COMA significantly outperforms the traditional COMA and most other multi-agent reinforcement learning algorithms in terms of win rate and convergence speed.","PeriodicalId":8224,"journal":{"name":"Applied Sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/app14188383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 0

Abstract

Due to the advantages of a centralized critic that estimates the Q-function and decentralized actors that optimize the agents' policies, counterfactual multi-agent policy gradients (COMA) stands out among multi-agent reinforcement learning (MARL) algorithms. Sharing policy parameters can improve sampling efficiency and learning effectiveness, but it may lead to a lack of policy diversity. Hence, balancing parameter sharing and diversity among agents in COMA has been a persistent research topic. In this paper, an effective training method for the COMA policy network based on the differential evolution (DE) algorithm, named DE-COMA, is proposed. DE-COMA treats the individuals of a population as computational units and constructs the policy network through operations such as mutation, crossover, and selection. The average return is used as the fitness function, and the best policy-network individual is selected for the next generation. By preserving the benefits of parameter sharing while enhancing parameter diversity, the multi-agent strategies become more exploratory. To validate the effectiveness of DE-COMA, experiments were conducted in the StarCraft II environment on the 2s_vs_1sc, 2s3z, 3m, and 8m battle scenarios. The results demonstrate that DE-COMA significantly outperforms traditional COMA and most other MARL algorithms in terms of win rate and convergence speed.
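The abstract gives the shape of the training loop but not its implementation details, so the following is a minimal sketch, assuming a standard DE/rand/1/bin scheme: each individual is a flattened parameter vector of the policy network, fitness is the average episode return, and selection keeps whichever of target and trial scores higher. The function name `evaluate_avg_return`, the initialization scale, and all hyperparameter values (`pop_size`, `F`, `CR`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def de_train_policy(evaluate_avg_return, dim, pop_size=20, F=0.5, CR=0.9,
                    generations=100, seed=None):
    """Minimal DE/rand/1/bin loop over flattened policy-network parameters.

    evaluate_avg_return: hypothetical callable mapping a parameter vector to
    the average episode return of the policy it parameterizes (the fitness
    function the abstract describes). In DE-COMA this evaluation would roll
    out the COMA actors in a StarCraft II scenario.
    """
    rng = np.random.default_rng(seed)
    # Population: each row is one candidate parameter vector ("individual").
    pop = rng.normal(0.0, 0.1, size=(pop_size, dim))
    fitness = np.array([evaluate_avg_return(x) for x in pop])

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: perturb one individual by the scaled difference of
            # two others, all three distinct from the target i.
            a, b, c = rng.choice([j for j in range(pop_size) if j != i],
                                 size=3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])
            # Binomial crossover: mix mutant and target, forcing at least
            # one dimension to come from the mutant.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Selection: keep the trial only if its average return improves.
            trial_fitness = evaluate_avg_return(trial)
            if trial_fitness > fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness

    # The best individual of the final generation parameterizes the policy.
    best = int(np.argmax(fitness))
    return pop[best], float(fitness[best])
```

Under these assumptions the sketch shows why the method trades gradient steps for population diversity: every agent still shares one parameter vector within an individual, while the population as a whole explores many parameter settings in parallel.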