An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm

Applied Sciences (Q1, Mathematics) · Pub Date: 2024-09-18 · DOI: 10.3390/app14188383
Shaochun Qu, Ruiqi Guo, Zijian Cao, Jiawei Liu, Baolong Su, Minghao Liu
{"title":"An Effective Training Method for Counterfactual Multi-Agent Policy Network Based on Differential Evolution Algorithm","authors":"Shaochun Qu, Ruiqi Guo, Zijian Cao, Jiawei Liu, Baolong Su, Minghao Liu","doi":"10.3390/app14188383","DOIUrl":null,"url":null,"abstract":"Due to the advantages of a centralized critic to estimate the Q-function value and decentralized actors to optimize the agents’ policies, counterfactual multi-agent (COMA) stands out in most multi-agent reinforcement learning (MARL) algorithms. The sharing of policy parameters can improve sampling efficiency and learning effectiveness, but it may lead to a lack of policy diversity. Hence, to balance parameter sharing and diversity among agents in COMA has been a persistent research topic. In this paper, an effective training method for a COMA policy network based on a differential evolution (DE) algorithm is proposed, named DE-COMA. DE-COMA introduces individuals in a population as computational units to construct the policy network with operations such as mutation, crossover, and selection. The average return of DE-COMA is set as the fitness function, and the best individual of policy network will be chosen for the next generation. By maintaining better parameter sharing to enhance parameter diversity, multi-agent strategies will become more exploratory. To validate the effectiveness of DE-COMA, experiments were conducted in the StarCraft II environment with 2s_vs_1sc, 2s3z, 3m, and 8m battle scenarios. Experimental results demonstrate that DE-COMA significantly outperforms the traditional COMA and most other multi-agent reinforcement learning algorithms in terms of win rate and convergence speed.","PeriodicalId":8224,"journal":{"name":"Applied Sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/app14188383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Mathematics","Score":null,"Total":0}
引用次数: 0

Abstract

Due to the advantages of a centralized critic that estimates the Q-function and decentralized actors that optimize the agents' policies, counterfactual multi-agent policy gradients (COMA) stands out among multi-agent reinforcement learning (MARL) algorithms. Sharing policy parameters can improve sampling efficiency and learning effectiveness, but it may lead to a lack of policy diversity. Hence, balancing parameter sharing and diversity among agents in COMA has been a persistent research topic. In this paper, an effective training method for the COMA policy network based on the differential evolution (DE) algorithm, named DE-COMA, is proposed. DE-COMA treats the individuals of a population as computational units and constructs the policy network through operations such as mutation, crossover, and selection. The average return is used as the fitness function, and the best policy-network individual is selected for the next generation. By preserving the benefits of parameter sharing while enhancing parameter diversity, the multi-agent strategies become more exploratory. To validate the effectiveness of DE-COMA, experiments were conducted in the StarCraft II environment on the 2s_vs_1sc, 2s3z, 3m, and 8m battle scenarios. The results demonstrate that DE-COMA significantly outperforms traditional COMA and most other MARL algorithms in terms of win rate and convergence speed.
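The abstract gives the shape of the training loop but not its implementation details, so the following is a minimal sketch, assuming a standard DE/rand/1/bin scheme: each individual is a flattened parameter vector of the policy network, fitness is the average episode return, and selection keeps whichever of target and trial scores higher. The function name `evaluate_avg_return`, the initialization scale, and all hyperparameter values (`pop_size`, `F`, `CR`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def de_train_policy(evaluate_avg_return, dim, pop_size=20, F=0.5, CR=0.9,
                    generations=100, seed=None):
    """Minimal DE/rand/1/bin loop over flattened policy-network parameters.

    evaluate_avg_return: hypothetical callable mapping a parameter vector to
    the average episode return of the policy it parameterizes (the fitness
    function the abstract describes). In DE-COMA this evaluation would roll
    out the COMA actors in a StarCraft II scenario.
    """
    rng = np.random.default_rng(seed)
    # Population: each row is one candidate parameter vector ("individual").
    pop = rng.normal(0.0, 0.1, size=(pop_size, dim))
    fitness = np.array([evaluate_avg_return(x) for x in pop])

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: perturb one individual by the scaled difference of
            # two others, all three distinct from the target i.
            a, b, c = rng.choice([j for j in range(pop_size) if j != i],
                                 size=3, replace=False)
            mutant = pop[a] + F * (pop[b] - pop[c])
            # Binomial crossover: mix mutant and target, forcing at least
            # one dimension to come from the mutant.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            # Selection: keep the trial only if its average return improves.
            trial_fitness = evaluate_avg_return(trial)
            if trial_fitness > fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness

    # The best individual of the final generation parameterizes the policy.
    best = int(np.argmax(fitness))
    return pop[best], float(fitness[best])
```

Under these assumptions the sketch shows why the method trades gradient steps for population diversity: every agent still shares one parameter vector within an individual, while the population as a whole explores many parameter settings in parallel.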