Leveraging Joint-Action Embedding in Multiagent Reinforcement Learning for Cooperative Games

IF 1.7 | Zone 4, Computer Science | Q3, Computer Science, Artificial Intelligence | IEEE Transactions on Games | Pub Date: 2023-08-07 | DOI: 10.1109/TG.2023.3302694
Xingzhou Lou;Junge Zhang;Yali Du;Chao Yu;Zhaofeng He;Kaiqi Huang
URL: https://ieeexplore.ieee.org/document/10210002/
Citation count: 0

Abstract

State-of-the-art multiagent policy gradient (MAPG) methods have demonstrated convincing capability in many cooperative games. However, the exponentially growing joint-action space severely challenges the critic's value evaluation and hinders the performance of MAPG methods. To address this issue, we augment Central-Q policy gradient with a joint-action embedding function and propose mutual-information maximization MAPG (M3APG). The joint-action embedding function makes joint actions carry information about state transitions, which improves the critic's generalization over the joint-action space by allowing it to infer the outcomes of joint actions. We theoretically prove that, with a fixed joint-action embedding function, the convergence of M3APG is guaranteed. Experimental results on the StarCraft multiagent challenge (SMAC) demonstrate that M3APG gives more accurate value evaluations and outperforms other basic MAPG models across various maps of multiple difficulty levels. We empirically show that our joint-action embedding model can be extended to value-based multiagent reinforcement learning methods and state-of-the-art MAPG methods. Finally, we run an ablation study to show that the use of mutual information in our method is necessary and effective.
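As a rough illustration of the scaling problem the abstract describes, the sketch below counts joint actions for a team of agents and maps one joint action to a fixed-size vector. It is only a sketch under stated assumptions: the function names, the one-hot encoding, the linear map, and the random weights are all hypothetical stand-ins, and the mutual-information objective that the paper uses to train its embedding is not implemented here.

```python
import random


def joint_action_count(n_agents, n_actions):
    """Number of distinct joint actions when each agent picks one of n_actions."""
    return n_actions ** n_agents


def encode_joint_action(actions, n_actions):
    """Concatenate per-agent one-hot vectors (length n_agents * n_actions)."""
    encoding = []
    for a in actions:
        one_hot = [0.0] * n_actions
        one_hot[a] = 1.0
        encoding.extend(one_hot)
    return encoding


def embed_joint_action(actions, n_actions, weights):
    """Hypothetical linear embedding: project the encoding to a low-dimensional vector."""
    x = encode_joint_action(actions, n_actions)
    embed_dim = len(weights[0])
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(embed_dim)]


# Assumed sizes, chosen only for illustration.
n_agents, n_actions, embed_dim = 8, 10, 16

# Random weights stand in for a learned embedding function.
rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)]
           for _ in range(n_agents * n_actions)]

print(joint_action_count(n_agents, n_actions))  # 100000000
z = embed_joint_action([3, 0, 7, 1, 9, 2, 5, 4], n_actions, weights)
print(len(z))  # 16
```

A critic that takes the 16-dimensional vector as input sees a fixed-size representation however many joint actions exist, which is the property the embedding is meant to exploit; whether nearby vectors correspond to similar outcomes depends on how the embedding is trained.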
Source journal
IEEE Transactions on Games (Engineering: Electrical and Electronic Engineering)
CiteScore: 4.60
Self-citation rate: 8.70%
Articles published: 87