QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition

Songchen Fu, Shaojing Zhao, Ta Li, YongHong Yan
{"title":"QTypeMix: Enhancing Multi-Agent Cooperative Strategies through Heterogeneous and Homogeneous Value Decomposition","authors":"Songchen Fu, Shaojing Zhao, Ta Li, YongHong Yan","doi":"arxiv-2408.07098","DOIUrl":null,"url":null,"abstract":"In multi-agent cooperative tasks, the presence of heterogeneous agents is\nfamiliar. Compared to cooperation among homogeneous agents, collaboration\nrequires considering the best-suited sub-tasks for each agent. However, the\noperation of multi-agent systems often involves a large amount of complex\ninteraction information, making it more challenging to learn heterogeneous\nstrategies. Related multi-agent reinforcement learning methods sometimes use\ngrouping mechanisms to form smaller cooperative groups or leverage prior domain\nknowledge to learn strategies for different roles. In contrast, agents should\nlearn deeper role features without relying on additional information.\nTherefore, we propose QTypeMix, which divides the value decomposition process\ninto homogeneous and heterogeneous stages. QTypeMix learns to extract type\nfeatures from local historical observations through the TE loss. In addition,\nwe introduce advanced network structures containing attention mechanisms and\nhypernets to enhance the representation capability and achieve the value\ndecomposition process. The results of testing the proposed method on 14 maps\nfrom SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance\nin tasks of varying difficulty.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"31 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.07098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In multi-agent cooperative tasks, heterogeneous agents are common. Compared with cooperation among homogeneous agents, heterogeneous collaboration requires identifying the sub-tasks best suited to each agent. However, multi-agent systems often involve large amounts of complex interaction information, which makes heterogeneous strategies harder to learn. Related multi-agent reinforcement learning methods sometimes use grouping mechanisms to form smaller cooperative groups, or leverage prior domain knowledge to learn strategies for different roles. Ideally, however, agents should learn deeper role features without relying on such additional information. We therefore propose QTypeMix, which divides the value decomposition process into a homogeneous stage and a heterogeneous stage. QTypeMix learns to extract type features from local historical observations through the TE loss. In addition, we introduce network structures containing attention mechanisms and hypernetworks to enhance representation capability and carry out the value decomposition. Results on 14 maps from SMAC and SMACv2 show that QTypeMix achieves state-of-the-art performance on tasks of varying difficulty.
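As a rough illustration of the two-stage idea, the sketch below first mixes per-agent utilities within each agent type (homogeneous stage) and then mixes the resulting per-type values into a joint value (heterogeneous stage), with mixing weights produced by state-conditioned hypernetworks as in standard value-decomposition methods. The module names, layer sizes, and the exact mixing form are assumptions made for illustration only; the paper's actual QTypeMix architecture, its attention blocks, and the TE loss are not reproduced here.

```python
# Minimal sketch of a two-stage (homogeneous -> heterogeneous) value-decomposition
# mixer, loosely following the structure described in the abstract. All names and
# layer sizes are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class HyperMixer(nn.Module):
    """QMIX-style monotonic mixer whose weights are produced by hypernetworks
    conditioned on the global state (standard value-decomposition practice)."""

    def __init__(self, n_inputs: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_inputs, self.embed_dim = n_inputs, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_inputs * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(), nn.Linear(embed_dim, 1))

    def forward(self, qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # qs: (batch, n_inputs), state: (batch, state_dim)
        w1 = torch.abs(self.hyper_w1(state)).view(-1, self.n_inputs, self.embed_dim)
        b1 = self.hyper_b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(qs.unsqueeze(1), w1) + b1)     # (batch, 1, embed)
        w2 = torch.abs(self.hyper_w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(-1, 1, 1)
        return (torch.bmm(hidden, w2) + b2).squeeze(-1).squeeze(-1)  # (batch,)


class TwoStageMixer(nn.Module):
    """Stage 1 mixes per-agent utilities within each agent type (homogeneous);
    stage 2 mixes the resulting per-type values into Q_tot (heterogeneous)."""

    def __init__(self, agent_types: list, state_dim: int):
        super().__init__()
        self.agent_types = agent_types                 # e.g. [0, 0, 1, 1, 1]
        self.type_ids = sorted(set(agent_types))
        self.homo = nn.ModuleDict({
            str(t): HyperMixer(agent_types.count(t), state_dim) for t in self.type_ids
        })
        self.hetero = HyperMixer(len(self.type_ids), state_dim)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        type_qs = []
        for t in self.type_ids:
            idx = [i for i, at in enumerate(self.agent_types) if at == t]
            type_qs.append(self.homo[str(t)](agent_qs[:, idx], state))
        return self.hetero(torch.stack(type_qs, dim=1), state)      # Q_tot: (batch,)


if __name__ == "__main__":
    mixer = TwoStageMixer(agent_types=[0, 0, 1, 1, 1], state_dim=48)
    q_tot = mixer(torch.randn(8, 5), torch.randn(8, 48))
    print(q_tot.shape)  # torch.Size([8])
```

Constraining the hypernetwork-generated weights to be non-negative (via torch.abs) keeps the joint value monotonic in each agent's utility, the usual condition that lets decentralized greedy action selection stay consistent with the centralized value.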