Multi-Agent Uncertainty Sharing for Cooperative Multi-Agent Reinforcement Learning

Haoxing Chen, Guangkai Yang, Junge Zhang, Qiyue Yin, Kaiqi Huang
{"title":"Multi-Agent Uncertainty Sharing for Cooperative Multi-Agent Reinforcement Learning","authors":"Haoxing Chen, Guangkai Yang, Junge Zhang, Qiyue Yin, Kaiqi Huang","doi":"10.1109/IJCNN55064.2022.9891948","DOIUrl":null,"url":null,"abstract":"Cooperative multi-agent reinforcement learning has been considered promising to complete many complex cooperative tasks in the real world such as coordination of robot swarms and self-driving. To promote multi-agent cooperation, Centralized Training with Decentralized Execution emerges as a popular learning paradigm due to partial observability and communication constraints during execution and computational complexity in training. Value decomposition has been known to produce competitive performance to other methods in complex environment within this paradigm such as VDN and QMIX, which approximates the global joint Q-value function with multiple local individual Q-value functions. However, existing works often neglect the uncertainty of multiple agents resulting from the partial observability and very large action space in the multi-agent setting and can only obtain the sub-optimal policy. To alleviate the limitations above, building upon the value decomposition, we propose a novel method called multi-agent uncertainty sharing (MAUS). This method utilizes the Bayesian neural network to explicitly capture the uncertainty of all agents and combines with Thompson sampling to select actions for policy learning. Besides, we impose the uncertainty-sharing mechanism among agents to stabilize training as well as coordinate the behaviors of all the agents for multi-agent cooperation. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) environment demonstrate that our approach achieves significant performance to exceed the prior baselines and verify the effectiveness of our method.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9891948","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Cooperative multi-agent reinforcement learning is considered a promising approach to many complex real-world cooperative tasks, such as coordinating robot swarms and self-driving. To promote multi-agent cooperation, Centralized Training with Decentralized Execution has emerged as a popular learning paradigm, owing to partial observability and communication constraints during execution and to the computational complexity of training. Within this paradigm, value decomposition methods such as VDN and QMIX, which approximate the global joint Q-value function with multiple local individual Q-value functions, are known to achieve competitive performance in complex environments. However, existing works often neglect the uncertainty of the agents that arises from partial observability and the very large action space of the multi-agent setting, and therefore obtain only sub-optimal policies. To alleviate these limitations, we build on value decomposition and propose a novel method called multi-agent uncertainty sharing (MAUS). MAUS uses a Bayesian neural network to explicitly capture the uncertainty of all agents and combines it with Thompson sampling to select actions for policy learning. In addition, we impose an uncertainty-sharing mechanism among agents to stabilize training and to coordinate the behaviors of all agents for cooperation. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) environment demonstrate that our approach significantly outperforms prior baselines and verify the effectiveness of our method.
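To make the two building blocks mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of (1) VDN-style value decomposition, where the joint Q-value is approximated by the sum of per-agent Q-values, and (2) Thompson sampling over a Bayesian Q-network, where each forward pass draws one weight sample from a learned Gaussian posterior and the agent acts greedily under that sample. All class names, layer sizes, and helper functions (BayesianLinear, AgentQNet, thompson_actions, joint_q_vdn) are illustrative assumptions; the paper's uncertainty-sharing mechanism is not modeled here.

```python
# Illustrative sketch only: VDN-style decomposition + Thompson sampling
# with a Bayesian output layer. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer with a factorized Gaussian posterior over its weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        self.w_logstd = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(out_features))
        self.b_logstd = nn.Parameter(torch.full((out_features,), -3.0))

    def forward(self, x):
        # Reparameterized weight draw: one posterior sample per forward pass,
        # which is what Thompson sampling acts greedily against.
        w = self.w_mu + self.w_logstd.exp() * torch.randn_like(self.w_mu)
        b = self.b_mu + self.b_logstd.exp() * torch.randn_like(self.b_mu)
        return F.linear(x, w, b)


class AgentQNet(nn.Module):
    """Per-agent Q-network; the Bayesian head exposes epistemic uncertainty."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.head = BayesianLinear(hidden, n_actions)

    def forward(self, obs):
        return self.head(self.body(obs))  # (batch, n_actions)


def thompson_actions(q_nets, observations):
    """One posterior sample per agent -> greedy action under that sample."""
    return [q(obs.unsqueeze(0)).argmax(dim=-1).item()
            for q, obs in zip(q_nets, observations)]


def joint_q_vdn(q_nets, observations, actions):
    """VDN-style decomposition: Q_tot = sum_i Q_i(o_i, a_i)."""
    per_agent = [q(obs.unsqueeze(0)).squeeze(0)[a]
                 for q, obs, a in zip(q_nets, observations, actions)]
    return torch.stack(per_agent).sum()


if __name__ == "__main__":
    n_agents, obs_dim, n_actions = 3, 8, 5
    q_nets = [AgentQNet(obs_dim, n_actions) for _ in range(n_agents)]
    obs = [torch.randn(obs_dim) for _ in range(n_agents)]
    acts = thompson_actions(q_nets, obs)
    print("sampled actions:", acts, "| Q_tot:", joint_q_vdn(q_nets, obs, acts).item())
```

In a QMIX-style variant, the summation in joint_q_vdn would be replaced by a monotonic mixing network conditioned on the global state; the Thompson-sampling action selection is unchanged.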