{"title":"Multi-Agent Uncertainty Sharing for Cooperative Multi-Agent Reinforcement Learning","authors":"Haoxing Chen, Guangkai Yang, Junge Zhang, Qiyue Yin, Kaiqi Huang","doi":"10.1109/IJCNN55064.2022.9891948","DOIUrl":null,"url":null,"abstract":"Cooperative multi-agent reinforcement learning has been considered promising to complete many complex cooperative tasks in the real world such as coordination of robot swarms and self-driving. To promote multi-agent cooperation, Centralized Training with Decentralized Execution emerges as a popular learning paradigm due to partial observability and communication constraints during execution and computational complexity in training. Value decomposition has been known to produce competitive performance to other methods in complex environment within this paradigm such as VDN and QMIX, which approximates the global joint Q-value function with multiple local individual Q-value functions. However, existing works often neglect the uncertainty of multiple agents resulting from the partial observability and very large action space in the multi-agent setting and can only obtain the sub-optimal policy. To alleviate the limitations above, building upon the value decomposition, we propose a novel method called multi-agent uncertainty sharing (MAUS). This method utilizes the Bayesian neural network to explicitly capture the uncertainty of all agents and combines with Thompson sampling to select actions for policy learning. Besides, we impose the uncertainty-sharing mechanism among agents to stabilize training as well as coordinate the behaviors of all the agents for multi-agent cooperation. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) environment demonstrate that our approach achieves significant performance to exceed the prior baselines and verify the effectiveness of our method.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9891948","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Cooperative multi-agent reinforcement learning is considered a promising approach to many complex real-world cooperative tasks, such as robot-swarm coordination and autonomous driving. To promote multi-agent cooperation, Centralized Training with Decentralized Execution has emerged as a popular learning paradigm, owing to partial observability and communication constraints during execution and to the computational complexity of training. Within this paradigm, value decomposition methods such as VDN and QMIX, which approximate the global joint Q-value function with multiple local individual Q-value functions, are known to perform competitively with other methods in complex environments. However, existing works often neglect the uncertainty of multiple agents that arises from partial observability and the very large action space of the multi-agent setting, and therefore obtain only sub-optimal policies. To alleviate these limitations, we build upon value decomposition and propose a novel method called multi-agent uncertainty sharing (MAUS). MAUS uses a Bayesian neural network to explicitly capture the uncertainty of all agents and combines it with Thompson sampling to select actions for policy learning. In addition, we impose an uncertainty-sharing mechanism among agents to stabilize training and to coordinate the behaviors of all agents for multi-agent cooperation. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) environment demonstrate that our approach significantly outperforms prior baselines, verifying the effectiveness of our method.
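For readers unfamiliar with the building blocks named in the abstract, the sketch below illustrates Thompson-sampling action selection with a weight-sampled ("Bayesian") Q-network per agent. It is a minimal toy illustration, not the MAUS implementation: the linear Q-function with a diagonal Gaussian posterior and the names BayesianQNetwork and select_actions are assumptions made for the example. In value decomposition, the joint value is approximated from per-agent values (e.g., VDN uses Q_tot = Σ_i Q_i); here each agent instead draws one plausible Q-function from its parameter posterior and acts greedily with respect to that sample, which is the essence of Thompson sampling.

```python
# Illustrative sketch (not the authors' implementation): Thompson-sampling
# action selection with a weight-sampled ("Bayesian") Q-network per agent.
import numpy as np


class BayesianQNetwork:
    """Toy linear Q-network with a diagonal Gaussian posterior over its weights."""

    def __init__(self, obs_dim, n_actions, rng):
        self.rng = rng
        self.mean = np.zeros((obs_dim, n_actions))          # posterior mean of weights
        self.log_std = np.full((obs_dim, n_actions), -1.0)  # posterior log-std of weights

    def sample_weights(self):
        # Draw one plausible Q-function from the posterior (Thompson sampling).
        noise = self.rng.standard_normal(self.mean.shape)
        return self.mean + np.exp(self.log_std) * noise

    def q_values(self, obs, weights):
        # Q-values for all actions given an observation and sampled weights.
        return obs @ weights


def select_actions(networks, observations):
    """Each agent samples a Q-function and acts greedily w.r.t. that sample."""
    actions = []
    for net, obs in zip(networks, observations):
        weights = net.sample_weights()
        actions.append(int(np.argmax(net.q_values(obs, weights))))
    return actions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, n_actions, n_agents = 8, 5, 3
    nets = [BayesianQNetwork(obs_dim, n_actions, rng) for _ in range(n_agents)]
    obs = [rng.standard_normal(obs_dim) for _ in range(n_agents)]
    print(select_actions(nets, obs))  # one action index per agent
```

In a full value-decomposition setup, the per-agent Q-values chosen this way would feed a mixing function (a sum for VDN, a monotonic mixing network for QMIX) during centralized training, while execution remains decentralized as in the sketch.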