Reducing overestimation with attentional multi-agent twin delayed deep deterministic policy gradient

Engineering Applications of Artificial Intelligence · Impact Factor 8.0 · CAS Tier 2 (Computer Science) · JCR Q1 (Automation & Control Systems) · Pub Date: 2025-04-15 · Epub Date: 2025-02-20 · DOI: 10.1016/j.engappai.2025.110352
Yizhi Cao, Zijian Tian, Zhaoran Liu, Naizheng Jia, Xinggao Liu

Abstract

In multi-agent reinforcement learning, establishing effective communication protocols is crucial for enhancing agent collaboration. However, traditional communication methods face challenges in scalability and efficiency as the number of agents increases, because the dimensions of the observation and action spaces grow with it. This leads to heightened resource consumption and degraded performance in large multi-agent scenarios. To address these issues, we introduce a novel Attentional Multi-agent Twin Delayed Deep Deterministic Policy Gradient (AMATD3) algorithm that incorporates an attentional communication policy gradient approach. This approach selectively initiates communication through an attention unit that assesses the necessity of information exchange among agents, combined with a communication module that effectively integrates the essential information. By implementing a double-Q function, AMATD3 further addresses the overestimation and suboptimal policy choices of existing methods, improving the algorithm's accuracy and reducing communication overhead. Specifically, our algorithm demonstrates superior performance in the StarCraft II environment, achieving higher cumulative rewards and task success rates than existing algorithms. For example, AMATD3 yields reward values of 16.908 and 6.858 for the 8m and 25m scenarios, respectively, more than double the rewards achieved by other methods. This confirms the algorithm's efficiency and effectiveness in complex multi-agent settings, contributing to the ongoing development of scalable and efficient communication protocols in artificial intelligence.
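The double-Q mechanism the abstract refers to follows the TD3 family: two critics are trained, and the bootstrap target uses the minimum of their estimates, which curbs the upward bias of a single learned Q-function. The paper's multi-agent implementation is not reproduced here; the following is a minimal single-step sketch of the generic clipped double-Q target, with `td3_target` and all values being illustrative assumptions, not the authors' code.

```python
import numpy as np

def td3_target(rewards, q1_next, q2_next, dones, gamma=0.99):
    """Clipped double-Q bootstrap target, y = r + gamma * (1 - done) * min(Q1', Q2').

    Taking the elementwise minimum of the two target critics' estimates
    discourages the overestimation that a single max-based target produces.
    """
    min_q = np.minimum(q1_next, q2_next)
    return rewards + gamma * (1.0 - dones) * min_q

# Toy batch: the two critics disagree on the first transition, so the
# target uses the lower estimate; the second transition is terminal.
rewards = np.array([1.0, 0.5])
q1_next = np.array([10.0, 2.0])
q2_next = np.array([8.0, 3.0])
dones = np.array([0.0, 1.0])

targets = td3_target(rewards, q1_next, q2_next, dones)
# targets[0] = 1.0 + 0.99 * 8.0 = 8.92, targets[1] = 0.5 (terminal state)
```

In AMATD3 this target would be computed per agent from the centralized twin critics; the gating of which agents exchange observations before the critic evaluation is handled by the attention unit described above.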