Reducing overestimation with attentional multi-agent twin delayed deep deterministic policy gradient

Engineering Applications of Artificial Intelligence · Impact Factor 8.0 · CAS Tier 2 (Computer Science) · JCR Q1 (Automation & Control Systems) · Pub Date: 2025-04-15 · Epub Date: 2025-02-20 · DOI: 10.1016/j.engappai.2025.110352
Yizhi Cao, Zijian Tian, Zhaoran Liu, Naizheng Jia, Xinggao Liu

Abstract

In multi-agent reinforcement learning, establishing effective communication protocols is crucial for enhancing agent collaboration. However, traditional communication methods face challenges in scalability and efficiency as the number of agents increases, because the dimensions of the observation and action spaces grow with it. This leads to heightened resource consumption and degraded performance in large multi-agent scenarios. To address these issues, we introduce a novel Attentional Multi-agent Twin Delayed Deep Deterministic Policy Gradient (AMATD3) algorithm that incorporates an attentional communication policy gradient approach. This approach selectively initiates communication through an attention unit that assesses the necessity of information exchange among agents, combined with a communication module that effectively integrates the essential information. By implementing a double-Q function, AMATD3 further addresses the overestimation and suboptimal policy choices of existing methods, improving the algorithm's accuracy and reducing communication overhead. Specifically, our algorithm demonstrates superior performance in the StarCraft II environment, achieving higher cumulative rewards and task success rates than existing algorithms. For example, AMATD3 yields reward values of 16.908 and 6.858 for the 8m and 25m scenarios, respectively, more than double the rewards achieved by other methods. This confirms the algorithm's efficiency and effectiveness in complex multi-agent settings, contributing to the ongoing development of scalable and efficient communication protocols in artificial intelligence.
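The double-Q mechanism the abstract refers to follows the TD3 family: two critics are trained, and the bootstrap target uses the minimum of their estimates, which curbs the upward bias of a single learned Q-function. The paper's multi-agent implementation is not reproduced here; the following is a minimal single-step sketch of the generic clipped double-Q target, with `td3_target` and all values being illustrative assumptions, not the authors' code.

```python
import numpy as np

def td3_target(rewards, q1_next, q2_next, dones, gamma=0.99):
    """Clipped double-Q bootstrap target, y = r + gamma * (1 - done) * min(Q1', Q2').

    Taking the elementwise minimum of the two target critics' estimates
    discourages the overestimation that a single max-based target produces.
    """
    min_q = np.minimum(q1_next, q2_next)
    return rewards + gamma * (1.0 - dones) * min_q

# Toy batch: the two critics disagree on the first transition, so the
# target uses the lower estimate; the second transition is terminal.
rewards = np.array([1.0, 0.5])
q1_next = np.array([10.0, 2.0])
q2_next = np.array([8.0, 3.0])
dones = np.array([0.0, 1.0])

targets = td3_target(rewards, q1_next, q2_next, dones)
# targets[0] = 1.0 + 0.99 * 8.0 = 8.92, targets[1] = 0.5 (terminal state)
```

In AMATD3 this target would be computed per agent from the centralized twin critics; the gating of which agents exchange observations before the critic evaluation is handled by the attention unit described above.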