Adaptive Decentralized Policies With Attention for Large-Scale Multiagent Environments

IEEE Transactions on Artificial Intelligence, vol. 5, no. 10, pp. 4905-4914. Pub Date: 2024-06-18. DOI: 10.1109/TAI.2024.3415550

Youness Boutyour; Abdellah Idrissi

Citations: 0

Adaptive Decentralized Policies With Attention for Large-Scale Multiagent Environments

Multiagent reinforcement learning (MARL) poses unique challenges in real-world applications, demanding the adaptation of reinforcement learning principles to scenarios where agents interact in dynamically changing environments. This article presents a novel approach, “decentralized policy with attention” (ADPA), designed to address these challenges in large-scale multiagent environments. ADPA leverages an attention mechanism to dynamically select relevant information for estimating critics while training decentralized policies. This enables effective and scalable learning, supporting both cooperative and competitive settings, and scenarios with nonglobal states. In this work, we conduct a comprehensive evaluation of ADPA across a range of multiagent environments, including cooperative treasure collection and rover-tower communication. We compare ADPA with existing centralized training methods and ablated variants to showcase its advantages in terms of scalability, adaptability to various environments, and robustness. Our results demonstrate that ADPA offers a promising solution for addressing the complexities of large-scale MARL, providing the flexibility to handle diverse multiagent scenarios. By combining decentralized policies with attention mechanisms, we contribute to the advancement of MARL techniques, offering a powerful tool for real-world applications in dynamic and interactive multiagent systems.
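The abstract says only that ADPA uses an attention mechanism to select relevant information when estimating critics. It does not specify the architecture, so the following is a minimal sketch of one common way such a selection step can be written, assuming scaled dot-product attention over per-agent encodings. The function name `attention_critic_inputs`, the projection matrices `w_q`, `w_k`, `w_v`, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_critic_inputs(enc, w_q, w_k, w_v):
    """For each agent i, attend over the encodings of all other agents.

    enc: (n_agents, d) array of per-agent encodings (hypothetical input;
         assumes n_agents >= 2 so every agent has someone to attend to).
    Returns (context, weights): context[i] is the attention-weighted summary
    of the other agents, and weights[i, j] is how much agent i attends to
    agent j (zero on the diagonal).
    """
    q = enc @ w_q                              # queries, (n_agents, d_k)
    k = enc @ w_k                              # keys,    (n_agents, d_k)
    v = enc @ w_v                              # values,  (n_agents, d_v)
    scores = q @ k.T / np.sqrt(k.shape[1])     # scaled dot products, (n, n)
    np.fill_diagonal(scores, -np.inf)          # an agent does not attend to itself
    weights = softmax(scores, axis=1)          # each row sums to 1
    context = weights @ v                      # (n_agents, d_v)
    return context, weights

# Toy usage with random encodings and random (untrained) projections.
rng = np.random.default_rng(0)
n_agents, d, d_k = 4, 8, 5
enc = rng.normal(size=(n_agents, d))
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
context, weights = attention_critic_inputs(enc, w_q, w_k, w_v)
```

In a design of this kind, a per-agent critic would typically take agent i's own encoding together with `context[i]`, so the size of its input does not grow with the number of agents. Whether ADPA does exactly this is not stated in the abstract.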
