{"title":"Bidirectional Influence and Interaction for Multiagent Reinforcement Learning","authors":"Shaoqi Sun;Kele Xu;Dawei Feng;Bo Ding","doi":"10.1109/TAI.2024.3401649","DOIUrl":null,"url":null,"abstract":"In recent years, multiagent reinforcement learning (MARL) has demonstrated considerable potential across diverse applications. However, in reinforcement learning environments characterized by sparse rewards, the scarcity of reward signals may give rise to reward conflicts among agents. In these scenarios, each agent tends to compete to obtain limited rewards, deviating from collaborative efforts aimed at achieving collective team objectives. This not only amplifies the learning challenge but also imposes constraints on the overall learning performance of agents, ultimately compromising the attainment of team goals. To mitigate the conflicting competition for rewards among agents in MARL, we introduce the bidirectional influence and interaction (BDII) MARL framework. This innovative approach draws inspiration from the collaborative ethos observed in human social cooperation, specifically the concept of “sharing joys and sorrows.” The fundamental concept behind BDII is to empower agents to share their individual rewards with collaborators, fostering a cooperative rather than competitive behavioral paradigm. This strategic shift aims to resolve the pervasive issue of reward conflicts among agents operating in sparse-reward environments. BDII incorporates two key factors—namely, the Gaussian kernel distance between agents (physical distance) and policy diversity among agents (logical distance). The two factor collectively contribute to the dynamic adjustment of reward allocation coefficients, culminating in the formation of reward distribution weights. The incorporation of these weights facilitates the equitable sharing of agents’ contributions to rewards, promoting a cooperative learning environment. Through extensive experimental evaluations, we substantiate the efficacy of BDII in addressing the challenge of reward conflicts in MARL. Our research findings affirm that BDII significantly mitigates reward conflicts, ensuring that agents consistently align with the original team objectives, thereby achieving state-of-the-art performance. This validation underscores the potential of the proposed framework in enhancing the collaborative nature of multiagent systems, offering a promising avenue for advancing the field of reinforcement learning.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10531155/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In recent years, multiagent reinforcement learning (MARL) has demonstrated considerable potential across diverse applications. However, in environments characterized by sparse rewards, the scarcity of reward signals can give rise to reward conflicts among agents: each agent tends to compete for the limited rewards, deviating from the collaborative effort needed to achieve collective team objectives. This not only amplifies the learning challenge but also constrains the overall learning performance of agents, ultimately compromising the attainment of team goals. To mitigate this conflicting competition for rewards, we introduce the bidirectional influence and interaction (BDII) MARL framework. The approach draws inspiration from the collaborative ethos of human social cooperation, specifically the notion of “sharing joys and sorrows.” The core idea behind BDII is to let agents share their individual rewards with collaborators, fostering a cooperative rather than competitive behavioral paradigm and thereby resolving the pervasive issue of reward conflicts in sparse-reward environments. BDII incorporates two key factors: the Gaussian kernel distance between agents (physical distance) and the policy diversity among agents (logical distance). These two factors jointly drive the dynamic adjustment of reward allocation coefficients, yielding the reward distribution weights. These weights allow agents’ reward contributions to be shared equitably, promoting a cooperative learning environment. Through extensive experimental evaluation, we substantiate the efficacy of BDII in addressing reward conflicts in MARL. Our findings show that BDII significantly mitigates reward conflicts, keeps agents consistently aligned with the original team objectives, and achieves state-of-the-art performance. This validation underscores the potential of the proposed framework to enhance the collaborative nature of multiagent systems, offering a promising avenue for advancing the field of reinforcement learning.
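
The abstract names the ingredients of the weighting scheme (a Gaussian kernel over inter-agent distance and a policy-diversity term) but not its exact formulas. The Python sketch below is one plausible instantiation, not the paper's definition: it assumes an RBF kernel over agent positions for the physical factor, a symmetrized KL divergence between action distributions for the logical factor, a product combination of the two, and column normalization so the team's total reward is conserved. The bandwidth sigma, the helper names, and these design choices are all illustrative assumptions.

import numpy as np

def gaussian_kernel_weights(positions, sigma=1.0):
    # Physical proximity: exp(-||x_i - x_j||^2 / (2 sigma^2)) for every agent pair.
    diffs = positions[:, None, :] - positions[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def policy_similarity_weights(policy_probs, eps=1e-8):
    # Logical proximity: symmetrized KL divergence between each pair of action
    # distributions, mapped through exp(-d) so identical policies get weight 1
    # and highly diverse policies get weight near 0.
    p = policy_probs[:, None, :] + eps
    q = policy_probs[None, :, :] + eps
    kl = np.sum(p * np.log(p / q), axis=-1)
    return np.exp(-0.5 * (kl + kl.T))

def redistribute_rewards(rewards, positions, policy_probs, sigma=1.0):
    # Combine both factors, then normalize each column so every agent's earned
    # reward is split into shares summing to one; the team's total reward is
    # therefore unchanged by the redistribution.
    w = gaussian_kernel_weights(positions, sigma) * policy_similarity_weights(policy_probs)
    w = w / w.sum(axis=0, keepdims=True)
    return w @ rewards

# Three agents; only agent 0 earned a sparse reward. Agent 1, which is both
# physically close and policy-similar, receives most of the shared portion,
# while distant, policy-divergent agent 2 receives almost nothing.
positions = np.array([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
policy_probs = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])
rewards = np.array([1.0, 0.0, 0.0])
print(redistribute_rewards(rewards, positions, policy_probs))

Any similarity measure on policies could slot into the logical factor; the column normalization is what makes the scheme a redistribution of existing reward rather than reward inflation, which matches the abstract's framing of sharing rewards rather than creating new ones.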