Bidirectional Influence and Interaction for Multiagent Reinforcement Learning

Shaoqi Sun;Kele Xu;Dawei Feng;Bo Ding
{"title":"多代理强化学习的双向影响与互动","authors":"Shaoqi Sun;Kele Xu;Dawei Feng;Bo Ding","doi":"10.1109/TAI.2024.3401649","DOIUrl":null,"url":null,"abstract":"In recent years, multiagent reinforcement learning (MARL) has demonstrated considerable potential across diverse applications. However, in reinforcement learning environments characterized by sparse rewards, the scarcity of reward signals may give rise to reward conflicts among agents. In these scenarios, each agent tends to compete to obtain limited rewards, deviating from collaborative efforts aimed at achieving collective team objectives. This not only amplifies the learning challenge but also imposes constraints on the overall learning performance of agents, ultimately compromising the attainment of team goals. To mitigate the conflicting competition for rewards among agents in MARL, we introduce the bidirectional influence and interaction (BDII) MARL framework. This innovative approach draws inspiration from the collaborative ethos observed in human social cooperation, specifically the concept of “sharing joys and sorrows.” The fundamental concept behind BDII is to empower agents to share their individual rewards with collaborators, fostering a cooperative rather than competitive behavioral paradigm. This strategic shift aims to resolve the pervasive issue of reward conflicts among agents operating in sparse-reward environments. BDII incorporates two key factors—namely, the Gaussian kernel distance between agents (physical distance) and policy diversity among agents (logical distance). The two factor collectively contribute to the dynamic adjustment of reward allocation coefficients, culminating in the formation of reward distribution weights. The incorporation of these weights facilitates the equitable sharing of agents’ contributions to rewards, promoting a cooperative learning environment. Through extensive experimental evaluations, we substantiate the efficacy of BDII in addressing the challenge of reward conflicts in MARL. Our research findings affirm that BDII significantly mitigates reward conflicts, ensuring that agents consistently align with the original team objectives, thereby achieving state-of-the-art performance. This validation underscores the potential of the proposed framework in enhancing the collaborative nature of multiagent systems, offering a promising avenue for advancing the field of reinforcement learning.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bidirectional Influence and Interaction for Multiagent Reinforcement Learning\",\"authors\":\"Shaoqi Sun;Kele Xu;Dawei Feng;Bo Ding\",\"doi\":\"10.1109/TAI.2024.3401649\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, multiagent reinforcement learning (MARL) has demonstrated considerable potential across diverse applications. However, in reinforcement learning environments characterized by sparse rewards, the scarcity of reward signals may give rise to reward conflicts among agents. In these scenarios, each agent tends to compete to obtain limited rewards, deviating from collaborative efforts aimed at achieving collective team objectives. 
This not only amplifies the learning challenge but also imposes constraints on the overall learning performance of agents, ultimately compromising the attainment of team goals. To mitigate the conflicting competition for rewards among agents in MARL, we introduce the bidirectional influence and interaction (BDII) MARL framework. This innovative approach draws inspiration from the collaborative ethos observed in human social cooperation, specifically the concept of “sharing joys and sorrows.” The fundamental concept behind BDII is to empower agents to share their individual rewards with collaborators, fostering a cooperative rather than competitive behavioral paradigm. This strategic shift aims to resolve the pervasive issue of reward conflicts among agents operating in sparse-reward environments. BDII incorporates two key factors—namely, the Gaussian kernel distance between agents (physical distance) and policy diversity among agents (logical distance). The two factor collectively contribute to the dynamic adjustment of reward allocation coefficients, culminating in the formation of reward distribution weights. The incorporation of these weights facilitates the equitable sharing of agents’ contributions to rewards, promoting a cooperative learning environment. Through extensive experimental evaluations, we substantiate the efficacy of BDII in addressing the challenge of reward conflicts in MARL. Our research findings affirm that BDII significantly mitigates reward conflicts, ensuring that agents consistently align with the original team objectives, thereby achieving state-of-the-art performance. This validation underscores the potential of the proposed framework in enhancing the collaborative nature of multiagent systems, offering a promising avenue for advancing the field of reinforcement learning.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10531155/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10531155/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In recent years, multiagent reinforcement learning (MARL) has demonstrated considerable potential across diverse applications. However, in reinforcement learning environments characterized by sparse rewards, the scarcity of reward signals may give rise to reward conflicts among agents. In these scenarios, each agent tends to compete to obtain limited rewards, deviating from collaborative efforts aimed at achieving collective team objectives. This not only amplifies the learning challenge but also imposes constraints on the overall learning performance of agents, ultimately compromising the attainment of team goals. To mitigate the conflicting competition for rewards among agents in MARL, we introduce the bidirectional influence and interaction (BDII) MARL framework. This innovative approach draws inspiration from the collaborative ethos observed in human social cooperation, specifically the concept of “sharing joys and sorrows.” The fundamental concept behind BDII is to empower agents to share their individual rewards with collaborators, fostering a cooperative rather than competitive behavioral paradigm. This strategic shift aims to resolve the pervasive issue of reward conflicts among agents operating in sparse-reward environments. BDII incorporates two key factors: the Gaussian kernel distance between agents (physical distance) and policy diversity among agents (logical distance). These two factors collectively contribute to the dynamic adjustment of reward allocation coefficients, culminating in the formation of reward distribution weights. The incorporation of these weights facilitates the equitable sharing of agents’ contributions to rewards, promoting a cooperative learning environment. Through extensive experimental evaluations, we substantiate the efficacy of BDII in addressing the challenge of reward conflicts in MARL. Our research findings affirm that BDII significantly mitigates reward conflicts, ensuring that agents consistently align with the original team objectives, thereby achieving state-of-the-art performance. This validation underscores the potential of the proposed framework in enhancing the collaborative nature of multiagent systems, offering a promising avenue for advancing the field of reinforcement learning.
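The abstract names the two factors but gives no formulas, so the sketch below is only one plausible reading of the mechanism, not the authors' implementation. It assumes Euclidean agent positions for the Gaussian kernel, Jensen–Shannon divergence as the policy-diversity measure, a multiplicative combination of the two factors, and row normalization of the resulting weights; the function names, the bandwidth sigma, and the toy data are all illustrative.

```python
import numpy as np

def gaussian_kernel(positions: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise similarity from agent positions via a Gaussian kernel
    (the "physical distance" factor). positions: (n_agents, dim)."""
    diffs = positions[:, None, :] - positions[None, :, :]   # (n, n, dim)
    sq_dist = np.sum(diffs ** 2, axis=-1)                   # (n, n)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def policy_similarity(policies: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Pairwise similarity between agents' action distributions (the
    "logical distance" factor). Jensen-Shannon divergence is an assumption
    here; the abstract only says "policy diversity".
    policies: (n_agents, n_actions), each row a probability distribution."""
    n = policies.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            p, q = policies[i], policies[j]
            m = 0.5 * (p + q)
            kl_pm = np.sum(p * np.log((p + eps) / (m + eps)))
            kl_qm = np.sum(q * np.log((q + eps) / (m + eps)))
            sim[i, j] = np.exp(-(0.5 * kl_pm + 0.5 * kl_qm))
    return sim

def sharing_weights(positions, policies, sigma=1.0):
    """Combine both factors into row-normalized reward-sharing weights:
    weights[i, j] is the fraction of agent i's reward passed to agent j."""
    w = gaussian_kernel(positions, sigma) * policy_similarity(policies)
    return w / w.sum(axis=1, keepdims=True)

def share_rewards(rewards: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Redistribute individual rewards: agent j receives
    sum_i weights[i, j] * rewards[i]."""
    return weights.T @ rewards

# Toy usage: one agent earns the sparse reward and shares it with the team.
rng = np.random.default_rng(0)
positions = rng.normal(size=(4, 2))               # 4 agents in a 2-D world
policies = rng.dirichlet(np.ones(5), size=4)      # 4 policies over 5 actions
w = sharing_weights(positions, policies, sigma=1.0)
print(share_rewards(np.array([1.0, 0.0, 0.0, 0.0]), w))
```

One property of the row normalization in this sketch: each agent gives away exactly its own reward, so the team's total reward is preserved and sharing reshapes individual incentives without inflating the collective return. Whether BDII maintains this invariant is not stated in the abstract.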