On Centralized Critics in Multi-Agent Reinforcement Learning

Journal of Artificial Intelligence Research, published 2023-05-31, DOI: 10.1613/jair.1.14386
Impact factor 4.5; CAS Zone 3 (Computer Science); JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Brett Daley, Chris Amato
Citations: 1

Abstract

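Centralized Training for Decentralized Execution, where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become popular to develop actor-critic methods that train decentralized actors with a centralized critic, where the centralized critic is allowed to access global information about the entire system, including the true system state. Such centralized critics are possible because this information is available offline, and they are not used during online execution. While these methods perform well in a number of domains and have become a de facto standard in MARL, using a centralized critic in this context has yet to be sufficiently analyzed, either theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches, and analyze the effect of using state-based critics in partially observable environments. We derive results contrary to common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. We further prove that, in particular, state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments show practical issues, such as the difficulty of representation learning under partial observability, which highlights why the theoretical problems are often overlooked in the literature.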
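The claim that state-based critics can add variance relative to history-based critics can be made concrete with a small toy calculation. The sketch below is a hypothetical example, not taken from the paper: a one-step problem with a hidden binary state, an agent that observes nothing (so its history h is empty), a made-up reward table, and a softmax policy with logits theta. It enumerates every (state, action) pair to compute the exact mean and variance of the single-sample policy-gradient estimate grad_theta log pi(a) * Q, once with a history-based critic Q(h, a) and once with a state-based critic Q(s, a). In this one-step case both critics yield the same expected gradient, so the example illustrates only the variance effect; the bias discussed in the abstract does not appear here. All names (`REWARD`, `history_critic`, `state_critic`, `gradient_moments`) are illustrative assumptions.

```python
import math

# Hypothetical one-step toy problem (an assumption, not from the paper):
# hidden state s in {0, 1} with equal probability; the agent observes nothing,
# so its history h is empty and the posterior over s given h is just the prior.
STATES = [0, 1]
ACTIONS = [0, 1]
P_STATE = {0: 0.5, 1: 0.5}

# Hypothetical reward table r(s, a); the episode ends after one step.
REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(a, probs, k):
    """k-th component of d/d(theta) log pi(a) for a softmax policy."""
    return (1.0 if a == k else 0.0) - probs[k]


def history_critic(a):
    """History-based critic Q(h, a) = E_{s | h}[r(s, a)] for the empty history."""
    return sum(P_STATE[s] * REWARD[(s, a)] for s in STATES)


def state_critic(s, a):
    """State-based critic Q(s, a) = r(s, a), since there is no future step."""
    return REWARD[(s, a)]


def gradient_moments(use_state_critic, theta, k=0):
    """Exact mean and variance of the k-th component of grad log pi(a) * Q,
    taken over the joint distribution p(s) * pi(a)."""
    probs = softmax(theta)
    mean = 0.0
    second = 0.0
    for s in STATES:
        for a in ACTIONS:
            p = P_STATE[s] * probs[a]
            q = state_critic(s, a) if use_state_critic else history_critic(a)
            g = grad_log_pi(a, probs, k) * q
            mean += p * g
            second += p * g * g
    return mean, second - mean * mean


# Usage: compare both critics under a uniform policy (theta = [0, 0]).
hist_mean, hist_var = gradient_moments(False, [0.0, 0.0])
state_mean, state_var = gradient_moments(True, [0.0, 0.0])
print(hist_mean, hist_var)    # -0.125 0.140625
print(state_mean, state_var)  # -0.125 0.296875
```

Enumerating the joint distribution instead of sampling keeps the numbers exact and reproducible. With these assumed rewards, the state-based critic roughly doubles the variance of the gradient estimate (0.296875 vs 0.140625) at the same mean (-0.125), because it lets the unobserved state's randomness leak into each sample.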
Source journal
Journal of Artificial Intelligence Research
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 9.60
Self-citation rate: 4.00%
Articles published: 98
Review time: 4 months
About the journal: JAIR (ISSN 1076-9757) covers all areas of artificial intelligence (AI), publishing refereed research articles, survey articles, and technical notes. Established in 1993 as one of the first electronic scientific journals, JAIR is indexed by INSPEC, Science Citation Index, and MathSciNet. JAIR reviews papers within approximately three months of submission and publishes accepted articles on the internet immediately upon receiving the final versions. JAIR articles are published for free distribution on the internet by the AI Access Foundation, and for purchase in bound volumes by AAAI Press.
Latest articles in this journal
Symbolic Task Inference in Deep Reinforcement Learning
Axiomatization of Non-Recursive Aggregates in First-Order Answer Set Programming
Unifying SAT-Based Approaches to Maximum Satisfiability Solving
The TOAD System for Totally Ordered HTN Planning
Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models