Finite-Time Analysis of Asynchronous Multi-Agent TD Learning

Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas
{"title":"异步多代理 TD 学习的有限时间分析","authors":"Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas","doi":"arxiv-2407.20441","DOIUrl":null,"url":null,"abstract":"Recent research endeavours have theoretically shown the beneficial effect of\ncooperation in multi-agent reinforcement learning (MARL). In a setting\ninvolving $N$ agents, this beneficial effect usually comes in the form of an\n$N$-fold linear convergence speedup, i.e., a reduction - proportional to $N$ -\nin the number of iterations required to reach a certain convergence precision.\nIn this paper, we show for the first time that this speedup property also holds\nfor a MARL framework subject to asynchronous delays in the local agents'\nupdates. In particular, we consider a policy evaluation problem in which\nmultiple agents cooperate to evaluate a common policy by communicating with a\ncentral aggregator. In this setting, we study the finite-time convergence of\n\\texttt{AsyncMATD}, an asynchronous multi-agent temporal difference (TD)\nlearning algorithm in which agents' local TD update directions are subject to\nasynchronous bounded delays. Our main contribution is providing a finite-time\nanalysis of \\texttt{AsyncMATD}, for which we establish a linear convergence\nspeedup while highlighting the effect of time-varying asynchronous delays on\nthe resulting convergence rate.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Finite-Time Analysis of Asynchronous Multi-Agent TD Learning\",\"authors\":\"Nicolò Dal Fabbro, Arman Adibi, Aritra Mitra, George J. Pappas\",\"doi\":\"arxiv-2407.20441\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent research endeavours have theoretically shown the beneficial effect of\\ncooperation in multi-agent reinforcement learning (MARL). In a setting\\ninvolving $N$ agents, this beneficial effect usually comes in the form of an\\n$N$-fold linear convergence speedup, i.e., a reduction - proportional to $N$ -\\nin the number of iterations required to reach a certain convergence precision.\\nIn this paper, we show for the first time that this speedup property also holds\\nfor a MARL framework subject to asynchronous delays in the local agents'\\nupdates. In particular, we consider a policy evaluation problem in which\\nmultiple agents cooperate to evaluate a common policy by communicating with a\\ncentral aggregator. In this setting, we study the finite-time convergence of\\n\\\\texttt{AsyncMATD}, an asynchronous multi-agent temporal difference (TD)\\nlearning algorithm in which agents' local TD update directions are subject to\\nasynchronous bounded delays. 
Our main contribution is providing a finite-time\\nanalysis of \\\\texttt{AsyncMATD}, for which we establish a linear convergence\\nspeedup while highlighting the effect of time-varying asynchronous delays on\\nthe resulting convergence rate.\",\"PeriodicalId\":501315,\"journal\":{\"name\":\"arXiv - CS - Multiagent Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Multiagent Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2407.20441\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.20441","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent research endeavours have theoretically shown the beneficial effect of cooperation in multi-agent reinforcement learning (MARL). In a setting involving $N$ agents, this beneficial effect usually comes in the form of an $N$-fold linear convergence speedup, i.e., a reduction - proportional to $N$ - in the number of iterations required to reach a certain convergence precision. In this paper, we show for the first time that this speedup property also holds for a MARL framework subject to asynchronous delays in the local agents' updates. In particular, we consider a policy evaluation problem in which multiple agents cooperate to evaluate a common policy by communicating with a central aggregator. In this setting, we study the finite-time convergence of \texttt{AsyncMATD}, an asynchronous multi-agent temporal difference (TD) learning algorithm in which agents' local TD update directions are subject to asynchronous bounded delays. Our main contribution is providing a finite-time analysis of \texttt{AsyncMATD}, for which we establish a linear convergence speedup while highlighting the effect of time-varying asynchronous delays on the resulting convergence rate.
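The abstract only describes \texttt{AsyncMATD} at a high level: each agent computes a local TD update direction, and the central aggregator averages directions that may be stale by a bounded number of iterations. The sketch below illustrates how such an asynchronous multi-agent TD(0) scheme with bounded delays might look under linear value-function approximation on a small synthetic Markov chain. All names (run_async_matd, the feature matrix Phi, the delay sampling, etc.) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def run_async_matd(num_agents=10, num_features=4, num_states=20,
                   gamma=0.9, alpha=0.05, max_delay=3,
                   num_iters=2000, seed=0):
    """Illustrative asynchronous multi-agent TD(0) with bounded delays.

    Each agent samples transitions of a common Markov chain under a fixed
    policy and computes a local TD update direction; the central aggregator
    averages directions that may be up to `max_delay` iterations stale.
    """
    rng = np.random.default_rng(seed)

    # Synthetic environment shared by all agents: random transition matrix,
    # state-dependent rewards, and random linear features (illustrative only).
    P = rng.dirichlet(np.ones(num_states), size=num_states)   # P[s, s']
    R = rng.uniform(0.0, 1.0, size=num_states)                 # reward r(s)
    Phi = rng.normal(size=(num_states, num_features))          # features phi(s)

    theta = np.zeros(num_features)
    # Buffer of recent iterates: agents' directions are computed at stale
    # copies of theta, which models the asynchronous bounded delays.
    history = [theta.copy()]
    states = rng.integers(num_states, size=num_agents)         # agents' current states

    for _ in range(num_iters):
        directions = np.zeros((num_agents, num_features))
        for i in range(num_agents):
            # Agent i's direction is evaluated at a parameter copy that is
            # delay_i iterations old, with delay_i bounded by max_delay.
            delay_i = rng.integers(0, min(max_delay, len(history) - 1) + 1)
            theta_stale = history[-1 - delay_i]

            s = states[i]
            s_next = rng.choice(num_states, p=P[s])
            td_error = R[s] + gamma * Phi[s_next] @ theta_stale - Phi[s] @ theta_stale
            directions[i] = td_error * Phi[s]
            states[i] = s_next

        # Central aggregator: average the (possibly stale) local directions.
        theta = theta + alpha * directions.mean(axis=0)
        history.append(theta.copy())
        if len(history) > max_delay + 1:
            history.pop(0)

    return theta

if __name__ == "__main__":
    theta_hat = run_async_matd()
    print("estimated value-function parameters:", theta_hat)
```

Averaging the $N$ delayed directions is what produces the $N$-fold variance reduction behind the linear speedup discussed in the abstract, while the `max_delay` bound corresponds to the asynchronous bounded delays whose effect on the convergence rate the paper quantifies.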