A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

Gugan Thoppe, Bhumesh Kumar
{"title":"A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning","authors":"Gugan Thoppe, Bhumesh Kumar","doi":"10.1109/ICC54714.2021.9702912","DOIUrl":null,"url":null,"abstract":"In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"398 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Seventh Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC54714.2021.9702912","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared sequential decision-making problem. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path on which the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insight than existing results, which only discuss convergence rates in expectation or in the CLT sense. Importantly, our result holds under significantly weaker assumptions: the gossip matrix need not be doubly stochastic, nor the stepsizes square summable.
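
For context, the rate statement the abstract refers to is of the same pathwise type as the classical law of the iterated logarithm: for i.i.d. zero-mean increments X_1, X_2, \ldots with variance \sigma^2, the partial sums S_n = X_1 + \cdots + X_n satisfy

    \limsup_{n \to \infty} \frac{S_n}{\sqrt{2 \sigma^2 n \log \log n}} = 1
    \quad \text{and} \quad
    \liminf_{n \to \infty} \frac{S_n}{\sqrt{2 \sigma^2 n \log \log n}} = -1
    \quad \text{almost surely}.

The paper's exact recursion is not reproduced in the abstract; a schematic form that gossip-based distributed stochastic approximation commonly takes, with agent i's iterate x_i(n), gossip-matrix entries w_{ij}, stepsize \alpha(n), nonlinear drift h_i, and martingale-difference noise M_i(n+1), is

    x_i(n+1) = \sum_j w_{ij}\, x_j(n) + \alpha(n) \big[ h_i(x_i(n)) + M_i(n+1) \big].

In this notation, the abstract's weakened assumptions read: the matrix W = [w_{ij}] need not be doubly stochastic, and \sum_n \alpha(n)^2 is allowed to diverge.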