Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games

Sihan Zeng, Thinh T. Doan
{"title":"Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games","authors":"Sihan Zeng, Thinh T. Doan","doi":"arxiv-2409.07767","DOIUrl":null,"url":null,"abstract":"Multi-time-scale stochastic approximation is an iterative algorithm for\nfinding the fixed point of a set of $N$ coupled operators given their noisy\nsamples. It has been observed that due to the coupling between the decision\nvariables and noisy samples of the operators, the performance of this method\ndecays as $N$ increases. In this work, we develop a new accelerated variant of\nmulti-time-scale stochastic approximation, which significantly improves the\nconvergence rates of its standard counterpart. Our key idea is to introduce\nauxiliary variables to dynamically estimate the operators from their samples,\nwhich are then used to update the decision variables. These auxiliary variables\nhelp not only to control the variance of the operator estimates but also to\ndecouple the sampling noise and the decision variables. This allows us to\nselect more aggressive step sizes to achieve an optimal convergence rate.\nSpecifically, under a strong monotonicity condition, we show that for any value\nof $N$ the $t^{\\text{th}}$ iterate of the proposed algorithm converges to the\ndesired solution at a rate $\\widetilde{O}(1/t)$ when the operator samples are\ngenerated from a single from Markov process trajectory. A second contribution of this work is to demonstrate that the objective of a\nrange of problems in reinforcement learning and multi-agent games can be\nexpressed as a system of fixed-point equations. As such, the proposed approach\ncan be used to design new learning algorithms for solving these problems. We\nillustrate this observation with numerical simulations in a multi-agent game\nand show the advantage of the proposed method over the standard\nmulti-time-scale stochastic approximation algorithm.","PeriodicalId":501286,"journal":{"name":"arXiv - MATH - Optimization and Control","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Optimization and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multi-time-scale stochastic approximation is an iterative algorithm for finding the fixed point of a set of $N$ coupled operators given their noisy samples. It has been observed that due to the coupling between the decision variables and the noisy samples of the operators, the performance of this method decays as $N$ increases. In this work, we develop a new accelerated variant of multi-time-scale stochastic approximation, which significantly improves the convergence rates of its standard counterpart. Our key idea is to introduce auxiliary variables that dynamically estimate the operators from their samples; these estimates are then used to update the decision variables. The auxiliary variables help not only to control the variance of the operator estimates but also to decouple the sampling noise from the decision variables. This allows us to select more aggressive step sizes and achieve an optimal convergence rate. Specifically, under a strong monotonicity condition, we show that for any value of $N$ the $t^{\text{th}}$ iterate of the proposed algorithm converges to the desired solution at a rate $\widetilde{O}(1/t)$ when the operator samples are generated from a single trajectory of a Markov process. A second contribution of this work is to demonstrate that the objectives of a range of problems in reinforcement learning and multi-agent games can be expressed as a system of fixed-point equations. As such, the proposed approach can be used to design new learning algorithms for solving these problems. We illustrate this observation with numerical simulations in a multi-agent game and show the advantage of the proposed method over the standard multi-time-scale stochastic approximation algorithm.
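The core mechanism described above, maintaining auxiliary variables that average the noisy operator samples and feeding the averages (rather than the raw samples) into the decision-variable updates, can be illustrated with a minimal sketch. The example below is not the paper's exact algorithm: the operators `noisy_F1`/`noisy_F2`, the step-size schedules, and the i.i.d. Gaussian noise are all illustrative assumptions, whereas the paper treats $N$ coupled operators with samples drawn from a single Markov process trajectory.

```python
import numpy as np

# Minimal sketch of the auxiliary-variable idea (illustrative, not the paper's scheme).
# We seek the joint fixed point of two coupled operators F1, F2 observed only
# through noisy samples.

rng = np.random.default_rng(0)

def noisy_F1(x, y):
    # illustrative strongly monotone operator: zero when x = y
    return -(x - y) + 0.1 * rng.standard_normal()

def noisy_F2(x, y):
    # illustrative operator: zero when y = 1 - 0.5 * x
    return -(y - (1.0 - 0.5 * x)) + 0.1 * rng.standard_normal()

x, y = 0.0, 0.0      # decision variables
f1, f2 = 0.0, 0.0    # auxiliary variables: running estimates of the operators

for t in range(1, 10_001):
    alpha = 1.0 / t ** 0.6   # faster step size for the operator estimates (illustrative choice)
    beta = 1.0 / t           # slower step size for the decision variables (illustrative choice)

    # Update the auxiliary estimates with fresh noisy operator samples.
    f1 += alpha * (noisy_F1(x, y) - f1)
    f2 += alpha * (noisy_F2(x, y) - f2)

    # Update the decision variables using the smoothed operator estimates
    # instead of the raw noisy samples.
    x += beta * f1
    y += beta * f2

print(x, y)  # both should approach the noiseless fixed point x = y = 2/3
```

Replacing `f1` and `f2` in the decision-variable updates with the raw samples `noisy_F1(x, y)` and `noisy_F2(x, y)` recovers the standard (non-accelerated) multi-time-scale recursion, in which the sampling noise enters the decision variables directly.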