Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games

Sihan Zeng, Thinh T. Doan
{"title":"Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games","authors":"Sihan Zeng, Thinh T. Doan","doi":"arxiv-2409.07767","DOIUrl":null,"url":null,"abstract":"Multi-time-scale stochastic approximation is an iterative algorithm for\nfinding the fixed point of a set of $N$ coupled operators given their noisy\nsamples. It has been observed that due to the coupling between the decision\nvariables and noisy samples of the operators, the performance of this method\ndecays as $N$ increases. In this work, we develop a new accelerated variant of\nmulti-time-scale stochastic approximation, which significantly improves the\nconvergence rates of its standard counterpart. Our key idea is to introduce\nauxiliary variables to dynamically estimate the operators from their samples,\nwhich are then used to update the decision variables. These auxiliary variables\nhelp not only to control the variance of the operator estimates but also to\ndecouple the sampling noise and the decision variables. This allows us to\nselect more aggressive step sizes to achieve an optimal convergence rate.\nSpecifically, under a strong monotonicity condition, we show that for any value\nof $N$ the $t^{\\text{th}}$ iterate of the proposed algorithm converges to the\ndesired solution at a rate $\\widetilde{O}(1/t)$ when the operator samples are\ngenerated from a single from Markov process trajectory. A second contribution of this work is to demonstrate that the objective of a\nrange of problems in reinforcement learning and multi-agent games can be\nexpressed as a system of fixed-point equations. As such, the proposed approach\ncan be used to design new learning algorithms for solving these problems. We\nillustrate this observation with numerical simulations in a multi-agent game\nand show the advantage of the proposed method over the standard\nmulti-time-scale stochastic approximation algorithm.","PeriodicalId":501286,"journal":{"name":"arXiv - MATH - Optimization and Control","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Optimization and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Multi-time-scale stochastic approximation is an iterative algorithm for finding the fixed point of a set of $N$ coupled operators given their noisy samples. It has been observed that due to the coupling between the decision variables and the noisy samples of the operators, the performance of this method decays as $N$ increases. In this work, we develop a new accelerated variant of multi-time-scale stochastic approximation, which significantly improves the convergence rates of its standard counterpart. Our key idea is to introduce auxiliary variables that dynamically estimate the operators from their samples; these estimates are then used to update the decision variables. The auxiliary variables help not only to control the variance of the operator estimates but also to decouple the sampling noise from the decision variables. This allows us to select more aggressive step sizes and achieve an optimal convergence rate. Specifically, under a strong monotonicity condition, we show that for any value of $N$ the $t^{\text{th}}$ iterate of the proposed algorithm converges to the desired solution at a rate $\widetilde{O}(1/t)$ when the operator samples are generated from a single trajectory of a Markov process. A second contribution of this work is to demonstrate that the objectives of a range of problems in reinforcement learning and multi-agent games can be expressed as a system of fixed-point equations. As such, the proposed approach can be used to design new learning algorithms for solving these problems. We illustrate this observation with numerical simulations in a multi-agent game and show the advantage of the proposed method over the standard multi-time-scale stochastic approximation algorithm.
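The core mechanism described above, maintaining auxiliary variables that average the noisy operator samples and feeding the averages (rather than the raw samples) into the decision-variable updates, can be illustrated with a minimal sketch. The example below is not the paper's exact algorithm: the operators `noisy_F1`/`noisy_F2`, the step-size schedules, and the i.i.d. Gaussian noise are all illustrative assumptions, whereas the paper treats $N$ coupled operators with samples drawn from a single Markov process trajectory.

```python
import numpy as np

# Minimal sketch of the auxiliary-variable idea (illustrative, not the paper's scheme).
# We seek the joint fixed point of two coupled operators F1, F2 observed only
# through noisy samples.

rng = np.random.default_rng(0)

def noisy_F1(x, y):
    # illustrative strongly monotone operator: zero when x = y
    return -(x - y) + 0.1 * rng.standard_normal()

def noisy_F2(x, y):
    # illustrative operator: zero when y = 1 - 0.5 * x
    return -(y - (1.0 - 0.5 * x)) + 0.1 * rng.standard_normal()

x, y = 0.0, 0.0      # decision variables
f1, f2 = 0.0, 0.0    # auxiliary variables: running estimates of the operators

for t in range(1, 10_001):
    alpha = 1.0 / t ** 0.6   # faster step size for the operator estimates (illustrative choice)
    beta = 1.0 / t           # slower step size for the decision variables (illustrative choice)

    # Update the auxiliary estimates with fresh noisy operator samples.
    f1 += alpha * (noisy_F1(x, y) - f1)
    f2 += alpha * (noisy_F2(x, y) - f2)

    # Update the decision variables using the smoothed operator estimates
    # instead of the raw noisy samples.
    x += beta * f1
    y += beta * f2

print(x, y)  # both should approach the noiseless fixed point x = y = 2/3
```

Replacing `f1` and `f2` in the decision-variable updates with the raw samples `noisy_F1(x, y)` and `noisy_F2(x, y)` recovers the standard (non-accelerated) multi-time-scale recursion, in which the sampling noise enters the decision variables directly.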