Tradeoffs Between Convergence Rate and Noise Amplification for Momentum-Based Accelerated Optimization Algorithms

Impact factor: 7.0; Zone 1 (Computer Science); JCR Q1 (Automation & Control Systems); IEEE Transactions on Automatic Control; Publication date: 2024-09-03; DOI: 10.1109/TAC.2024.3453656

Hesameddin Mohammadi;Meisam Razaviyayn;Mihailo R. Jovanović

IEEE Transactions on Automatic Control, vol. 70, no. 2, pp. 889-904. https://ieeexplore.ieee.org/document/10663923/

Citation count: 0

Abstract

In this article, we study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to additive white noise. This setup uses noise to account for uncertainty in either gradient evaluation or iteration updates, and it includes Polyak's heavy-ball and Nesterov's accelerated methods as special cases. For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification and identify fundamental stochastic performance tradeoffs. Our approach utilizes the Jury stability criterion to provide a novel geometric characterization of conditions for linear convergence, and it reveals the relation between the noise amplification and convergence rate as well as their dependence on the condition number and the constant algorithmic parameters. This geometric insight leads to simple alternative proofs of standard convergence results and allows us to establish an “uncertainty principle” of strongly convex optimization: for the two-step momentum method with linear convergence rate, the lower bound on the product of the settling time and noise amplification scales quadratically with the condition number. Our analysis also identifies a key difference between the gradient and iterate noise models: while the amplification of gradient noise can be made arbitrarily small by sufficiently decelerating the algorithm, the best achievable variance for the iterate noise model increases linearly with the settling time in the decelerating regime. Finally, we introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time while preserving orderwise Pareto optimality for both noise models.
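The quantities named in the abstract can be illustrated numerically. The sketch below is not the authors' code, and the abstract does not state the update rule. It assumes the common three-parameter two-step form x^{t+2} = x^{t+1} + beta (x^{t+1} - x^t) - alpha grad f(x^{t+1} + gamma (x^{t+1} - x^t)) + noise, where gamma = 0 gives heavy-ball, gamma = beta gives Nesterov's method, and beta = gamma = 0 gives gradient descent. It further assumes that iterate noise enters with standard deviation sigma, that gradient noise enters with standard deviation alpha * sigma, and that the settling time is 1 / (1 - rho). All function names are hypothetical. For a quadratic objective, each Hessian eigenvalue gives a decoupled two-state linear system, so the steady-state variance follows from a discrete Lyapunov equation.

```python
import numpy as np


def mode_matrix(lam, alpha, beta, gamma):
    """State matrix of one eigen-mode, with state (x^t, x^{t+1}).

    For Hessian eigenvalue lam the assumed update reduces to
    x^{t+2} + b x^{t+1} + a x^t = noise.
    """
    a = beta - gamma * alpha * lam
    b = (1.0 + gamma) * alpha * lam - (1.0 + beta)
    return np.array([[0.0, 1.0], [-a, -b]])


def mode_variance(lam, alpha, beta, gamma, noise_std=1.0):
    """Steady-state variance of x^t for one mode (requires a stable mode)."""
    A = mode_matrix(lam, alpha, beta, gamma)
    Q = np.array([[0.0, 0.0], [0.0, noise_std ** 2]])
    # Solve P = A P A^T + Q using vec(A P A^T) = (A kron A) vec(P).
    vec_p = np.linalg.solve(np.eye(4) - np.kron(A, A), Q.reshape(-1))
    return vec_p[0]  # entry P[0, 0], the variance of x^t


def performance(eigs, alpha, beta, gamma, noise="iterate", sigma=1.0):
    """Return (rate rho, settling time 1/(1-rho), total variance J)."""
    noise_std = sigma * alpha if noise == "gradient" else sigma
    rho = max(
        np.max(np.abs(np.linalg.eigvals(mode_matrix(lam, alpha, beta, gamma))))
        for lam in eigs
    )
    total = sum(mode_variance(lam, alpha, beta, gamma, noise_std) for lam in eigs)
    return rho, 1.0 / (1.0 - rho), total


# Standard textbook parameter choices for eigenvalues in [m, L].
def gradient_descent(m, L):
    return 2.0 / (L + m), 0.0, 0.0


def heavy_ball(m, L):
    s_m, s_L = np.sqrt(m), np.sqrt(L)
    return 4.0 / (s_L + s_m) ** 2, ((s_L - s_m) / (s_L + s_m)) ** 2, 0.0


def nesterov(m, L):
    s_m, s_L = np.sqrt(m), np.sqrt(L)
    beta = (s_L - s_m) / (s_L + s_m)
    return 1.0 / L, beta, beta


m, L = 1.0, 100.0                # condition number kappa = L / m = 100
eigs = np.linspace(m, L, 20)     # hypothetical Hessian spectrum
for name, params in [("gradient descent", gradient_descent(m, L)),
                     ("heavy-ball", heavy_ball(m, L)),
                     ("Nesterov", nesterov(m, L))]:
    for noise in ("iterate", "gradient"):
        rho, t_s, J = performance(eigs, *params, noise=noise)
        print(f"{name:17s} {noise:8s} rho={rho:.4f} T_s={t_s:8.2f} J={J:10.4f}")
```

Printing the settling time next to the variance for each method and noise model gives a concrete view of the tradeoff the abstract describes. The spectrum, condition number, and parameter choices here are arbitrary illustrations rather than values taken from the paper.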


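Source journal

IEEE Transactions on Automatic Control (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 11.30

Self-citation rate: 5.90%

Articles published: 824

Review time: 9 months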

Journal description: In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered: 1) Papers: Presentation of significant research, development, or application of control concepts. 2) Technical Notes and Correspondence: Brief technical notes, comments on published areas or established control topics, corrections to papers and notes published in the Transactions. In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.