UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization

IF 2.7 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Neural Computation Pub Date : 2024-08-19 DOI:10.1162/neco_a_01692

Yiming Jiang;Jinlan Liu;Dongpo Xu;Danilo P. Mandic

{"title":"UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization","authors":"Yiming Jiang;Jinlan Liu;Dongpo Xu;Danilo P. Mandic","doi":"10.1162/neco_a_01692","DOIUrl":null,"url":null,"abstract":"Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β1 increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1912-1938"},"PeriodicalIF":2.7000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10661271/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β1 increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

UAdam：非凸优化的统一亚当式算法框架。

亚当型算法已成为深度学习环境中优化的首选；然而，尽管亚当型算法取得了成功，但人们对其收敛性仍不甚了解。为此，我们引入了亚当型算法的统一框架，称为 UAdam。它配备了二阶矩的一般形式，从而可以将亚当及其现有和未来的变体作为特例，如 NAdam、AMSGrad、AdaBound、AdaFom 和 Adan。UAdam 在一般非凸随机环境下的严格收敛分析支持了这一方法，分析表明 UAdam 以 O(1/T) 的速度收敛到静止点邻域。此外，邻域的大小会随着参数 β1 的增大而减小。重要的是，我们的分析只要求一阶动量因子足够接近 1，而对二阶动量因子没有任何限制。理论结果还揭示了 vanilla Adam 的收敛条件，以及适当超参数的选择。这为整个亚当型算法的分析、应用和进一步发展提供了理论保证。最后，我们还提供了几个数值实验来支持我们的理论发现。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Neural Computation 工程技术-计算机：人工智能

CiteScore

6.30

自引率

3.40%

发文量

审稿时长

3.0 months

期刊介绍： Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.

期刊最新文献

Gradual Domain Adaptation via Normalizing Flows Improving Recall in Sparse Associative Memories That Use Neurogenesis Replay as a Basis for Backpropagation Through Time in the Brain Toward a Free-Response Paradigm of Decision Making in Spiking Neural Networks Uncovering Dynamical Equations of Stochastic Decision Models Using Data-Driven SINDy Algorithm