Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms

IF 2.7 · CAS Zone 4 (Computer Science) · JCR Q3 (Computer Science, Artificial Intelligence) · Neural Computation · Pub Date: 2024-04-23 · DOI: 10.1162/neco_a_01636

Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang

Neural Computation, vol. 36, no. 5, pp. 897-935. Published 2024-04-23.


Abstract

Zeroth-order (ZO) optimization is a key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them use the coordinated ZO estimator rather than the random ZO estimator to approximate the true gradient, since the former is more accurate. Although the random ZO estimator introduces a larger error and makes convergence analysis more challenging than the coordinated ZO estimator does, it requires only O(1) computation, significantly less than the O(d) computation of the coordinated ZO estimator, where d is the dimension of the problem space. To take advantage of the computational efficiency of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate of the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $O\left(\min\left\{\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{O}\left(\frac{n+d}{\epsilon^{2}}\right)$ under $d > n^{1/2}$ for nonconvex problems, and from $O\left(\frac{d}{\epsilon^{2}}\right)$ to $\tilde{O}\left(n \log\frac{1}{\epsilon} + \frac{d}{\epsilon}\right)$ for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
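The cost gap between the two estimators described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothing parameter `mu`, the central-difference form of the coordinate-wise estimator, and the forward-difference form of the random-direction estimator are assumptions chosen for illustration. The sketch counts function queries per gradient estimate, which grows with d for the coordinate-wise estimator and stays constant for the random one.

```python
import math
import random


class CountingOracle:
    """Wraps a function and counts how many times it is queried."""

    def __init__(self, f):
        self.f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.f(x)


def coordinate_zo_gradient(f, x, mu=1e-4):
    """Coordinate-wise estimator (assumed form): a central difference
    along every axis, costing 2 * d function queries."""
    d = len(x)
    grad = [0.0] * d
    for j in range(d):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        grad[j] = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return grad


def random_zo_gradient(f, x, mu=1e-4, rng=random):
    """Random-direction estimator (assumed form): one forward difference
    along a random unit vector, costing 2 function queries for any d."""
    d = len(x)
    # Normalizing a Gaussian vector gives a uniform direction on the sphere.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x)) / mu
    return [scale * ui for ui in u]


def quadratic(x):
    """Toy smooth objective whose true gradient at x is x itself."""
    return 0.5 * sum(v * v for v in x)


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5]

    oracle = CountingOracle(quadratic)
    g_coord = coordinate_zo_gradient(oracle, x)
    print("coordinate estimate:", g_coord, "queries:", oracle.queries)

    oracle = CountingOracle(quadratic)
    g_rand = random_zo_gradient(oracle, x, rng=random.Random(0))
    print("random estimate:", g_rand, "queries:", oracle.queries)
```

On the toy quadratic, the coordinate-wise estimate closely matches the true gradient, while a single random estimate is noisy and only matches it on average. That trade-off between per-estimate cost and accuracy is the one the abstract describes.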


Source journal

Neural Computation (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore

6.30

Self-citation rate

3.40%

Review time

3.0 months

Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.
