Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms

IF 2.7 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Neural Computation Pub Date : 2024-04-23 DOI:10.1162/neco_a_01636
Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang
{"title":"Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms","authors":"Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang","doi":"10.1162/neco_a_01636","DOIUrl":null,"url":null,"abstract":"Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance, reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator against the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging compared to coordinated ZO estimator, it requires only O(1) computation, which is significantly less than O(d) computation of the coordinated ZO estimator, with d being dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. With the application of two reduction frameworks on our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from Omindn1/2ε2,dε3 to O˜n+dε2 under d>n12 for nonconvex problems, and from Odε2 to O˜nlog1ε+dε for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"897-935"},"PeriodicalIF":2.7000,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10535065/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance, reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator against the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging compared to coordinated ZO estimator, it requires only O(1) computation, which is significantly less than O(d) computation of the coordinated ZO estimator, with d being dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. With the application of two reduction frameworks on our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from Omindn1/2ε2,dε3 to O˜n+dε2 under d>n12 for nonconvex problems, and from Odε2 to O˜nlog1ε+dε for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过轻量级零阶近似梯度算法降低查询复杂度
对于梯度计算昂贵或无法实现的机器学习问题,零阶(ZO)优化是一项关键技术。为了加快非光滑问题的 ZO 优化速度,人们提出了几种方差缩小 ZO 近似算法,所有这些算法在逼近真实梯度时都选择了协调 ZO 估计器,而不是随机 ZO 估计器,因为前者更准确。虽然与协调 ZO 估计器相比,随机 ZO 估计器引入的误差更大,收敛分析更具挑战性,但它只需要 O(1) 计算量,明显少于协调 ZO 估计器的 O(d) 计算量(d 为问题空间的维数)。为了利用随机 ZO 估计器的高效计算特性,我们首先提出了一种 ZO 目标下降(ZOOD)特性,它可以将两种不同类型的误差纳入收敛速率的上限。接下来,我们提出了两种通用的 ZO 优化还原框架,只要内求解器的收敛速率满足 ZOOD 属性,它们就能分别自动推导出凸问题和非凸问题的收敛结果。在我们提出的 ZOR-ProxSVRG 和 ZOR-ProxSAGA 这两个具有全随机 ZO 估计子的方差降低 ZO 近似算法上应用了两个降低框架,我们将最先进的函数查询复杂度从 Omindn1/2ε2,dε3 提高到 O˜n+dε2(d>n12 时)(适用于非凸问题),并将凸问题的复杂度从 Odε2 提高到 O˜nlog1ε+dε。最后,我们通过实验验证了所提方法的优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Neural Computation
Neural Computation 工程技术-计算机:人工智能
CiteScore
6.30
自引率
3.40%
发文量
83
审稿时长
3.0 months
期刊介绍: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.
期刊最新文献
Realizing Synthetic Active Inference Agents, Part II: Variational Message Updates Bounded Rational Decision Networks With Belief Propagation Computation With Sequences of Assemblies in a Model of the Brain Relating Human Error–Based Learning to Modern Deep RL Algorithms Selective Inference for Change Point Detection by Recurrent Neural Network
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1