Low-Rank Gradient Descent

Romain Cosson, Ali Jadbabaie, Anuran Makur, Amirhossein Reisizadeh, Devavrat Shah
{"title":"Low-Rank Gradient Descent","authors":"Romain Cosson;Ali Jadbabaie;Anuran Makur;Amirhossein Reisizadeh;Devavrat Shah","doi":"10.1109/OJCSYS.2023.3315088","DOIUrl":null,"url":null,"abstract":"Several recent empirical studies demonstrate that important machine learning tasks such as training deep neural networks, exhibit a low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (\n<monospace>GD</monospace>\n). Our proposed \n<italic>Low-Rank Gradient Descent</i>\n (\n<monospace>LRGD</monospace>\n) algorithm finds an \n<inline-formula><tex-math>$\\epsilon$</tex-math></inline-formula>\n-approximate stationary point of a \n<inline-formula><tex-math>$p$</tex-math></inline-formula>\n-dimensional function by first identifying \n<inline-formula><tex-math>$r \\leq p$</tex-math></inline-formula>\n significant directions, and then estimating the true \n<inline-formula><tex-math>$p$</tex-math></inline-formula>\n-dimensional gradient at every iteration by computing directional derivatives only along those \n<inline-formula><tex-math>$r$</tex-math></inline-formula>\n directions. We establish that the “directional oracle complexities” of \n<monospace>LRGD</monospace>\n for strongly convex and non-convex objective functions are \n<inline-formula><tex-math>${\\mathcal {O}}(r \\log (1/\\epsilon) + rp)$</tex-math></inline-formula>\n and \n<inline-formula><tex-math>${\\mathcal {O}}(r/\\epsilon ^{2} + rp)$</tex-math></inline-formula>\n, respectively. Therefore, when \n<inline-formula><tex-math>$r \\ll p$</tex-math></inline-formula>\n, \n<monospace>LRGD</monospace>\n provides significant improvement over the known complexities of \n<inline-formula><tex-math>${\\mathcal {O}}(p \\log (1/\\epsilon))$</tex-math></inline-formula>\n and \n<inline-formula><tex-math>${\\mathcal {O}}(p/\\epsilon ^{2})$</tex-math></inline-formula>\n of \n<monospace>GD</monospace>\n in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, using real and synthetic data, \n<monospace>LRGD</monospace>\n provides significant gains over \n<monospace>GD</monospace>\n when the data has low-rank structure, and in the absence of such structure, \n<monospace>LRGD</monospace>\n does not degrade performance compared to \n<monospace>GD</monospace>\n. This suggests that \n<monospace>LRGD</monospace>\n could be used in practice in any setting in place of \n<monospace>GD</monospace>\n.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"2 ","pages":"380-395"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9973428/10250907.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10250907/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Several recent empirical studies demonstrate that important machine learning tasks, such as training deep neural networks, exhibit low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an $\epsilon$-approximate stationary point of a $p$-dimensional function by first identifying $r \leq p$ significant directions, and then estimating the true $p$-dimensional gradient at every iteration by computing directional derivatives only along those $r$ directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are ${\mathcal{O}}(r \log(1/\epsilon) + rp)$ and ${\mathcal{O}}(r/\epsilon^{2} + rp)$, respectively. Therefore, when $r \ll p$, LRGD provides a significant improvement over the known complexities of ${\mathcal{O}}(p \log(1/\epsilon))$ and ${\mathcal{O}}(p/\epsilon^{2})$ of GD in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, using real and synthetic data, LRGD provides significant gains over GD when the data has low-rank structure, and in the absence of such structure, LRGD does not degrade performance compared to GD. This suggests that LRGD could be used in practice in any setting in place of GD.
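
To make the two-phase procedure described above concrete, here is a minimal Python sketch of an LRGD-style routine. This is not the authors' implementation: the function name `lrgd_sketch`, the finite-difference directional-derivative oracle, the probing strategy for finding the significant directions, and all parameter defaults are illustrative assumptions. Phase 1 estimates the $r$ significant directions from a handful of approximate full gradients (the $rp$ term in the complexity bounds); Phase 2 runs gradient descent using only $r$ directional derivatives per iteration.

```python
import numpy as np

def directional_derivative(f, x, u, h=1e-5):
    """Central finite-difference estimate of the derivative of f at x along unit direction u."""
    return (f(x + h * u) - f(x - h * u)) / (2 * h)

def lrgd_sketch(f, x0, r, lr=0.1, iters=200, probes=5, seed=0):
    """Illustrative LRGD-style routine (names and defaults are assumptions, not the paper's).

    Phase 1: estimate r significant directions from a few finite-difference gradients.
    Phase 2: descend using only r directional derivatives per iteration.
    """
    rng = np.random.default_rng(seed)
    p = x0.shape[0]
    I = np.eye(p)

    # Phase 1: approximate full gradients at a few probe points (an O(r*p)-type cost),
    # then take the top-r right singular vectors as the significant directions.
    probe_pts = [x0 + 0.1 * rng.standard_normal(p) for _ in range(probes)]
    G = np.stack([[directional_derivative(f, z, I[i]) for i in range(p)] for z in probe_pts])
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    U = Vt[:r].T  # p x r orthonormal basis for the estimated low-rank gradient subspace

    # Phase 2: gradient descent with gradients reconstructed from r directional derivatives.
    x = x0.copy()
    for _ in range(iters):
        d = np.array([directional_derivative(f, x, U[:, j]) for j in range(r)])
        x = x - lr * (U @ d)  # lift the r-dimensional estimate back to R^p
    return x

# Toy usage: a quadratic whose gradient varies only in a 2-dimensional subspace of R^50.
if __name__ == "__main__":
    p, r = 50, 2
    A = np.zeros((p, p)); A[0, 0] = A[1, 1] = 1.0
    f = lambda x: 0.5 * x @ A @ x
    x_star = lrgd_sketch(f, x0=np.ones(p), r=r)
    print(np.linalg.norm(A @ x_star))  # gradient norm at the returned point
```

In the toy example, the per-iteration oracle cost drops from $p = 50$ to $r = 2$ directional derivatives, mirroring the $r$ versus $p$ terms in the complexity bounds above; when the gradients do not concentrate in a low-dimensional subspace, the sketch degenerates to roughly the cost of finite-difference GD, consistent with the paper's claim that LRGD does not degrade performance in the absence of low-rank structure.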