Low-Rank Gradient Descent

Romain Cosson;Ali Jadbabaie;Anuran Makur;Amirhossein Reisizadeh;Devavrat Shah
{"title":"低阶梯度下降","authors":"Romain Cosson;Ali Jadbabaie;Anuran Makur;Amirhossein Reisizadeh;Devavrat Shah","doi":"10.1109/OJCSYS.2023.3315088","DOIUrl":null,"url":null,"abstract":"Several recent empirical studies demonstrate that important machine learning tasks such as training deep neural networks, exhibit a low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (\n<monospace>GD</monospace>\n). Our proposed \n<italic>Low-Rank Gradient Descent</i>\n (\n<monospace>LRGD</monospace>\n) algorithm finds an \n<inline-formula><tex-math>$\\epsilon$</tex-math></inline-formula>\n-approximate stationary point of a \n<inline-formula><tex-math>$p$</tex-math></inline-formula>\n-dimensional function by first identifying \n<inline-formula><tex-math>$r \\leq p$</tex-math></inline-formula>\n significant directions, and then estimating the true \n<inline-formula><tex-math>$p$</tex-math></inline-formula>\n-dimensional gradient at every iteration by computing directional derivatives only along those \n<inline-formula><tex-math>$r$</tex-math></inline-formula>\n directions. We establish that the “directional oracle complexities” of \n<monospace>LRGD</monospace>\n for strongly convex and non-convex objective functions are \n<inline-formula><tex-math>${\\mathcal {O}}(r \\log (1/\\epsilon) + rp)$</tex-math></inline-formula>\n and \n<inline-formula><tex-math>${\\mathcal {O}}(r/\\epsilon ^{2} + rp)$</tex-math></inline-formula>\n, respectively. Therefore, when \n<inline-formula><tex-math>$r \\ll p$</tex-math></inline-formula>\n, \n<monospace>LRGD</monospace>\n provides significant improvement over the known complexities of \n<inline-formula><tex-math>${\\mathcal {O}}(p \\log (1/\\epsilon))$</tex-math></inline-formula>\n and \n<inline-formula><tex-math>${\\mathcal {O}}(p/\\epsilon ^{2})$</tex-math></inline-formula>\n of \n<monospace>GD</monospace>\n in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, using real and synthetic data, \n<monospace>LRGD</monospace>\n provides significant gains over \n<monospace>GD</monospace>\n when the data has low-rank structure, and in the absence of such structure, \n<monospace>LRGD</monospace>\n does not degrade performance compared to \n<monospace>GD</monospace>\n. This suggests that \n<monospace>LRGD</monospace>\n could be used in practice in any setting in place of \n<monospace>GD</monospace>\n.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"2 ","pages":"380-395"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/9552933/9973428/10250907.pdf","citationCount":"0","resultStr":"{\"title\":\"Low-Rank Gradient Descent\",\"authors\":\"Romain Cosson;Ali Jadbabaie;Anuran Makur;Amirhossein Reisizadeh;Devavrat Shah\",\"doi\":\"10.1109/OJCSYS.2023.3315088\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Several recent empirical studies demonstrate that important machine learning tasks such as training deep neural networks, exhibit a low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. 
In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (\\n<monospace>GD</monospace>\\n). Our proposed \\n<italic>Low-Rank Gradient Descent</i>\\n (\\n<monospace>LRGD</monospace>\\n) algorithm finds an \\n<inline-formula><tex-math>$\\\\epsilon$</tex-math></inline-formula>\\n-approximate stationary point of a \\n<inline-formula><tex-math>$p$</tex-math></inline-formula>\\n-dimensional function by first identifying \\n<inline-formula><tex-math>$r \\\\leq p$</tex-math></inline-formula>\\n significant directions, and then estimating the true \\n<inline-formula><tex-math>$p$</tex-math></inline-formula>\\n-dimensional gradient at every iteration by computing directional derivatives only along those \\n<inline-formula><tex-math>$r$</tex-math></inline-formula>\\n directions. We establish that the “directional oracle complexities” of \\n<monospace>LRGD</monospace>\\n for strongly convex and non-convex objective functions are \\n<inline-formula><tex-math>${\\\\mathcal {O}}(r \\\\log (1/\\\\epsilon) + rp)$</tex-math></inline-formula>\\n and \\n<inline-formula><tex-math>${\\\\mathcal {O}}(r/\\\\epsilon ^{2} + rp)$</tex-math></inline-formula>\\n, respectively. Therefore, when \\n<inline-formula><tex-math>$r \\\\ll p$</tex-math></inline-formula>\\n, \\n<monospace>LRGD</monospace>\\n provides significant improvement over the known complexities of \\n<inline-formula><tex-math>${\\\\mathcal {O}}(p \\\\log (1/\\\\epsilon))$</tex-math></inline-formula>\\n and \\n<inline-formula><tex-math>${\\\\mathcal {O}}(p/\\\\epsilon ^{2})$</tex-math></inline-formula>\\n of \\n<monospace>GD</monospace>\\n in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, using real and synthetic data, \\n<monospace>LRGD</monospace>\\n provides significant gains over \\n<monospace>GD</monospace>\\n when the data has low-rank structure, and in the absence of such structure, \\n<monospace>LRGD</monospace>\\n does not degrade performance compared to \\n<monospace>GD</monospace>\\n. This suggests that \\n<monospace>LRGD</monospace>\\n could be used in practice in any setting in place of \\n<monospace>GD</monospace>\\n.\",\"PeriodicalId\":73299,\"journal\":{\"name\":\"IEEE open journal of control systems\",\"volume\":\"2 \",\"pages\":\"380-395\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/iel7/9552933/9973428/10250907.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE open journal of control systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10250907/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10250907/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Several recent empirical studies demonstrate that important machine learning tasks, such as training deep neural networks, exhibit a low-rank structure, where most of the variation in the loss function occurs only in a few directions of the input space. In this article, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an $\epsilon$-approximate stationary point of a $p$-dimensional function by first identifying $r \leq p$ significant directions, and then estimating the true $p$-dimensional gradient at every iteration by computing directional derivatives only along those $r$ directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are ${\mathcal{O}}(r \log(1/\epsilon) + rp)$ and ${\mathcal{O}}(r/\epsilon^{2} + rp)$, respectively. Therefore, when $r \ll p$, LRGD provides a significant improvement over the known complexities of ${\mathcal{O}}(p \log(1/\epsilon))$ and ${\mathcal{O}}(p/\epsilon^{2})$ for GD in the strongly convex and non-convex settings, respectively. Furthermore, we formally characterize the classes of exactly and approximately low-rank functions. Empirically, on real and synthetic data, LRGD provides significant gains over GD when the data has low-rank structure, and in the absence of such structure, LRGD does not degrade performance compared to GD. This suggests that LRGD could be used in place of GD in practice in any setting.
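
The abstract describes LRGD as a two-phase procedure: identify $r$ significant directions, then run gradient descent using only $r$ directional-derivative queries per iteration. The Python sketch below is a hypothetical illustration of that structure, not the authors' reference implementation: the choice to estimate the significant directions from the top singular vectors of a few sampled finite-difference gradients, the forward-difference oracle, the function name `lrgd`, and the test quadratic are all assumptions made here for concreteness.

```python
import numpy as np

def lrgd(f, x0, r, step_size=0.1, tol=1e-5, max_iters=1000, fd_eps=1e-7, seed=0):
    """Hypothetical sketch of Low-Rank Gradient Descent (LRGD)."""
    p = x0.size
    rng = np.random.default_rng(seed)

    def directional_derivative(x, u):
        # Forward-difference directional-derivative oracle along unit direction u.
        return (f(x + fd_eps * u) - f(x)) / fd_eps

    def full_gradient(x):
        # Full finite-difference gradient: p oracle calls.
        return np.array([directional_derivative(x, e) for e in np.eye(p)])

    # Phase 1: identify r <= p significant directions from r sampled gradients
    # (about r*p oracle calls, matching the additive rp term in the stated complexity).
    G = np.stack([full_gradient(x0 + 0.1 * rng.standard_normal(p)) for _ in range(r)], axis=1)
    U, _, _ = np.linalg.svd(G, full_matrices=False)    # p x r orthonormal basis

    # Phase 2: each iteration uses only r directional derivatives instead of p.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        d = np.array([directional_derivative(x, U[:, j]) for j in range(r)])
        grad_est = U @ d                                # low-rank estimate of the gradient
        if np.linalg.norm(grad_est) < tol:
            break
        x -= step_size * grad_est
    return x

# Example: a 100-dimensional quadratic whose loss varies only in 2 directions.
if __name__ == "__main__":
    A = np.zeros((100, 100))
    A[0, 0], A[1, 1] = 1.0, 2.0
    f = lambda x: 0.5 * x @ A @ x
    x_star = lrgd(f, x0=np.ones(100), r=2)
    print(np.round(x_star[:3], 4))   # first two coordinates driven toward zero; the rest untouched
```

In this toy setting each LRGD iteration costs 2 directional-derivative queries rather than 100, which is the source of the $r$-versus-$p$ gap in the oracle complexities quoted above.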