High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

Communications on Pure and Applied Mathematics (IF 2.7; Zone 1, Mathematics; JCR Q1, Mathematics). Pub Date: 2023-10-04. DOI: 10.1002/cpa.22169
Gérard Ben Arous, Reza Gheissari, Aukosh Jagannath
Volume 77, Issue 3, pages 2030-2080. Full text: https://onlinelibrary.wiley.com/doi/10.1002/cpa.22169

Abstract

We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on these choices. We show a critical scaling regime for the step-size: below it, the effective ballistic dynamics matches gradient flow for the population loss, but at it, a new correction term appears that changes the phase diagram. Near the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples, including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena, including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.
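
To make the idea of tracking summary statistics concrete, here is a minimal sketch on a toy problem that is not one of the paper's examples: online SGD for a planted linear regression in dimension n, with an assumed step-size delta = c/n. The model, the function names, and the closed-form curves are illustrative assumptions. For this toy, a short calculation suggests that the statistic R = |x - v|^2 follows dR/dt = -2R + c(R + sigma^2) in the time variable t = k * delta, whereas gradient flow for the population loss gives dR/dt = -2R. The extra term proportional to c plays the role of a step-size correction in this toy setting.

```python
import numpy as np

def online_sgd_summary(n=500, c=0.5, horizon=3.0, sigma=0.5, seed=0):
    """Run online SGD on a toy planted linear regression; return (t, m, R)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(n)
    v[0] = 1.0                                # planted unit vector (toy assumption)
    x = rng.standard_normal(n) / np.sqrt(n)   # Gaussian initialization, norm about 1
    delta = c / n                             # constant step-size of order 1/n (assumption)
    steps = int(round(horizon / delta))
    m = np.empty(steps + 1)                   # summary statistic: overlap <x, v>
    R = np.empty(steps + 1)                   # summary statistic: |x - v|^2
    for k in range(steps + 1):
        m[k] = x @ v
        R[k] = np.sum((x - v) ** 2)
        if k < steps:
            a = rng.standard_normal(n)        # fresh sample at every step
            y = a @ v + sigma * rng.standard_normal()
            x -= delta * (a @ x - y) * a      # gradient of 0.5 * (a.x - y)^2
    return delta * np.arange(steps + 1), m, R

def gradient_flow_R(t, R0):
    """Solution of dR/dt = -2R (population gradient flow for this toy)."""
    return R0 * np.exp(-2.0 * t)

def effective_R(t, R0, c, sigma):
    """Solution of dR/dt = -2R + c(R + sigma^2), the large-n toy prediction."""
    R_star = c * sigma**2 / (2.0 - c)         # fixed point, nonzero when c > 0
    return R_star + (R0 - R_star) * np.exp(-(2.0 - c) * t)

if __name__ == "__main__":
    t, m, R = online_sgd_summary()
    print("simulated R(T):     ", R[-1])
    print("gradient-flow R(T): ", gradient_flow_R(t[-1], R[0]))
    print("corrected R(T):     ", effective_R(t[-1], R[0], 0.5, 0.5))
```

Taking c smaller in this sketch shrinks the fixed point c * sigma^2 / (2 - c) toward zero, so the simulated curve approaches the gradient-flow curve, which is the toy analogue of being below the critical step-size scaling.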

Source journal
CiteScore: 6.70; self-citation rate: 3.30%; articles published: 59; review time: >12 weeks
About the journal: Communications on Pure and Applied Mathematics (ISSN 0010-3640) is published monthly, one volume per year, by John Wiley & Sons, Inc. © 2019. The journal primarily publishes papers originating at or solicited by the Courant Institute of Mathematical Sciences. It features recent developments in applied mathematics, mathematical physics, and mathematical analysis. The topics include partial differential equations, computer science, and applied mathematics. CPAM is devoted to mathematical contributions to the sciences; both theoretical and applied papers, of original or expository type, are included.