High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC ACS Applied Electronic Materials Pub Date : 2023-10-04 DOI:10.1002/cpa.22169
Gérard Ben Arous, Reza Gheissari, Aukosh Jagannath
{"title":"High-dimensional limit theorems for SGD: Effective dynamics and critical scaling","authors":"Gérard Ben Arous,&nbsp;Reza Gheissari,&nbsp;Aukosh Jagannath","doi":"10.1002/cpa.22169","DOIUrl":null,"url":null,"abstract":"<p>We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on the former choices. We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which, a new correction term appears which changes the phase diagram. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2023-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cpa.22169","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"100","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpa.22169","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on the former choices. We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss, but at which, a new correction term appears which changes the phase diagram. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations. At the same time, we demonstrate the benefit of overparametrization by showing that the latter probability goes to zero as the second layer width grows.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SGD的高维极限定理:有效动力学和临界标度
我们研究了高维区域中具有恒定步长的随机梯度下降(SGD)的标度极限。我们证明了SGD的汇总统计(即有限维函数)的轨迹在维数无穷大时的极限定理。我们的方法允许选择跟踪的汇总统计信息、初始化和步长。它产生了弹道(ODE)和扩散(SDE)极限,极限在很大程度上取决于前一种选择。我们展示了步长的临界标度制度,低于该制度,有效弹道动力学与种群损失的梯度流相匹配,但在该制度下,出现了一个新的校正项,它改变了相图。关于这种有效动力学的不动点,相应的扩散极限可能相当复杂,甚至退化。我们在流行的例子中展示了我们的方法,包括对尖峰矩阵和张量模型的估计,以及通过二元和XOR型高斯混合模型的两层网络进行分类。这些例子展示了令人惊讶的现象,包括收敛的多模式时间尺度,以及从随机(例如,高斯)初始化到概率为零的次优解的收敛。同时,我们通过表明后一种概率随着第二层宽度的增长而变为零来证明过帧化的好处。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
期刊最新文献
Hyperbaric oxygen treatment promotes tendon-bone interface healing in a rabbit model of rotator cuff tears. Oxygen-ozone therapy for myocardial ischemic stroke and cardiovascular disorders. Comparative study on the anti-inflammatory and protective effects of different oxygen therapy regimens on lipopolysaccharide-induced acute lung injury in mice. Heme oxygenase/carbon monoxide system and development of the heart. Hyperbaric oxygen for moderate-to-severe traumatic brain injury: outcomes 5-8 years after injury.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1