Infinite-width limit of deep linear neural networks
Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli
Communications on Pure and Applied Mathematics, 77(10), 3958–4007 (2024). DOI: 10.1002/cpa.22200
Abstract
This paper studies the infinite-width limit of deep linear neural networks (NNs) initialized with random parameters. We obtain that, when the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear NN. Moreover, even if the weights remain random, we get their precise law along the training dynamics, and prove a quantitative convergence result of the linear predictor in terms of the number of parameters. We finally study the continuous-time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal ℓ₂-norm minimizer of the risk.
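The last claim lends itself to a quick numerical illustration. The sketch below is not from the paper; it is a finite-width stand-in in which the network sizes, initialization scale, step size, and iteration count are arbitrary illustrative choices. It trains a depth-3 linear network by plain gradient descent on an underdetermined least-squares problem (more parameters than samples, so the risk has many minimizers) and compares the end-to-end predictor with the minimal ℓ₂-norm minimizer of the empirical risk, computed here via np.linalg.lstsq. With a small random initialization the reported distance should be small, and it shrinks further as the initialization scale and step size decrease.

```python
# Hedged sketch, not the paper's construction: a finite-width deep linear
# network trained by gradient descent. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, width, depth = 10, 20, 100, 3      # d > n: the risk has many minimizers

X = rng.standard_normal((n, d))
w = rng.standard_normal(d)
y = X @ (w / np.linalg.norm(w))          # realizable targets from a unit-norm teacher

# Random initialization, scaled small so the initial end-to-end predictor is near zero.
sizes = [d] + [width] * (depth - 1) + [1]
Ws = [0.1 / np.sqrt(sizes[i]) * rng.standard_normal((sizes[i + 1], sizes[i]))
      for i in range(depth)]

def predictor(Ws):
    """End-to-end linear map beta, so that the network computes x -> beta @ x."""
    P = np.eye(d)
    for W in Ws:
        P = W @ P
    return P.ravel()

lr = 0.05
for _ in range(20_000):
    # Forward pass, caching each layer's activations for backpropagation.
    acts = [X.T]
    for W in Ws:
        acts.append(W @ acts[-1])
    resid = acts[-1].ravel() - y
    grad = resid[None, :] / n            # gradient of the risk (1/2n)||X beta - y||^2
    # Backward pass through the linear layers (plain gradient descent).
    for i in reversed(range(depth)):
        grad_W = grad @ acts[i].T
        grad = Ws[i].T @ grad            # backpropagate with the pre-update weights
        Ws[i] -= lr * grad_W

# Minimal l2-norm least-squares solution: the predictor the paper's limit
# dynamics select (here as a finite-sample, finite-width analogue).
beta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print("risk:", 0.5 * np.mean((X @ predictor(Ws) - y) ** 2))
print("distance to min-norm minimizer:", np.linalg.norm(predictor(Ws) - beta_star))
```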