Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli
Communications on Pure and Applied Mathematics, vol. 77, no. 10, pp. 3958-4007. Published 2024-05-06. DOI: 10.1002/cpa.22200
https://onlinelibrary.wiley.com/doi/10.1002/cpa.22200
Infinite-width limit of deep linear neural networks
This paper studies the infinite-width limit of deep linear neural networks (NNs) initialized with random parameters. We obtain that, when the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear NN. Moreover, even if the weights remain random, we get their precise law along the training dynamics, and prove a quantitative convergence result of the linear predictor in terms of the number of parameters. We finally study the continuous-time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
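To make the phrase "minimal $\ell_2$-norm minimizer of the risk" concrete, the following is a minimal sketch in a much simpler setting than the paper's: a single linear predictor (not a deep or infinitely wide network) trained by plain gradient descent from zero on a squared-loss risk. The squared loss, the synthetic data, the step size, and the function names `min_norm_minimizer` and `gradient_descent` are all assumptions made for illustration and are not taken from the paper. In this setting the iterates stay in the row space of the data matrix, so they approach the pseudoinverse solution, which is the minimizer of smallest Euclidean norm.

```python
import numpy as np


def min_norm_minimizer(X, y):
    """Minimal l2-norm minimizer of the risk ||X w - y||^2 / (2 n).

    The Moore-Penrose pseudoinverse selects, among all minimizers,
    the one with the smallest Euclidean norm.
    """
    return np.linalg.pinv(X) @ y


def gradient_descent(X, y, steps=5000):
    """Gradient descent on the same risk, started from w = 0."""
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n is the largest
    # eigenvalue of the Hessian X^T X / n.
    lr = n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w -= lr * grad
    return w


# Illustrative underdetermined problem: 5 samples, 20 features,
# so the risk has infinitely many minimizers.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)

w_star = min_norm_minimizer(X, y)
w_gd = gradient_descent(X, y)

# Distance between the gradient descent iterate and the minimal-norm minimizer.
gap = np.linalg.norm(w_gd - w_star)
print(f"distance to minimal-norm minimizer: {gap:.2e}")
```

The sketch only illustrates which limit point is meant; the paper's result concerns the linear predictors of infinitely wide deep linear networks in continuous time, which this toy example does not model.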