Infinite-width limit of deep linear neural networks

Communications on Pure and Applied Mathematics (IF 3.1; Region 1, Mathematics; JCR Q1, MATHEMATICS) Pub Date: 2024-05-06 DOI: 10.1002/cpa.22200

Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli

Communications on Pure and Applied Mathematics, vol. 77 (10), pp. 3958-4007. Journal article. Article page: https://onlinelibrary.wiley.com/doi/10.1002/cpa.22200

Citations: 0

Abstract




This paper studies the infinite-width limit of deep linear neural networks (NNs) initialized with random parameters. We obtain that, when the number of parameters diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear NN. Moreover, even if the weights remain random, we get their precise law along the training dynamics, and prove a quantitative convergence result of the linear predictor in terms of the number of parameters. We finally study the continuous-time limit obtained for infinitely wide linear NNs and show that the linear predictors of the NN converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
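To make the phrase "minimal $\ell_2$-norm minimizer of the risk" concrete, here is a minimal sketch in the simplest possible setting. It is not the paper's construction: it assumes a least-squares risk and a single linear predictor (depth one, no hidden layers), trained by plain gradient descent from zero. The function names, the data, and the step-size rule are all illustrative assumptions. With fewer samples than features the risk has many minimizers, and the sketch checks numerically that gradient descent lands on the one of smallest Euclidean norm, which the pseudoinverse gives in closed form.

```python
import numpy as np


def min_norm_minimizer(X, y):
    """Minimal l2-norm minimizer of the least-squares risk, via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def gradient_descent(X, y, steps=5000):
    """Gradient descent on R(beta) = ||X beta - y||^2 / (2n), started from zero.

    Illustrative stand-in only: a single linear predictor, not a deep network.
    """
    n, d = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / n is the largest Hessian eigenvalue.
    lr = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)  # zero init keeps every iterate in the row space of X
    for _ in range(steps):
        grad = X.T @ (X @ beta - y) / n
        beta = beta - lr * grad
    return beta


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))  # 5 samples, 20 features: many risk minimizers
y = rng.standard_normal(5)

beta_star = min_norm_minimizer(X, y)
beta_gd = gradient_descent(X, y)

# Distance between the gradient-descent limit and the minimal-norm minimizer.
print(np.linalg.norm(beta_gd - beta_star))
```

In this toy case the distance to `beta_star` shrinks geometrically in the number of steps, the discrete-time counterpart of an exponential rate. The abstract's result concerns the much harder case of infinitely wide deep linear networks with random initialization, which this sketch does not attempt to reproduce.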


Source journal

Communications on Pure and Applied Mathematics (category: Mathematics)

CiteScore: 6.70
Self-citation rate: 3.30%
Articles published: (no value given)
Review time: >12 weeks

About the journal: Communications on Pure and Applied Mathematics (ISSN 0010-3640) is published monthly, one volume per year, by John Wiley & Sons, Inc. © 2019. The journal primarily publishes papers originating at or solicited by the Courant Institute of Mathematical Sciences. It features recent developments in applied mathematics, mathematical physics, and mathematical analysis. The topics include partial differential equations, computer science, and applied mathematics. CPAM is devoted to mathematical contributions to the sciences; both theoretical and applied papers, of original or expository type, are included.
