$\alpha$-Stable convergence of heavy-/light-tailed infinitely wide neural networks

Pub Date: 2023-07-03 | DOI: 10.1017/apr.2023.3
Paul Jung, Hoileong Lee, Jiho Lee, Hongseok Yang
{"title":"-重尾/轻尾无限宽神经网络的稳定收敛","authors":"Paul Jung, Hoileong Lee, Jiho Lee, Hongseok Yang","doi":"10.1017/apr.2023.3","DOIUrl":null,"url":null,"abstract":"\n We consider infinitely wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric \n \n \n \n$\\alpha$\n\n \n -stable distribution, where \n \n \n \n$\\alpha\\in(0,2]$\n\n \n may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric \n \n \n \n$\\alpha$\n\n \n -stable distribution having the same \n \n \n \n$\\alpha$\n\n \n parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically seen to emerge in trained deep neural nets such as the ResNet and VGG series, and proven to naturally arise via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric \n \n \n \n$\\alpha$\n\n \n -stable distributions, \n \n \n \n$\\alpha\\in(0,2]$\n\n \n .","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"-Stable convergence of heavy-/light-tailed infinitely wide neural networks\",\"authors\":\"Paul Jung, Hoileong Lee, Jiho Lee, Hongseok Yang\",\"doi\":\"10.1017/apr.2023.3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n We consider infinitely wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric \\n \\n \\n \\n$\\\\alpha$\\n\\n \\n -stable distribution, where \\n \\n \\n \\n$\\\\alpha\\\\in(0,2]$\\n\\n \\n may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric \\n \\n \\n \\n$\\\\alpha$\\n\\n \\n -stable distribution having the same \\n \\n \\n \\n$\\\\alpha$\\n\\n \\n parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically seen to emerge in trained deep neural nets such as the ResNet and VGG series, and proven to naturally arise via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. 
random variables with symmetric \\n \\n \\n \\n$\\\\alpha$\\n\\n \\n -stable distributions, \\n \\n \\n \\n$\\\\alpha\\\\in(0,2]$\\n\\n \\n .\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2023-07-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1017/apr.2023.3\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1017/apr.2023.3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

We consider infinitely wide multi-layer perceptrons (MLPs) which are limits of standard deep feed-forward neural networks. We assume that, for each layer, the weights of an MLP are initialized with independent and identically distributed (i.i.d.) samples from either a light-tailed (finite-variance) or a heavy-tailed distribution in the domain of attraction of a symmetric $\alpha$-stable distribution, where $\alpha\in(0,2]$ may depend on the layer. For the bias terms of the layer, we assume i.i.d. initializations with a symmetric $\alpha$-stable distribution having the same $\alpha$ parameter as that layer. Non-stable heavy-tailed weight distributions are important since they have been empirically seen to emerge in trained deep neural nets such as the ResNet and VGG series, and proven to naturally arise via stochastic gradient descent. The introduction of heavy-tailed weights broadens the class of priors in Bayesian neural networks. In this work we extend a recent result of Favaro, Fortini, and Peluchetti (2020) to show that the vector of pre-activation values at all nodes of a given hidden layer converges in the limit, under a suitable scaling, to a vector of i.i.d. random variables with symmetric $\alpha$-stable distributions, $\alpha\in(0,2]$.
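
As an illustrative aside (not part of the paper), the sketch below simulates the setting described in the abstract for a single hidden layer: weights are drawn i.i.d. from a symmetric Pareto distribution with tail index $\alpha\in(0,2)$ (a heavy-tailed law in the domain of attraction of a symmetric $\alpha$-stable distribution), biases from a symmetric $\alpha$-stable law via scipy.stats.levy_stable, and the second-layer pre-activation is scaled by $n^{-1/\alpha}$ in the hidden width $n$. The tanh activation, the Pareto weight law, the $d^{-1/\alpha}$ input normalization, and the Hill tail-index check are assumptions of this sketch, not choices taken from the paper.

```python
# Monte Carlo sketch (not the authors' code): simulate a width-n hidden layer
# with heavy-tailed i.i.d. weights and alpha-stable biases, and check that a
# second-layer pre-activation, scaled by n**(-1/alpha), has an alpha-heavy tail
# as the stable limit in the abstract predicts. Distribution and activation
# choices here are illustrative assumptions.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

def sym_pareto(alpha, size):
    """Symmetric Pareto(alpha) samples: heavy-tailed and in the domain of
    attraction of a symmetric alpha-stable law for alpha in (0, 2)."""
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * rng.uniform(size=size) ** (-1.0 / alpha)

def preactivation_samples(x, n, alpha, n_draws):
    """Draw one second-layer pre-activation n_draws times, re-sampling all
    weights and biases each time (hidden width n, tanh activation)."""
    d = x.shape[0]
    out = np.empty(n_draws)
    for k in range(n_draws):
        W1 = sym_pareto(alpha, (n, d))                    # layer-1 weights
        b1 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
        h = np.tanh(d ** (-1.0 / alpha) * (W1 @ x) + b1)  # hidden units
        W2 = sym_pareto(alpha, n)                         # layer-2 weights
        b2 = levy_stable.rvs(alpha, 0.0, random_state=rng)
        out[k] = n ** (-1.0 / alpha) * (W2 @ h) + b2      # scaled pre-activation
    return out

alpha, n = 1.5, 2000
samples = preactivation_samples(np.ones(3), n, alpha, n_draws=500)

# Crude tail check: for an alpha-stable limit with alpha < 2, P(|f| > t) decays
# like t**(-alpha); the Hill estimator over the top 10% of |samples| should be
# roughly close to alpha.
abs_desc = np.sort(np.abs(samples))[::-1]
k = len(samples) // 10
hill = 1.0 / np.mean(np.log(abs_desc[:k] / abs_desc[k]))
print(f"Hill tail-index estimate: {hill:.2f} (target alpha = {alpha})")
```

Because the limiting pre-activation is symmetric $\alpha$-stable with $\alpha<2$ under the paper's scaling, its survival function decays like $t^{-\alpha}$, so the Hill estimate on the simulated pre-activations should move toward the chosen $\alpha$ as the width and the number of Monte Carlo draws grow.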