Trainability of ReLU networks and Data-dependent Initialization.

Yeonjong Shin, G. Karniadakis
{"title":"Trainability of ReLU networks and Data-dependent Initialization.","authors":"Yeonjong Shin, G. Karniadakis","doi":"10.1615/.2020034126","DOIUrl":null,"url":null,"abstract":"In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it only outputs a constant for any input. Two death states of neurons are introduced; tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as trainability. We show that a network being trainable is a necessary condition for successful training and the trainability serves as an upper bound of successful training rates. In order to quantify the trainability, we study the probability distribution of the number of active neurons at the initialization. In many applications, over-specified or over-parameterized neural networks are successfully employed and shown to be trained effectively. With the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.","PeriodicalId":8468,"journal":{"name":"arXiv: Learning","volume":"28 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv: Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1615/.2020034126","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In this paper, we study the trainability of rectified linear unit (ReLU) networks. A ReLU neuron is said to be dead if it outputs only a constant for any input. Two death states of neurons are introduced: tentative and permanent death. A network is then said to be trainable if the number of permanently dead neurons is sufficiently small for a learning task. We refer to the probability of a network being trainable as its trainability. We show that a network being trainable is a necessary condition for successful training, and that trainability serves as an upper bound on the rate of successful training. To quantify trainability, we study the probability distribution of the number of active neurons at initialization. In many applications, over-specified or over-parameterized neural networks are employed and shown to train effectively. Using the notion of trainability, we show that over-parameterization is both a necessary and a sufficient condition for minimizing the training loss. Furthermore, we propose a data-dependent initialization method in the over-parameterized setting. Numerical examples are provided to demonstrate the effectiveness of the method and our theoretical findings.
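The two quantities the abstract centers on, the number of active neurons at initialization and a data-dependent initialization, can be illustrated with a short numerical sketch. The code below is an assumption-laden illustration, not the authors' actual procedure: the toy one-hidden-layer architecture, the width, and the particular "point each weight vector at a training sample" rule are all illustrative choices. It counts hidden ReLU neurons that are inactive on every training point under a standard Gaussian initialization, then under a simple data-dependent initialization that guarantees each neuron is active on at least one training point.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact method):
# count ReLU neurons that are dead *on the training data* at initialization,
# comparing a standard Gaussian init with a simple data-dependent init.
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: n points in d dimensions; one hidden layer of `width` neurons.
n, d, width = 256, 2, 64
X = rng.standard_normal((n, d))

def count_dead_neurons(W, b, X):
    """A hidden ReLU neuron is counted as dead (on this data) if its
    pre-activation is non-positive for every training input, so it
    outputs the constant 0 over the whole training set."""
    pre = X @ W.T + b                  # shape (n, width)
    active = (pre > 0).any(axis=0)     # neuron is active on at least one sample
    return int((~active).sum())

# Standard Gaussian ("He-style") initialization with zero biases.
W_std = rng.standard_normal((width, d)) * np.sqrt(2.0 / d)
b_std = np.zeros(width)
print("dead neurons at standard init:", count_dead_neurons(W_std, b_std, X))

# A simple data-dependent initialization (illustrative assumption): aim each
# neuron's weight vector at a randomly chosen training sample and shift its
# bias so that this sample's pre-activation is strictly positive. By
# construction, no neuron is dead on the training set at initialization.
idx = rng.integers(0, n, size=width)
W_dd = X[idx] / np.linalg.norm(X[idx], axis=1, keepdims=True)
b_dd = 1e-3 - np.einsum("ij,ij->i", W_dd, X[idx])
print("dead neurons at data-dependent init:", count_dead_neurons(W_dd, b_dd, X))
```

Note that the paper's notion of a dead neuron refers to constant output over the whole input space; the sketch checks the weaker, data-restricted condition, which is what one can measure numerically on a finite training set.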