Exact capacity of the \emph{wide} hidden layer treelike neural networks with generic activations

Mihailo Stojnic
{"title":"Exact capacity of the \\emph{wide} hidden layer treelike neural networks with generic activations","authors":"Mihailo Stojnic","doi":"arxiv-2402.05719","DOIUrl":null,"url":null,"abstract":"Recent progress in studying \\emph{treelike committee machines} (TCM) neural\nnetworks (NN) in\n\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23}\nshowed that the Random Duality Theory (RDT) and its a \\emph{partially\nlifted}(pl RDT) variant are powerful tools that can be used for very precise\nnetworks capacity analysis. Here, we consider \\emph{wide} hidden layer networks\nand uncover that certain aspects of numerical difficulties faced in\n\\cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we\nemploy recently developed \\emph{fully lifted} (fl) RDT to characterize the\n\\emph{wide} ($d\\rightarrow \\infty$) TCM nets capacity. We obtain explicit,\nclosed form, capacity characterizations for a very generic class of the hidden\nlayer activations. While the utilized approach significantly lowers the amount\nof the needed numerical evaluations, the ultimate fl RDT usefulness and success\nstill require a solid portion of the residual numerical work. To get the\nconcrete capacity values, we take four very famous activations examples:\n\\emph{\\textbf{ReLU}}, \\textbf{\\emph{quadratic}}, \\textbf{\\emph{erf}}, and\n\\textbf{\\emph{tanh}}. After successfully conducting all the residual numerical\nwork for all of them, we uncover that the whole lifting mechanism exhibits a\nremarkably rapid convergence with the relative improvements no better than\n$\\sim 0.1\\%$ happening already on the 3-rd level of lifting. As a convenient\nbonus, we also uncover that the capacity characterizations obtained on the\nfirst and second level of lifting precisely match those obtained through the\nstatistical physics replica theory methods in \\cite{ZavPeh21} for the generic\nand in \\cite{BalMalZech19} for the ReLU activations.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.05719","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recent progress in studying \emph{treelike committee machine} (TCM) neural networks (NN) in \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23} showed that the Random Duality Theory (RDT) and its \emph{partially lifted} (pl RDT) variant are powerful tools for very precise network capacity analysis. Here, we consider \emph{wide} hidden layer networks and uncover that certain numerical difficulties faced in \cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we employ the recently developed \emph{fully lifted} (fl) RDT to characterize the capacity of \emph{wide} ($d\rightarrow \infty$) TCM nets. We obtain explicit, closed-form capacity characterizations for a very generic class of hidden layer activations. While the utilized approach significantly lowers the amount of needed numerical evaluation, the ultimate usefulness and success of the fl RDT still require a solid portion of residual numerical work. To obtain concrete capacity values, we take four very well-known activation examples: \emph{\textbf{ReLU}}, \textbf{\emph{quadratic}}, \textbf{\emph{erf}}, and \textbf{\emph{tanh}}. After successfully conducting all the residual numerical work for each of them, we uncover that the whole lifting mechanism exhibits a remarkably rapid convergence, with relative improvements no larger than $\sim 0.1\%$ already occurring at the third level of lifting. As a convenient bonus, we also uncover that the capacity characterizations obtained on the first and second levels of lifting precisely match those obtained through statistical physics replica theory methods in \cite{ZavPeh21} for generic activations and in \cite{BalMalZech19} for ReLU activations.
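For readers unfamiliar with the architecture the abstract refers to, the following Python sketch (not code from the paper; all function and variable names are illustrative) shows a forward pass of a treelike committee machine with the four activations mentioned above: the $n$ inputs are split into $d$ disjoint blocks, each block feeds exactly one hidden unit, and the network label is the sign of the summed hidden activations. Details such as possible activation centering or output thresholds follow the paper's conventions and are omitted here.

```python
import numpy as np
from math import erf

# Illustrative sketch of a treelike committee machine (TCM) forward pass.
# The four hidden-layer activations match those named in the abstract.
ACTIVATIONS = {
    "relu": lambda t: np.maximum(t, 0.0),
    "quadratic": lambda t: t ** 2,
    "erf": np.vectorize(erf),
    "tanh": np.tanh,
}

def tcm_output(W, X, f):
    """Forward pass of a TCM.

    W : (d, n_per_block) hidden-layer weights, one row per hidden unit.
    X : (m, d * n_per_block) input patterns, one row per pattern.
    f : hidden-layer activation, applied elementwise.
    """
    m = X.shape[0]
    d, n_per_block = W.shape
    # Treelike connectivity: each pattern is reshaped into d disjoint blocks,
    # and block k is seen only by hidden unit k.
    blocks = X.reshape(m, d, n_per_block)
    preact = np.einsum("mdb,db->md", blocks, W) / np.sqrt(n_per_block)
    # Output unit: sign of the summed hidden activations.
    return np.sign(f(preact).sum(axis=1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_per_block, m = 9, 50, 600          # load alpha = m / (d * n_per_block)
    X = rng.standard_normal((m, d * n_per_block))
    W = rng.standard_normal((d, n_per_block))  # a random (untrained) weight draw
    for name, f in ACTIVATIONS.items():
        frac_plus = np.mean(tcm_output(W, X, f) == 1.0)
        print(f"{name:9s} fraction of patterns mapped to +1: {frac_plus:.2f}")
```

In capacity studies of this architecture, one asks for the largest load $\alpha = m/n$ at which hidden-layer weights exist that realize $m$ prescribed $\pm 1$ labels on $m$ random patterns; the random weight draw above is only a sanity check of the forward pass, not a capacity estimate.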