Exact capacity of the \emph{wide} hidden layer treelike neural networks with generic activations

arXiv - CS - Information Theory, Pub Date: 2024-02-08, DOI: arxiv-2402.05719

Mihailo Stojnic


Citations: 0

Abstract

Recent progress in studying \emph{treelike committee machines} (TCM) neural networks (NN) in \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23} showed that the Random Duality Theory (RDT) and its \emph{partially lifted} (pl RDT) variant are powerful tools for very precise network capacity analysis. Here, we consider \emph{wide} hidden layer networks and uncover that certain aspects of the numerical difficulties faced in \cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we employ the recently developed \emph{fully lifted} (fl) RDT to characterize the capacity of \emph{wide} TCM nets, i.e., nets whose number of hidden units $d\rightarrow \infty$. We obtain explicit, closed-form capacity characterizations for a very generic class of hidden layer activations. While the utilized approach significantly lowers the amount of needed numerical evaluation, the ultimate usefulness and success of fl RDT still require a solid portion of residual numerical work. To get concrete capacity values, we take four well-known activation examples: \emph{\textbf{ReLU}}, \textbf{\emph{quadratic}}, \textbf{\emph{erf}}, and \textbf{\emph{tanh}}. After conducting all the residual numerical work for each of them, we uncover that the whole lifting mechanism exhibits remarkably rapid convergence, with relative improvements of at most $\sim 0.1\%$ already at the 3rd level of lifting. As a convenient bonus, we also uncover that the capacity characterizations obtained at the first and second levels of lifting precisely match those obtained through the statistical physics replica theory methods in \cite{ZavPeh21} for generic activations and in \cite{BalMalZech19} for the ReLU activation.
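To make the object of study concrete, here is a minimal Python sketch of a treelike committee machine forward pass with a pluggable activation. It assumes the usual TCM setup: the $n$-dimensional input is split into $d$ disjoint blocks, each hidden unit sees only its own block through its own weights, and the output is the sign of the summed hidden activations. The names `tcm_output`, `fraction_memorized` and `ACTIVATIONS`, the $1/\sqrt{\text{block}}$ scaling, and the optional `threshold` are illustrative assumptions, not the paper's notation or code. The sketch only checks whether one given weight set memorizes a labeled dataset; it does not compute capacity, which the paper characterizes analytically via fl RDT.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Standard definitions of the four activations named in the abstract.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda z: max(0.0, z),
    "quadratic": lambda z: z * z,
    "erf": math.erf,
    "tanh": math.tanh,
}


def tcm_output(x: Sequence[float], weights: List[List[float]],
               act: Callable[[float], float], threshold: float = 0.0) -> int:
    """Label assigned to input x by a treelike committee machine (sketch).

    x (length n) is split into d = len(weights) disjoint contiguous blocks of
    length n // d; hidden unit i sees only block i through weights[i].
    The 1/sqrt(block) scaling and the threshold are assumptions of this sketch.
    """
    d = len(weights)
    if d == 0 or len(x) == 0 or len(x) % d != 0:
        raise ValueError("len(x) must be a positive multiple of the number of hidden units")
    block = len(x) // d
    total = 0.0
    for i, w in enumerate(weights):
        xi = x[i * block:(i + 1) * block]  # receptive field of hidden unit i
        pre = sum(wj * xj for wj, xj in zip(w, xi)) / math.sqrt(block)
        total += act(pre)
    # Ties (total == threshold) are mapped to +1 by convention.
    return 1 if total >= threshold else -1


def fraction_memorized(xs: Sequence[Sequence[float]], ys: Sequence[int],
                       weights: List[List[float]], act: Callable[[float], float],
                       threshold: float = 0.0) -> float:
    """Fraction of patterns (x, y) that the given weights classify correctly."""
    hits = sum(1 for x, y in zip(xs, ys) if tcm_output(x, weights, act, threshold) == y)
    return hits / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, m = 12, 3, 18  # pattern-to-dimension ratio alpha = m / n = 1.5
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    ys = [rng.choice([-1, 1]) for _ in range(m)]
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n // d)] for _ in range(d)]
    for name, act in ACTIVATIONS.items():
        print(name, fraction_memorized(xs, ys, weights, act))
```

Informally, capacity is the largest ratio $\alpha = m/n$ of random patterns with random $\pm 1$ labels for which some choice of weights memorizes all of them as $n$ grows; the demo only evaluates a single random weight choice at $\alpha = 1.5$. With a nonnegative activation such as the quadratic, a zero threshold would make every output $+1$, which is why the sketch exposes `threshold`.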

Source journal

arXiv - CS - Information Theory

Self-citation rate

0.00%

Articles published