Title: Exact capacity of the \emph{wide} hidden layer treelike neural networks with generic activations
Authors: Mihailo Stojnic
Journal: arXiv - CS - Information Theory
Publication date: 2024-02-08
DOI: arxiv-2402.05719
Citation count: 0
Abstract
Recent progress in studying \emph{treelike committee machines} (TCM) neural
networks (NN) in
\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23}
showed that Random Duality Theory (RDT) and its \emph{partially
lifted} (pl RDT) variant are powerful tools for very precise
network capacity analysis. Here, we consider \emph{wide} hidden layer networks
and uncover that certain aspects of the numerical difficulties faced in
\cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we
employ the recently developed \emph{fully lifted} (fl) RDT to characterize the
capacity of \emph{wide} ($d\rightarrow \infty$) TCM nets. We obtain explicit,
closed-form capacity characterizations for a very generic class of hidden
layer activations. While this approach significantly reduces the amount of
numerical evaluation needed, the ultimate usefulness and success of fl RDT
still require a solid portion of residual numerical work. To obtain
concrete capacity values, we take four well-known activation examples:
\textbf{\emph{ReLU}}, \textbf{\emph{quadratic}}, \textbf{\emph{erf}}, and
\textbf{\emph{tanh}}. After completing the residual numerical
work for all four, we find that the lifting mechanism converges
remarkably rapidly, with relative improvements of no more than
$\sim 0.1\%$ already on the third level of lifting. As a convenient
bonus, we also find that the capacity characterizations obtained on the
first and second levels of lifting precisely match those obtained through
statistical physics replica theory methods in \cite{ZavPeh21} for generic
activations and in \cite{BalMalZech19} for the ReLU activation.
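
For readers unfamiliar with the architecture, the following is a minimal sketch of a treelike committee machine forward pass with the four activations named above. It is illustrative only and is not taken from the paper: the function name `tcm_output`, the `threshold` parameter, the absence of any pre-activation normalization, and the choice to output the sign of the summed hidden activations are all assumptions. The defining "treelike" feature is that each of the $d$ hidden neurons sees its own disjoint block of the input.

```python
import math

# The four activation examples named in the abstract.
ACTIVATIONS = {
    "relu": lambda z: max(z, 0.0),
    "quadratic": lambda z: z * z,
    "erf": math.erf,
    "tanh": math.tanh,
}


def tcm_output(x, w, activation, threshold=0.0):
    """Forward pass of a treelike committee machine (hypothetical sketch).

    x: flat input of length d * block
    w: list of d weight vectors, each of length block; hidden neuron i
       only sees the i-th disjoint block of x (the "treelike" structure)
    activation: key into ACTIVATIONS
    threshold: assumed output threshold (not specified in the abstract)
    """
    f = ACTIVATIONS[activation]
    d = len(w)
    block = len(w[0])
    if len(x) != d * block:
        raise ValueError("input length must equal d * block")

    total = 0.0
    for i, w_i in enumerate(w):
        x_i = x[i * block:(i + 1) * block]          # disjoint input block
        pre = sum(a * b for a, b in zip(w_i, x_i))  # pre-activation w_i^T x_i
        total += f(pre)                             # hidden layer activation

    # Output neuron: sign of the summed hidden activations (assumed form).
    return 1 if total >= threshold else -1
```

In this picture, the capacity is the largest ratio of the number of random input/label pairs to the number of weights for which some choice of `w` still reproduces every label; the wide regime corresponds to letting `d` grow without bound.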