{"title":"固定宽度树状神经网络容量分析 -- 通用激活","authors":"Mihailo Stojnic","doi":"arxiv-2402.05696","DOIUrl":null,"url":null,"abstract":"We consider the capacity of \\emph{treelike committee machines} (TCM) neural\nnetworks. Relying on Random Duality Theory (RDT), \\cite{Stojnictcmspnncaprdt23}\nrecently introduced a generic framework for their capacity analysis. An upgrade\nbased on the so-called \\emph{partially lifted} RDT (pl RDT) was then presented\nin \\cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the\nnetworks with the most typical, \\emph{sign}, activations. Here, on the other\nhand, we focus on networks with other, more general, types of activations and\nshow that the frameworks of\n\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently\npowerful to enable handling of such scenarios as well. In addition to the\nstandard \\emph{linear} activations, we uncover that particularly convenient\nresults can be obtained for two very commonly used activations, namely, the\n\\emph{quadratic} and \\emph{rectified linear unit (ReLU)} ones. In more concrete\nterms, for each of these activations, we obtain both the RDT and pl RDT based\nmemory capacities upper bound characterization for \\emph{any} given (even)\nnumber of the hidden layer neurons, $d$. In the process, we also uncover the\nfollowing two, rather remarkable, facts: 1) contrary to the common wisdom, both\nsets of results show that the bounding capacity decreases for large $d$ (the\nwidth of the hidden layer) while converging to a constant value; and 2) the\nmaximum bounding capacity is achieved for the networks with precisely\n\\textbf{\\emph{two}} hidden layer neurons! Moreover, the large $d$ converging\nvalues are observed to be in excellent agrement with the statistical physics\nreplica theory based predictions.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fixed width treelike neural networks capacity analysis -- generic activations\",\"authors\":\"Mihailo Stojnic\",\"doi\":\"arxiv-2402.05696\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the capacity of \\\\emph{treelike committee machines} (TCM) neural\\nnetworks. Relying on Random Duality Theory (RDT), \\\\cite{Stojnictcmspnncaprdt23}\\nrecently introduced a generic framework for their capacity analysis. An upgrade\\nbased on the so-called \\\\emph{partially lifted} RDT (pl RDT) was then presented\\nin \\\\cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the\\nnetworks with the most typical, \\\\emph{sign}, activations. Here, on the other\\nhand, we focus on networks with other, more general, types of activations and\\nshow that the frameworks of\\n\\\\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently\\npowerful to enable handling of such scenarios as well. In addition to the\\nstandard \\\\emph{linear} activations, we uncover that particularly convenient\\nresults can be obtained for two very commonly used activations, namely, the\\n\\\\emph{quadratic} and \\\\emph{rectified linear unit (ReLU)} ones. In more concrete\\nterms, for each of these activations, we obtain both the RDT and pl RDT based\\nmemory capacities upper bound characterization for \\\\emph{any} given (even)\\nnumber of the hidden layer neurons, $d$. 
In the process, we also uncover the\\nfollowing two, rather remarkable, facts: 1) contrary to the common wisdom, both\\nsets of results show that the bounding capacity decreases for large $d$ (the\\nwidth of the hidden layer) while converging to a constant value; and 2) the\\nmaximum bounding capacity is achieved for the networks with precisely\\n\\\\textbf{\\\\emph{two}} hidden layer neurons! Moreover, the large $d$ converging\\nvalues are observed to be in excellent agrement with the statistical physics\\nreplica theory based predictions.\",\"PeriodicalId\":501433,\"journal\":{\"name\":\"arXiv - CS - Information Theory\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2402.05696\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.05696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We consider the capacity of \emph{treelike committee machines} (TCM) neural
networks. Relying on Random Duality Theory (RDT), \cite{Stojnictcmspnncaprdt23}
recently introduced a generic framework for their capacity analysis. An upgrade
based on the so-called \emph{partially lifted} RDT (pl RDT) was then presented
in \cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the
networks with the most typical, \emph{sign}, activations. Here, on the other
hand, we focus on networks with other, more general types of activations and
show that the frameworks of
\cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently
powerful to enable handling of such scenarios as well. In addition to the
standard \emph{linear} activations, we find that particularly convenient
results can be obtained for two very commonly used activations, namely, the
\emph{quadratic} and the \emph{rectified linear unit (ReLU)}. In more concrete
terms, for each of these activations, we obtain both RDT and pl RDT based
memory capacity upper bound characterizations for \emph{any} given (even)
number of hidden layer neurons, $d$. In the process, we also uncover the
following two, rather remarkable, facts: 1) contrary to common wisdom, both
sets of results show that the bounding capacity decreases for large $d$ (the
width of the hidden layer) while converging to a constant value; and 2) the
maximum bounding capacity is achieved for networks with precisely
\textbf{\emph{two}} hidden layer neurons! Moreover, the large $d$ limiting
values are observed to be in excellent agreement with the predictions of
statistical physics replica theory.
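
To make the setup more concrete, the following is a minimal sketch of the TCM architecture under its conventional definition (an assumption here, since the abstract itself does not restate it): the $n$-dimensional input $x$ is split into $d$ disjoint blocks $x^{(1)},\dots,x^{(d)}$ of size $n/d$, each hidden neuron $i$ sees only its own block through a weight vector $w^{(i)}$, and the network output is
\begin{align*}
\hat{y} \;=\; \mathrm{sign}\!\left(\sum_{i=1}^{d} f\big((w^{(i)})^{T} x^{(i)}\big)\right),
\end{align*}
where $f$ is the hidden-layer activation, e.g., $f(z)=\mathrm{sign}(z)$, $f(z)=z$, $f(z)=z^2$, or $f(z)=\max(0,z)$ for the sign, linear, quadratic, and ReLU cases discussed above. Roughly speaking, the memory capacity is then the largest ratio of the number of patterns $m$ to the input dimension $n$ for which such a network can typically interpolate $m$ randomly labeled patterns.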