{"title":"Trade-Offs Between Energy and Depth of Neural Networks","authors":"Kei Uchizawa;Haruki Abe","doi":"10.1162/neco_a_01683","DOIUrl":null,"url":null,"abstract":"We present an investigation on threshold circuits and other discretized neural networks in terms of the following four computational resources—size (the number of gates), depth (the number of layers), weight (weight resolution), and energy—where the energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all the input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, it holds that log( rk (f))≤ed(logs+logw+logn) regardless of the algorithm employed by C to compute f, where rk (f) is a parameter solely determined by a scale of f and defined as the maximum rank of a communication matrix with regard to f taken over all the possible partitions of the n input variables. For example, given a Boolean function CD n(ξ) = ⋁i=1n/2ξi∧ξn/2+i, we can prove that n/2≤ed( log s+logw+logn) holds for any circuit C computing CD n. While its left-hand side is linear in n, its right-hand side is bounded by the product of the logarithmic factors of s,w,n and the linear factors of d,e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 needs to be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1541-1567"},"PeriodicalIF":2.7000,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10661268/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
We present an investigation of threshold circuits and other discretized neural networks in terms of four computational resources: size (the number of gates), depth (the number of layers), weight (weight resolution), and energy. Here energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, then log(rk(f)) ≤ ed(log s + log w + log n) regardless of the algorithm employed by C to compute f, where rk(f) is a parameter determined solely by f itself and defined as the maximum rank of a communication matrix with regard to f, taken over all possible partitions of the n input variables. For example, for the Boolean function CD_n(ξ) = ⋁_{i=1}^{n/2} (ξ_i ∧ ξ_{n/2+i}), we can prove that n/2 ≤ ed(log s + log w + log n) holds for any circuit C computing CD_n. While the left-hand side is linear in n, the right-hand side is bounded by the product of the logarithmic factors of s, w, n and the linear factors of d, e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 needs to be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.
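To make the definitions concrete, the following is a minimal Python sketch (our illustration, not code from the paper) of the obvious depth-2 threshold circuit for CD_n: a layer of n/2 AND gates followed by a single OR gate, each realized as a threshold gate. The helper names (threshold_gate, cd_circuit, energy) are hypothetical; the sketch simply counts, by brute force over all 2^n inputs, the maximum number of gates that output a nonzero value, which is the energy measure described in the abstract.

```python
from itertools import product


def threshold_gate(weights, threshold, inputs):
    """Output 1 iff the weighted sum of the inputs reaches the threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def cd_circuit(xi):
    """Naive depth-2 threshold circuit for CD_n(xi) = OR_i (xi_i AND xi_{n/2+i}).

    Returns (output, list of all gate outputs) so the gates firing on this
    input can be counted.
    """
    half = len(xi) // 2
    # Layer 1: n/2 AND gates, each a threshold gate with weights (1, 1) and threshold 2.
    ands = [threshold_gate((1, 1), 2, (xi[i], xi[half + i])) for i in range(half)]
    # Layer 2: one OR gate, a threshold gate with unit weights and threshold 1.
    out = threshold_gate((1,) * half, 1, ands)
    return out, ands + [out]


def energy(n):
    """Energy of the circuit above: the maximum number of gates outputting a
    nonzero value, taken over all 2^n input assignments (brute force)."""
    best = 0
    for xi in product((0, 1), repeat=n):
        _, gates = cd_circuit(xi)
        best = max(best, sum(gates))
    return best


if __name__ == "__main__":
    n = 8                      # keep n small: the brute force scans 2^n inputs
    s, d = n // 2 + 1, 2       # size (number of gates) and depth of this circuit
    print(f"n={n}, size={s}, depth={d}, energy={energy(n)}")  # energy = n/2 + 1 = 5 here
```

This naive circuit has depth d = 2 but energy e = n/2 + 1 (on the all-ones input every gate fires), so it satisfies the bound n/2 ≤ ed(log s + log w + log n) with a large margin; the trade-off stated in the abstract says that any circuit spending much less energy must compensate with proportionally more depth, up to the logarithmic factors.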
Journal introduction:
Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.