Trade-Offs Between Energy and Depth of Neural Networks

Neural Computation, vol. 36, no. 8, pp. 1541-1567. Published: 2024-07-19. DOI: 10.1162/neco_a_01683
Kei Uchizawa;Haruki Abe
{"title":"Trade-Offs Between Energy and Depth of Neural Networks","authors":"Kei Uchizawa;Haruki Abe","doi":"10.1162/neco_a_01683","DOIUrl":null,"url":null,"abstract":"We present an investigation on threshold circuits and other discretized neural networks in terms of the following four computational resources—size (the number of gates), depth (the number of layers), weight (weight resolution), and energy—where the energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all the input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, it holds that log( rk (f))≤ed(logs+logw+logn) regardless of the algorithm employed by C to compute f, where rk (f) is a parameter solely determined by a scale of f and defined as the maximum rank of a communication matrix with regard to f taken over all the possible partitions of the n input variables. For example, given a Boolean function CD n(ξ) = ⋁i=1n/2ξi∧ξn/2+i, we can prove that n/2≤ed( log s+logw+logn) holds for any circuit C computing CD n. While its left-hand side is linear in n, its right-hand side is bounded by the product of the logarithmic factors of s,w,n and the linear factors of d,e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 needs to be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1541-1567"},"PeriodicalIF":2.7000,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10661268/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

We present an investigation of threshold circuits and other discretized neural networks in terms of the following four computational resources: size (the number of gates), depth (the number of layers), weight (weight resolution), and energy, where energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all input assignments. As our main result, we prove that if a threshold circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$ (i.e., a classification task) of $n$ variables, then $\log(\mathrm{rk}(f)) \le e d (\log s + \log w + \log n)$ holds regardless of the algorithm employed by $C$ to compute $f$, where $\mathrm{rk}(f)$ is a parameter determined solely by $f$ and defined as the maximum rank of a communication matrix of $f$, taken over all possible partitions of the $n$ input variables. For example, for the Boolean function $\mathrm{CD}_n(\xi) = \bigvee_{i=1}^{n/2} \xi_i \wedge \xi_{n/2+i}$, we can prove that $n/2 \le e d (\log s + \log w + \log n)$ holds for any circuit $C$ computing $\mathrm{CD}_n$. While the left-hand side is linear in $n$, the right-hand side is bounded by the product of logarithmic factors in $s, w, n$ and linear factors in $d, e$. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: $n/2$ needs to be smaller than the product of $e$ and $d$. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.
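To make the definitions concrete, the following is a minimal Python sketch (not taken from the paper) of the obvious depth-2 threshold circuit for $\mathrm{CD}_n$: it measures the circuit's energy by brute force over all input assignments and sanity-checks the stated bound for a tiny $n$. The gate encoding, the choice $w = 1$ for unit weights, and the base-2 logarithms are illustrative assumptions, not the paper's construction.

```python
from itertools import product
from math import log2

def threshold_gate(weights, threshold, inputs):
    """A threshold gate fires (outputs 1) iff its weighted input sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def cd(xi):
    """Reference definition of CD_n: OR over i of (xi_i AND xi_{n/2+i})."""
    half = len(xi) // 2
    return 1 if any(xi[i] and xi[half + i] for i in range(half)) else 0

def circuit_cd(xi):
    """Naive depth-2 threshold circuit for CD_n.

    Layer 1: one AND gate per pair (weights (1, 1), threshold 2).
    Layer 2: an OR gate over the ANDs (unit weights, threshold 1).
    Returns (output, number of gates that fired on this input).
    """
    half = len(xi) // 2
    ands = [threshold_gate((1, 1), 2, (xi[i], xi[half + i])) for i in range(half)]
    out = threshold_gate([1] * half, 1, ands)
    return out, sum(ands) + out

n = 6             # small enough to enumerate all 2^n input assignments
s = n // 2 + 1    # size: number of gates
d = 2             # depth: number of layers
w = 1             # weight resolution: all weights are 1 in this sketch (assumption)

energy = 0
for xi in product((0, 1), repeat=n):
    out, fired = circuit_cd(xi)
    assert out == cd(xi)            # the circuit really computes CD_n
    energy = max(energy, fired)     # energy = max #gates outputting nonzero values

print(f"size={s}, depth={d}, weight={w}, energy={energy}")

# Sanity check of the abstract's bound n/2 <= e*d*(log s + log w + log n),
# with base-2 logarithms assumed here.
assert n / 2 <= energy * d * (log2(s) + log2(w) + log2(n))
```

This naive circuit can fire up to $n/2 + 1$ gates at once, so the bound holds with large slack; the regime the result speaks to is circuits forced to keep the energy $e$ small, where (ignoring the logarithmic factors) $n/2$ must still stay below $e \cdot d$.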