Hyperdimensional (HD) computing (also referred to as vector symbolic architectures, VSAs) offers a method for encoding symbols into vectors, allowing for those symbols to be combined in different ways to form other vectors in the same vector space. The vectors and operators form a compositional algebra, such that composite vectors can be decomposed back to their constituent vectors. Many useful algorithms have implementations in HD computing, such as classification, spatial navigation, language modeling, and logic. In this letter, we propose a spiking implementation of Fourier holographic reduced representation (FHRR), one of the most versatile VSAs. The phase of each complex number of an FHRR vector is encoded as a spike time within a cycle. Neuron models derived from these spiking phasors can perform the requisite vector operations to implement an FHRR. We demonstrate the power and versatility of our spiking networks in a number of foundational problem domains, including symbol binding and unbinding, spatial representation, function representation, function integration, and memory (i.e., signal delay).
超维(HD)计算(也称为向量符号架构,VSA)提供了一种将符号编码成向量的方法,允许这些符号以不同的方式组合成同一向量空间中的其他向量。向量和运算符构成了一个组合代数,因此复合向量可以分解回其组成向量。许多有用的算法都可以在高清计算中实现,如分类、空间导航、语言建模和逻辑。在这封信中,我们提出了傅立叶全息还原表示法(FHRR)的尖峰实施方案,这是最通用的 VSA 之一。FHRR 向量每个复数的相位被编码为一个周期内的尖峰时间。从这些尖峰相位衍生出来的神经元模型可以执行必要的向量运算,从而实现 FHRR。我们在多个基础问题领域展示了我们的尖峰网络的强大功能和多功能性,包括符号绑定和解绑、空间表示、函数表示、函数整合和记忆(即信号延迟)。
{"title":"Efficient Hyperdimensional Computing With Spiking Phasors","authors":"Jeff Orchard;P. Michael Furlong;Kathryn Simone","doi":"10.1162/neco_a_01693","DOIUrl":"10.1162/neco_a_01693","url":null,"abstract":"Hyperdimensional (HD) computing (also referred to as vector symbolic architectures, VSAs) offers a method for encoding symbols into vectors, allowing for those symbols to be combined in different ways to form other vectors in the same vector space. The vectors and operators form a compositional algebra, such that composite vectors can be decomposed back to their constituent vectors. Many useful algorithms have implementations in HD computing, such as classification, spatial navigation, language modeling, and logic. In this letter, we propose a spiking implementation of Fourier holographic reduced representation (FHRR), one of the most versatile VSAs. The phase of each complex number of an FHRR vector is encoded as a spike time within a cycle. Neuron models derived from these spiking phasors can perform the requisite vector operations to implement an FHRR. We demonstrate the power and versatility of our spiking networks in a number of foundational problem domains, including symbol binding and unbinding, spatial representation, function representation, function integration, and memory (i.e., signal delay).","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1886-1911"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martin Magris;Mostafa Shabani;Alexandros Iosifidis
We propose an optimization algorithm for variational inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for gaussian variational inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our manifold gaussian variational Bayes on the precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parameterization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five data sets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.
我们为复杂模型中的变分推理(VI)提出了一种优化算法。我们的方法依赖于自然梯度更新,其中变异空间是黎曼流形。我们为高斯变分推理开发了一种高效算法,其更新满足变分协方差矩阵的正定约束。我们的精确矩阵流形高斯变分贝叶斯(MGVBP)解决方案提供了简单的更新规则,易于实现,而且使用精确矩阵参数化具有显著的计算优势。由于其黑箱性质,MGVBP 是复杂模型 VI 的即用型解决方案。通过五个数据集,我们在不同的统计和计量经济学模型上验证了我们的可行方法,并讨论了它与基准方法相比的性能。
{"title":"Manifold Gaussian Variational Bayes on the Precision Matrix","authors":"Martin Magris;Mostafa Shabani;Alexandros Iosifidis","doi":"10.1162/neco_a_01686","DOIUrl":"10.1162/neco_a_01686","url":null,"abstract":"We propose an optimization algorithm for variational inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for gaussian variational inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our manifold gaussian variational Bayes on the precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parameterization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five data sets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1744-1798"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142009907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiming Jiang;Jinlan Liu;Dongpo Xu;Danilo P. Mandic
Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β1 increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.
{"title":"UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization","authors":"Yiming Jiang;Jinlan Liu;Dongpo Xu;Danilo P. Mandic","doi":"10.1162/neco_a_01692","DOIUrl":"10.1162/neco_a_01692","url":null,"abstract":"Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β1 increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1912-1938"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary loss activation function combinations that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent has better or similar performance as gradient descent.
{"title":"Hebbian Descent: A Unified View on Log-Likelihood Learning","authors":"Jan Melchior;Robin Schiewer;Laurenz Wiskott","doi":"10.1162/neco_a_01684","DOIUrl":"10.1162/neco_a_01684","url":null,"abstract":"This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary loss activation function combinations that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent has better or similar performance as gradient descent.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1669-1712"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142009906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well established in the literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially both employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests implementing the FEP that may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
{"title":"Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle","authors":"Theodore Jerome Tinker;Kenji Doya;Jun Tani","doi":"10.1162/neco_a_01690","DOIUrl":"10.1162/neco_a_01690","url":null,"abstract":"In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well established in the literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially both employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests implementing the FEP that may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1854-1885"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minkyu Choi;Yizhen Zhang;Kuan Han;Xiaokai Wang;Zhongming Liu
Humans actively observe the visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNN) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one determining the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention and that the model can enhance robustness against adversarial attacks due to its retinal sampling and recurrent processing. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.
{"title":"Human Eyes–Inspired Recurrent Neural Networks Are More Robust Against Adversarial Noises","authors":"Minkyu Choi;Yizhen Zhang;Kuan Han;Xiaokai Wang;Zhongming Liu","doi":"10.1162/neco_a_01688","DOIUrl":"10.1162/neco_a_01688","url":null,"abstract":"Humans actively observe the visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNN) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one determining the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention and that the model can enhance robustness against adversarial attacks due to its retinal sampling and recurrent processing. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 9","pages":"1713-1743"},"PeriodicalIF":2.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141898953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Della Daiyi Luo;Bapun Giri;Kamran Diba;Caleb Kemere
Dimension reduction on neural activity paves a way for unsupervised neural decoding by dissociating the measurement of internal neural pattern reactivation from the measurement of external variable tuning. With assumptions only on the smoothness of latent dynamics and of internal tuning curves, the Poisson gaussian-process latent variable model (P-GPLVM; Wu et al., 2017) is a powerful tool to discover the low-dimensional latent structure for high-dimensional spike trains. However, when given novel neural data, the original model lacks a method to infer their latent trajectories in the learned latent space, limiting its ability for estimating the neural reactivation. Here, we extend the P-GPLVM to enable the latent variable inference of new data constrained by previously learned smoothness and mapping information. We also describe a principled approach for the constrained latent variable inference for temporally compressed patterns of activity, such as those found in population burst events during hippocampal sharp-wave ripples, as well as metrics for assessing the validity of neural pattern reactivation and inferring the encoded experience. Applying these approaches to hippocampal ensemble recordings during active maze exploration, we replicate the result that P-GPLVM learns a latent space encoding the animal’s position. We further demonstrate that this latent space can differentiate one maze context from another. By inferring the latent variables of new neural data during running, certain neural patterns are observed to reactivate, in accordance with the similarity of experiences encoded by its nearby neural trajectories in the training data manifold. Finally, reactivation of neural patterns can be estimated for neural activity during population burst events as well, allowing the identification for replay events of versatile behaviors and more general experiences. Thus, our extension of the P-GPLVM framework for unsupervised analysis of neural activity can be used to answer critical questions related to scientific discovery.
{"title":"Extended Poisson Gaussian-Process Latent Variable Model for Unsupervised Neural Decoding","authors":"Della Daiyi Luo;Bapun Giri;Kamran Diba;Caleb Kemere","doi":"10.1162/neco_a_01685","DOIUrl":"10.1162/neco_a_01685","url":null,"abstract":"Dimension reduction on neural activity paves a way for unsupervised neural decoding by dissociating the measurement of internal neural pattern reactivation from the measurement of external variable tuning. With assumptions only on the smoothness of latent dynamics and of internal tuning curves, the Poisson gaussian-process latent variable model (P-GPLVM; Wu et al., 2017) is a powerful tool to discover the low-dimensional latent structure for high-dimensional spike trains. However, when given novel neural data, the original model lacks a method to infer their latent trajectories in the learned latent space, limiting its ability for estimating the neural reactivation. Here, we extend the P-GPLVM to enable the latent variable inference of new data constrained by previously learned smoothness and mapping information. We also describe a principled approach for the constrained latent variable inference for temporally compressed patterns of activity, such as those found in population burst events during hippocampal sharp-wave ripples, as well as metrics for assessing the validity of neural pattern reactivation and inferring the encoded experience. Applying these approaches to hippocampal ensemble recordings during active maze exploration, we replicate the result that P-GPLVM learns a latent space encoding the animal’s position. We further demonstrate that this latent space can differentiate one maze context from another. By inferring the latent variables of new neural data during running, certain neural patterns are observed to reactivate, in accordance with the similarity of experiences encoded by its nearby neural trajectories in the training data manifold. Finally, reactivation of neural patterns can be estimated for neural activity during population burst events as well, allowing the identification for replay events of versatile behaviors and more general experiences. Thus, our extension of the P-GPLVM framework for unsupervised analysis of neural activity can be used to answer critical questions related to scientific discovery.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1449-1475"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The energy efficiency of hardware implementations of convolutional neural networks (CNNs) is critical to their widespread deployment in low-power mobile devices. Recently, a number of methods have been proposed for providing energy-optimal mappings of CNNs onto diverse hardware accelerators. Their estimated energy consumption is related to specific implementation details and hardware parameters, which does not allow for machine-independent exploration of CNN energy measures. In this letter, we introduce a simplified theoretical energy complexity model for CNNs, based on only a two-level memory hierarchy that captures asymptotically all important sources of energy consumption for different CNN hardware implementations. In this model, we derive a simple energy lower bound and calculate the energy complexity of evaluating a CNN layer for two common data flows, providing corresponding upper bounds. According to statistical tests, the theoretical energy upper and lower bounds we present fit asymptotically very well with the real energy consumption of CNN implementations on the Simba and Eyeriss hardware platforms, estimated by the Timeloop/Accelergy program, which validates the proposed energy complexity model for CNNs.
{"title":"Energy Complexity of Convolutional Neural Networks","authors":"Jiří Šíma;Petra Vidnerová;Vojtěch Mrázek","doi":"10.1162/neco_a_01676","DOIUrl":"10.1162/neco_a_01676","url":null,"abstract":"The energy efficiency of hardware implementations of convolutional neural networks (CNNs) is critical to their widespread deployment in low-power mobile devices. Recently, a number of methods have been proposed for providing energy-optimal mappings of CNNs onto diverse hardware accelerators. Their estimated energy consumption is related to specific implementation details and hardware parameters, which does not allow for machine-independent exploration of CNN energy measures. In this letter, we introduce a simplified theoretical energy complexity model for CNNs, based on only a two-level memory hierarchy that captures asymptotically all important sources of energy consumption for different CNN hardware implementations. In this model, we derive a simple energy lower bound and calculate the energy complexity of evaluating a CNN layer for two common data flows, providing corresponding upper bounds. According to statistical tests, the theoretical energy upper and lower bounds we present fit asymptotically very well with the real energy consumption of CNN implementations on the Simba and Eyeriss hardware platforms, estimated by the Timeloop/Accelergy program, which validates the proposed energy complexity model for CNNs.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1601-1625"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141082351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an investigation on threshold circuits and other discretized neural networks in terms of the following four computational resources—size (the number of gates), depth (the number of layers), weight (weight resolution), and energy—where the energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all the input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, it holds that log( rk (f))≤ed(logs+logw+logn) regardless of the algorithm employed by C to compute f, where rk (f) is a parameter solely determined by a scale of f and defined as the maximum rank of a communication matrix with regard to f taken over all the possible partitions of the n input variables. For example, given a Boolean function CD n(ξ) = ⋁i=1n/2ξi∧ξn/2+i, we can prove that n/2≤ed( log s+logw+logn) holds for any circuit C computing CD n. While its left-hand side is linear in n, its right-hand side is bounded by the product of the logarithmic factors of s,w,n and the linear factors of d,e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 needs to be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.
我们根据以下四种计算资源--大小(门电路数量)、深度(层数)、权重(权重分辨率)和能量--对阈值电路和其他离散化神经网络进行了研究,其中能量是受稀疏编码启发的一种复杂性度量,定义为在所有输入分配中输出非零值的门电路的最大数量。作为我们的主要结果,我们证明,如果一个大小为 s、深度为 d、能量为 e、权重为 w 的阈值电路 C 计算一个包含 n 个变量的布尔函数 f(即分类任务),那么无论 C 采用何种算法计算 f,log( rk (f))≤ed(logs+logw+logn)都是成立的,其中 rk (f) 是一个完全由 f 的规模决定的参数,定义为在 n 个输入变量的所有可能分区中与 f 有关的通信矩阵的最大秩。例如,给定布尔函数 CD n(ξ) =⋁i=1n/2ξi∧ξn/2+i,我们可以证明,对于任何计算 CD n 的电路 C,n/2≤ed( log s+logw+logn) 都成立。如果我们认为对数项对边界的影响可以忽略不计,那么我们的结果就意味着深度和能量之间的权衡:n/2 必须小于 e 和 d 的乘积。对于其他神经网络模型,如离散化 ReLU 电路和离散化 sigmoid 电路,我们也证明了类似的权衡。因此,我们的研究结果表明,当神经元数量和权重分辨率受到硬件限制时,深度的增加会线性地增强神经网络获取稀疏表征的能力。
{"title":"Trade-Offs Between Energy and Depth of Neural Networks","authors":"Kei Uchizawa;Haruki Abe","doi":"10.1162/neco_a_01683","DOIUrl":"10.1162/neco_a_01683","url":null,"abstract":"We present an investigation on threshold circuits and other discretized neural networks in terms of the following four computational resources—size (the number of gates), depth (the number of layers), weight (weight resolution), and energy—where the energy is a complexity measure inspired by sparse coding and is defined as the maximum number of gates outputting nonzero values, taken over all the input assignments. As our main result, we prove that if a threshold circuit C of size s, depth d, energy e, and weight w computes a Boolean function f (i.e., a classification task) of n variables, it holds that log( rk (f))≤ed(logs+logw+logn) regardless of the algorithm employed by C to compute f, where rk (f) is a parameter solely determined by a scale of f and defined as the maximum rank of a communication matrix with regard to f taken over all the possible partitions of the n input variables. For example, given a Boolean function CD n(ξ) = ⋁i=1n/2ξi∧ξn/2+i, we can prove that n/2≤ed( log s+logw+logn) holds for any circuit C computing CD n. While its left-hand side is linear in n, its right-hand side is bounded by the product of the logarithmic factors of s,w,n and the linear factors of d,e. If we view the logarithmic terms as having a negligible impact on the bound, our result implies a trade-off between depth and energy: n/2 needs to be smaller than the product of e and d. For other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits, we also prove that a similar trade-off holds. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1541-1567"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they significantly differ from human visual perception, prioritizing pixel-level correlations and statistical patterns, often overlooking object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks—Fruits 360, CIFAR-10, and Fashion MNIST—each visual feature is individually input into a neural network. Results reveal data set–dependent variations in classification accuracy, highlighting that deep learning models tend to learn pixel-level correlations instead of fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input for a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images, achieving exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on CIFAR-10 data set and evaluated its performance on the “dog” class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception, highlighting the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.
{"title":"Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision Benchmark Data Sets","authors":"Maria Osório;Andreas Wichert","doi":"10.1162/neco_a_01677","DOIUrl":"10.1162/neco_a_01677","url":null,"abstract":"In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they significantly differ from human visual perception, prioritizing pixel-level correlations and statistical patterns, often overlooking object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks—Fruits 360, CIFAR-10, and Fashion MNIST—each visual feature is individually input into a neural network. Results reveal data set–dependent variations in classification accuracy, highlighting that deep learning models tend to learn pixel-level correlations instead of fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input for a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images, achieving exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on CIFAR-10 data set and evaluated its performance on the “dog” class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception, highlighting the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 8","pages":"1626-1642"},"PeriodicalIF":2.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141082367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}