Neural Computation: Latest Articles, Page 5

Efficient Hyperdimensional Computing With Spiking Phasors

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-08-19 DOI: 10.1162/neco_a_01693

Jeff Orchard;P. Michael Furlong;Kathryn Simone

Hyperdimensional (HD) computing (also referred to as vector symbolic architectures, VSAs) offers a method for encoding symbols into vectors, allowing for those symbols to be combined in different ways to form other vectors in the same vector space. The vectors and operators form a compositional algebra, such that composite vectors can be decomposed back to their constituent vectors. Many useful algorithms have implementations in HD computing, such as classification, spatial navigation, language modeling, and logic. In this letter, we propose a spiking implementation of Fourier holographic reduced representation (FHRR), one of the most versatile VSAs. The phase of each complex number of an FHRR vector is encoded as a spike time within a cycle. Neuron models derived from these spiking phasors can perform the requisite vector operations to implement an FHRR. We demonstrate the power and versatility of our spiking networks in a number of foundational problem domains, including symbol binding and unbinding, spatial representation, function representation, function integration, and memory (i.e., signal delay).
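
As a minimal sketch of the FHRR algebra the abstract builds on (not the authors' spiking network), each vector can be held as a list of unit phasors e^{i*phi}: binding multiplies phasors elementwise (phases add), unbinding multiplies by the complex conjugate (phases subtract), and similarity is the mean real part of the elementwise product with the conjugate. The function `phase_to_spike_time` and its `period` parameter are hypothetical, illustrating one way a phase could be read as a spike time within a cycle.

```python
import cmath
import math
import random

def random_fhrr(dim, rng):
    """Random FHRR vector: unit-magnitude phasors with phases drawn uniformly from [-pi, pi]."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(dim)]

def bind(a, b):
    """Binding: elementwise complex product, so the phases add."""
    return [x * y for x, y in zip(a, b)]

def unbind(c, b):
    """Unbinding: elementwise product with the conjugate of b, so b's phases are subtracted."""
    return [x * y.conjugate() for x, y in zip(c, b)]

def similarity(a, b):
    """Mean of Re(a_k * conj(b_k)): 1.0 for identical vectors, near 0 for unrelated random ones."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)

def phase_to_spike_time(z, period=1.0):
    """Hypothetical encoding: map the phase of z to a spike time in [0, period)."""
    phi = cmath.phase(z) % (2 * math.pi)
    t = phi / (2 * math.pi) * period
    return t if t < period else 0.0  # fold the rounding edge case (phase of 2*pi) back to 0

rng = random.Random(0)
role, filler = random_fhrr(1024, rng), random_fhrr(1024, rng)
pair = bind(role, filler)
recovered = unbind(pair, role)
print(round(similarity(recovered, filler), 6))  # 1.0
print(similarity(pair, filler))  # close to 0: the bound pair is dissimilar to its parts
spike_times = [phase_to_spike_time(z) for z in pair]
```

Because every phasor has magnitude 1, unbinding with the same role recovers the filler exactly (up to floating-point rounding), which is the decomposability property the abstract mentions.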

Citations: 0

Manifold Gaussian Variational Bayes on the Precision Matrix

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-08-19 DOI: 10.1162/neco_a_01686

Martin Magris;Mostafa Shabani;Alexandros Iosifidis

We propose an optimization algorithm for variational inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for gaussian variational inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our manifold gaussian variational Bayes on the precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parameterization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five data sets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.
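
The abstract does not spell out MGVBP's update rules, so the following is only a generic illustration of why a precision parameterization can respect the positive-definite constraint cheaply. It is a one-dimensional natural-gradient Gaussian VI step in precision form (the generic Bayesian-learning-rule style update, not MGVBP itself). The target is assumed Gaussian so that the expectations are exact, and the step size `beta` is an assumed value.

```python
def natural_gradient_vi(m, s2, mu0=0.0, p0=1.0, beta=0.1, steps=500):
    """Fit q(theta) = N(mu, 1/p) to a Gaussian target log p(theta) = -(theta - m)^2 / (2 * s2) + const.

    For this target, E_q[d log p / d theta] = -(mu - m) / s2 and
    E_q[-d^2 log p / d theta^2] = 1 / s2, so no Monte Carlo sampling is needed here.
    """
    mu, p = mu0, p0
    for _ in range(steps):
        grad = -(mu - m) / s2                 # expected gradient of the log target
        neg_hess = 1.0 / s2                   # expected negative Hessian (positive)
        p = (1 - beta) * p + beta * neg_hess  # convex combination of positives: p stays > 0
        mu = mu + beta * grad / p             # mean step preconditioned by the new precision
    return mu, p

mu, p = natural_gradient_vi(m=3.0, s2=4.0)
print(mu, 1.0 / p)  # approaches the target mean 3.0 and variance 4.0
```

In the multivariate case the same precision update is a convex combination of positive-definite matrices whenever the expected negative Hessian is positive definite, which is one simple way such updates can keep the covariance valid without explicit projection.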

Citations: 0

UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-08-19 DOI: 10.1162/neco_a_01692

Yiming Jiang;Jinlan Liu;Dongpo Xu;Danilo P. Mandic

Adam-type algorithms have become a preferred choice for optimization in the deep learning setting; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, termed UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its existing and future variants as special cases, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan. The approach is supported by a rigorous convergence analysis of UAdam in the general nonconvex stochastic setting, showing that UAdam converges to the neighborhood of stationary points with a rate of O(1/T). Furthermore, the size of the neighborhood decreases as the parameter β1 increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. Theoretical results also reveal the convergence conditions of vanilla Adam, together with the selection of appropriate hyperparameters. This provides a theoretical guarantee for the analysis, applications, and further developments of the whole general class of Adam-type algorithms. Finally, several numerical experiments are provided to support our theoretical findings.
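
A minimal sketch of the idea that Adam-type methods can differ mainly in how the second-order moment is formed: the scalar optimizer loop below takes the second-moment rule as a pluggable function. Only Adam's exponential moving average (shown without bias correction) and AMSGrad's running maximum are included. This illustrates the "general second-order moment" idea and is not the UAdam formulation or its convergence conditions. `beta1` plays the role of the first-order momentum factor β1.

```python
import math

def ema_second_moment(beta2=0.999):
    """Adam-style rule: v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2."""
    v = 0.0
    def update(g):
        nonlocal v
        v = beta2 * v + (1 - beta2) * g * g
        return v
    return update

def amsgrad_second_moment(beta2=0.999):
    """AMSGrad-style rule: running maximum of the Adam-style moving average."""
    v, v_max = 0.0, 0.0
    def update(g):
        nonlocal v, v_max
        v = beta2 * v + (1 - beta2) * g * g
        v_max = max(v_max, v)
        return v_max
    return update

def adam_type_minimize(grad, x0, second_moment, lr=0.01, beta1=0.9, eps=1e-8, steps=3000):
    """Scalar Adam-type loop; only the second-moment rule changes between variants."""
    x, m = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g  # first-order momentum with factor beta1
        v = second_moment(g)             # pluggable second-order moment
        x -= lr * m / (math.sqrt(v) + eps)
    return x

def grad(x):
    return 2.0 * x  # gradient of f(x) = x^2, minimized at x = 0

print(adam_type_minimize(grad, 1.0, ema_second_moment()))
print(adam_type_minimize(grad, 1.0, amsgrad_second_moment()))
```

Each call builds a fresh second-moment closure, so optimizer state is never shared between runs.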

Vol. 36, No. 9, pp. 1912-1938

Citations: 0

Hebbian Descent: A Unified View on Log-Likelihood Learning

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-08-19 DOI: 10.1162/neco_a_01684

Jan Melchior;Robin Schiewer;Laurenz Wiskott

This study discusses the negative impact of the derivative of the activation functions in the output layer of artificial neural networks, in particular in continual learning. We propose Hebbian descent as a theoretical framework to overcome this limitation, which is implemented through an alternative loss function for gradient descent we refer to as Hebbian descent loss. This loss is effectively the generalized log-likelihood loss and corresponds to an alternative weight update rule for the output layer wherein the derivative of the activation function is disregarded. We show how this update avoids vanishing error signals during backpropagation in saturated regions of the activation functions, which is particularly helpful in training shallow neural networks and deep neural networks where saturating activation functions are only used in the output layer. In combination with centering, Hebbian descent leads to better continual learning capabilities. It provides a unifying perspective on Hebbian learning, gradient descent, and generalized linear models, for all of which we discuss the advantages and disadvantages. Given activation functions with strictly positive derivative (as is often the case in practice), Hebbian descent inherits the convergence properties of regular gradient descent. While established pairings of loss and output layer activation function (e.g., mean squared error with linear or cross-entropy with sigmoid/softmax) are subsumed by Hebbian descent, we provide general insights for designing arbitrary combinations of loss and activation function that benefit from Hebbian descent. For shallow networks, we show that Hebbian descent outperforms Hebbian learning, has a performance similar to regular gradient descent, and has a much better performance than all other tested update rules in continual learning. In combination with centering, Hebbian descent implements a forgetting mechanism that prevents catastrophic interference notably better than the other tested update rules. When training deep neural networks, our experimental results suggest that Hebbian descent performs better than or similarly to gradient descent.
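
A minimal sketch of the contrast the abstract draws, for a single sigmoid output unit. The gradient-descent update for squared error carries the activation derivative y(1 - y), which vanishes in saturation. A Hebbian-descent-style update drops that factor and uses only the error times the input. For a sigmoid output this coincides with the gradient of the cross-entropy loss, one of the pairings the abstract says Hebbian descent subsumes. Function names are illustrative, and centering is omitted.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def pre_activation(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def mse_gd_update(w, x, t, lr):
    """Gradient descent on 0.5 * (y - t)^2: includes the sigmoid derivative y * (1 - y)."""
    y = sigmoid(pre_activation(w, x))
    return [-lr * (y - t) * y * (1.0 - y) * xi for xi in x]

def hebbian_descent_update(w, x, t, lr):
    """Same error signal with the activation derivative disregarded: -lr * (y - t) * x."""
    y = sigmoid(pre_activation(w, x))
    return [-lr * (y - t) * xi for xi in x]

# A saturated unit that is confidently wrong: target 0, output very close to 1.
w, x, t, lr = [10.0, 10.0], [1.0, 1.0], 0.0, 0.1
print(mse_gd_update(w, x, t, lr))           # tiny: the error signal has all but vanished
print(hebbian_descent_update(w, x, t, lr))  # about -0.1 per weight
```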

Vol. 36, No. 9, pp. 1669-1712

Citations: 0

Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-08-19 DOI: 10.1162/neco_a_01690

Theodore Jerome Tinker;Kenji Doya;Jun Tani

In reinforcement learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks. Exploration is essential in RL because agents must discover information before exploiting it. Two rewards encouraging efficient exploration are the entropy of action policy and curiosity for information gain. Entropy is well established in the literature, promoting randomized action selection. Curiosity is defined in a broad variety of ways in literature, promoting discovery of novel experiences. One example, prediction error curiosity, rewards agents for discovering observations they cannot accurately predict. However, such agents may be distracted by unpredictable observational noises known as curiosity traps. Based on the free energy principle (FEP), this letter proposes hidden state curiosity, which rewards agents by the KL divergence between the predictive prior and posterior probabilities of latent variables. We trained six types of agents to navigate mazes: baseline agents without rewards for entropy or curiosity and agents rewarded for entropy and/or either prediction error curiosity or hidden state curiosity. We find that entropy and curiosity result in efficient exploration, especially when both are employed together. Notably, agents with hidden state curiosity demonstrate resilience against curiosity traps, which hinder agents with prediction error curiosity. This suggests that implementing the FEP may enhance the robustness and generalization of RL models, potentially aligning the learning processes of artificial and biological agents.
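
A minimal sketch of a hidden-state-curiosity reward, assuming diagonal Gaussian prior and posterior over the latent variables and the direction KL(posterior || prior). The abstract says only "between the predictive prior and posterior", so the direction, the scale `eta`, and the additive combination with the extrinsic reward are assumptions. The helper names are hypothetical.

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)) for a single latent dimension."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

def hidden_state_curiosity(prior, posterior, eta=1.0):
    """Hypothetical intrinsic reward: eta * sum over latent dims of KL(posterior || prior).

    `prior` and `posterior` are lists of (mean, std) pairs, one per latent dimension.
    """
    return eta * sum(gaussian_kl(mq, sq, mp, sp)
                     for (mq, sq), (mp, sp) in zip(posterior, prior))

def total_reward(extrinsic, prior, posterior, eta=1.0):
    """Extrinsic reward plus the (assumed additive) curiosity bonus."""
    return extrinsic + hidden_state_curiosity(prior, posterior, eta)

prior = [(0.0, 1.0), (0.0, 1.0)]
posterior = [(1.0, 1.0), (0.0, 1.0)]  # the observation shifted belief in the first latent only
print(hidden_state_curiosity(prior, posterior))  # 0.5
```

The reward depends only on how much an observation changes beliefs about latent state, not on how badly the raw observation was predicted. One hedged reading of the abstract's result is that pure observational noise, once the model has learned it carries no information about hidden state, leaves the posterior near the prior and so earns little reward.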

Vol. 36, No. 9, pp. 1854-1885

Citations: 0

Human Eyes–Inspired Recurrent Neural Networks Are More Robust Against Adversarial Noises

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-08-19 DOI: 10.1162/neco_a_01688

Minkyu Choi;Yizhen Zhang;Kuan Han;Xiaokai Wang;Zhongming Liu

Humans actively observe the visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNN) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one determining the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention and that the model can enhance robustness against adversarial attacks due to its retinal sampling and recurrent processing. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.
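
The abstract does not detail its retina-like input layer. One common way to approximate retinal sampling, assumed here purely for illustration, is a log-polar grid centered on the current fixation: sampling is densest at the fixation and thins out with eccentricity. The function and its parameters are hypothetical. A glimpse-based model would read the image at these points, then move the grid to the next fixation.

```python
import math

def log_polar_samples(fx, fy, n_rings=5, n_angles=8, r0=1.0, growth=2.0):
    """Hypothetical retina-like sampling grid around the fixation point (fx, fy).

    Ring radii grow geometrically (r0, r0 * growth, ...), so samples are dense
    near the fixation and sparse in the periphery. Returns (x, y) coordinates.
    """
    points = [(fx, fy)]  # one sample exactly at the fixation (the "fovea")
    for k in range(n_rings):
        r = r0 * growth ** k
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            points.append((fx + r * math.cos(theta), fy + r * math.sin(theta)))
    return points

# Two successive "glimpses" sample different parts of the same image.
glimpse_a = log_polar_samples(10.0, 20.0)
glimpse_b = log_polar_samples(40.0, 25.0)
print(len(glimpse_a))  # 41
```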

Vol. 36, No. 9, pp. 1713-1743

Citations: 0

Extended Poisson Gaussian-Process Latent Variable Model for Unsupervised Neural Decoding

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-07-19 DOI: 10.1162/neco_a_01685

Della Daiyi Luo;Bapun Giri;Kamran Diba;Caleb Kemere

Dimension reduction on neural activity paves a way for unsupervised neural decoding by dissociating the measurement of internal neural pattern reactivation from the measurement of external variable tuning. With assumptions only on the smoothness of latent dynamics and of internal tuning curves, the Poisson gaussian-process latent variable model (P-GPLVM; Wu et al., 2017) is a powerful tool to discover the low-dimensional latent structure for high-dimensional spike trains. However, when given novel neural data, the original model lacks a method to infer their latent trajectories in the learned latent space, limiting its ability for estimating the neural reactivation. Here, we extend the P-GPLVM to enable the latent variable inference of new data constrained by previously learned smoothness and mapping information. We also describe a principled approach for the constrained latent variable inference for temporally compressed patterns of activity, such as those found in population burst events during hippocampal sharp-wave ripples, as well as metrics for assessing the validity of neural pattern reactivation and inferring the encoded experience. Applying these approaches to hippocampal ensemble recordings during active maze exploration, we replicate the result that P-GPLVM learns a latent space encoding the animal's position. We further demonstrate that this latent space can differentiate one maze context from another. By inferring the latent variables of new neural data during running, certain neural patterns are observed to reactivate, in accordance with the similarity of experiences encoded by nearby neural trajectories in the training data manifold. Finally, reactivation of neural patterns can be estimated for neural activity during population burst events as well, allowing the identification of replay events of versatile behaviors and more general experiences. Thus, our extension of the P-GPLVM framework for unsupervised analysis of neural activity can be used to answer critical questions related to scientific discovery.
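
The extension described above infers latent values for new data under a previously learned mapping and GP smoothness. The sketch below keeps only the simplest piece of that idea. Given fixed, assumed-already-learned tuning curves lambda_i(x) = exp(g_i(x)) over a one-dimensional latent, it decodes a single new time bin by maximizing the Poisson log-likelihood over a grid. It drops the GP smoothness prior entirely, and the Gaussian-bump tuning curves and all names are hypothetical.

```python
import math

def tuning_curve(center, peak_rate=10.0, width=0.15):
    """Hypothetical learned tuning curve: lambda(x) = exp(g(x)) with g quadratic (a Gaussian bump)."""
    def rate(x):
        return math.exp(math.log(peak_rate) - (x - center) ** 2 / (2 * width ** 2))
    return rate

def poisson_log_likelihood(counts, rates):
    """sum_i k_i * log(lambda_i) - lambda_i; the log(k_i!) term does not depend on x and is dropped."""
    return sum(k * math.log(lam) - lam for k, lam in zip(counts, rates))

def decode_bin(counts, curves, grid):
    """Maximum-likelihood latent value for one time bin, searched over a grid."""
    return max(grid, key=lambda x: poisson_log_likelihood(counts, [f(x) for f in curves]))

curves = [tuning_curve(c) for c in (0.0, 0.25, 0.5, 0.75, 1.0)]
grid = [i / 100 for i in range(101)]
counts = [f(0.5) for f in curves]  # noise-free expected counts at x = 0.5, for illustration only
print(decode_bin(counts, curves, grid))  # 0.5
```

Real spike counts are noisy integers, so per-bin estimates would jitter. A smoothness prior over time, which P-GPLVM provides through its GP, is what ties neighboring bins together.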

Vol. 36, No. 8, pp. 1449-1475

Citations: 0

Energy Complexity of Convolutional Neural Networks

IF 2.7; Zone 4, Computer Science; Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-07-19 DOI: 10.1162/neco_a_01676

Jiří Šíma;Petra Vidnerová;Vojtěch Mrázek

The energy efficiency of hardware implementations of convolutional neural networks (CNNs) is critical to their widespread deployment in low-power mobile devices. Recently, a number of methods have been proposed for providing energy-optimal mappings of CNNs onto diverse hardware accelerators. Their estimated energy consumption is related to specific implementation details and hardware parameters, which does not allow for machine-independent exploration of CNN energy measures. In this letter, we introduce a simplified theoretical energy complexity model for CNNs, based on only a two-level memory hierarchy that captures asymptotically all important sources of energy consumption for different CNN hardware implementations. In this model, we derive a simple energy lower bound and calculate the energy complexity of evaluating a CNN layer for two common data flows, providing corresponding upper bounds. According to statistical tests, the theoretical energy upper and lower bounds we present fit asymptotically very well with the real energy consumption of CNN implementations on the Simba and Eyeriss hardware platforms, estimated by the Timeloop/Accelergy program, which validates the proposed energy complexity model for CNNs.
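
To make the idea of a machine-independent energy measure concrete, here is a deliberately crude two-level model. It is an assumption for illustration, not the letter's model or its bounds. Every input activation, weight, and output of a convolutional layer is assumed to cross the main-memory boundary at least once, and every multiply-accumulate (MAC) costs a fixed amount, which yields a simple lower bound. The per-access costs `e_mem` and `e_mac` are hypothetical placeholders, not Simba or Eyeriss figures, and stride 1 with no padding is assumed.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    height: int    # input feature-map height
    width: int     # input feature-map width
    channels: int  # input channels C
    kernel: int    # square kernel size K
    filters: int   # output channels M (stride 1, no padding assumed)

    def counts(self):
        """Return (#inputs, #weights, #outputs, #MACs) for the layer."""
        oh = self.height - self.kernel + 1
        ow = self.width - self.kernel + 1
        inputs = self.height * self.width * self.channels
        weights = self.kernel * self.kernel * self.channels * self.filters
        outputs = oh * ow * self.filters
        macs = outputs * self.kernel * self.kernel * self.channels
        return inputs, weights, outputs, macs

def energy_lower_bound(layer, e_mem=100.0, e_mac=1.0):
    """Each input, weight, and output crosses main memory at least once; each MAC costs e_mac."""
    inputs, weights, outputs, macs = layer.counts()
    return e_mem * (inputs + weights + outputs) + e_mac * macs

layer = ConvLayer(height=8, width=8, channels=3, kernel=3, filters=4)
print(layer.counts())             # (192, 108, 144, 3888)
print(energy_lower_bound(layer))  # 48288.0
```

Any dataflow that reuses data poorly only adds main-memory traffic above this floor. Upper bounds depend on the chosen dataflow and buffer size, which this sketch does not model.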

Vol. 36, No. 8, pp. 1601-1625

Citations: 0

Trade-Offs Between Energy and Depth of Neural Networks

IF 2.7 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-07-19 DOI: 10.1162/neco_a_01683

Kei Uchizawa;Haruki Abe

We investigate threshold circuits and other discretized neural networks in terms of four computational resources: size (the number of gates), depth (the number of layers), weight (weight resolution), and energy. Energy is a complexity measure inspired by sparse coding, defined as the maximum number of gates outputting nonzero values, taken over all input assignments. As our main result, we prove the following. If a threshold circuit $C$ of size $s$, depth $d$, energy $e$, and weight $w$ computes a Boolean function $f$ (i.e., a classification task) of $n$ variables, then $\log(\mathrm{rk}(f)) \le ed(\log s + \log w + \log n)$, regardless of the algorithm $C$ employs to compute $f$. Here $\mathrm{rk}(f)$ is a parameter determined solely by a scale of $f$: the maximum rank of a communication matrix of $f$, taken over all possible partitions of the $n$ input variables. For example, consider the Boolean function $\mathrm{CD}_n(\xi) = \bigvee_{i=1}^{n/2} (\xi_i \wedge \xi_{n/2+i})$. We can prove that $n/2 \le ed(\log s + \log w + \log n)$ holds for any circuit $C$ computing $\mathrm{CD}_n$. The left-hand side is linear in $n$. The right-hand side is the product of the linear factors $d$ and $e$ and a sum of logarithmic terms in $s$, $w$, and $n$. If the logarithmic terms are viewed as having a negligible impact on the bound, our result implies a trade-off between depth and energy: up to logarithmic factors, $n/2$ cannot exceed the product of $e$ and $d$. We also prove that a similar trade-off holds for other neural network models, such as discretized ReLU circuits and discretized sigmoid circuits. Thus, our results indicate that increasing depth linearly enhances the capability of neural networks to acquire sparse representations when there are hardware constraints on the number of neurons and weight resolution.
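
To make the $\mathrm{CD}_n$ example concrete, here is a minimal Python sketch that is not taken from the paper. It builds the communication matrix of $\mathrm{CD}_n$ for one particular partition (the first $n/2$ variables versus the last $n/2$) and computes its rank exactly.

Because $\mathrm{rk}(f)$ is a maximum over all partitions, this single partition only gives a lower bound. For this partition the rank is $2^{n/2} - 1$, one less than full rank, since the row for the all-zero half is identically zero. So $\log \mathrm{rk}(\mathrm{CD}_n)$ is roughly $n/2$, which is why the left-hand side of the bound grows linearly in $n$.

The helper `min_ed_product` simply rearranges the inequality as stated. Base-2 logarithms and the example values of $s$ and $w$ are assumptions for illustration, and all function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def cd(x, y):
    """CD_n on the split input x = (xi_1..xi_{n/2}), y = (xi_{n/2+1}..xi_n):
    returns 1 iff x_i = y_i = 1 for some i."""
    return int(any(a and b for a, b in zip(x, y)))


def comm_matrix(k):
    """Communication matrix of CD_n (n = 2k) for the first-half / second-half partition."""
    halves = list(product((0, 1), repeat=k))  # all assignments to one half
    return [[cd(x, y) for y in halves] for x in halves]


def rank(matrix):
    """Exact rank over the rationals by Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def min_ed_product(n, s, w):
    """Smallest e*d permitted by n/2 <= e*d*(log s + log w + log n), assuming base-2 logs."""
    return (n / 2) / (math.log2(s) + math.log2(w) + math.log2(n))


if __name__ == "__main__":
    for k in range(1, 5):
        rk = rank(comm_matrix(k))
        print(f"n={2 * k}: rank={rk}, log2(rank)={math.log2(rk):.2f}")
    print(min_ed_product(1024, 2**20, 2**10))
```

For $n = 8$ the rank is 15, so $\log_2 \mathrm{rk} \ge \log_2 15 \approx 3.9$, close to $n/2 = 4$. With the illustrative values $n = 1024$, $s = 2^{20}$, and $w = 2^{10}$, the inequality requires $e \cdot d \ge 512/40 = 12.8$. A depth-2 circuit would therefore need $e \ge 6.4$, that is, at least 7 gates outputting nonzero values on some input.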

Volume 36, Issue 8, pp. 1541-1567

Citations: 0

Promoting the Shift From Pixel-Level Correlations to Object Semantics Learning by Rethinking Computer Vision Benchmark Data Sets

IF 2.7 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-07-19 DOI: 10.1162/neco_a_01677

Maria Osório;Andreas Wichert

In computer vision research, convolutional neural networks (CNNs) have demonstrated remarkable capabilities at extracting patterns from raw pixel data, achieving state-of-the-art recognition accuracy. However, they differ significantly from human visual perception: they prioritize pixel-level correlations and statistical patterns and often overlook object semantics. To explore this difference, we propose an approach that isolates core visual features crucial for human perception and object recognition: color, texture, and shape. In experiments on three benchmarks (Fruits 360, CIFAR-10, and Fashion MNIST), each visual feature is fed individually into a neural network. The results reveal data set–dependent variations in classification accuracy, suggesting that deep learning models tend to learn pixel-level correlations rather than fundamental visual features. To validate this observation, we used various combinations of concatenated visual features as input to a neural network on the CIFAR-10 data set. CNNs excel at learning statistical patterns in images and achieve exceptional performance when training and test data share similar distributions. To substantiate this point, we trained a CNN on the CIFAR-10 data set. We then evaluated it on the “dog” class from CIFAR-10 and on an equivalent number of examples from the Stanford Dogs data set. The CNN's poor performance on Stanford Dogs images underlines the disparity between deep learning and human visual perception and highlights the need for models that learn object semantics. Specialized benchmark data sets with controlled variations hold promise for aligning learned representations with human cognition in computer vision research.
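
The abstract does not say how color, texture, and shape were isolated. Purely as an illustration, and not the authors' pipeline, the minimal Python sketch below shows one conventional way to build each single-feature input from an RGB image given as nested lists:

- Color: a normalized per-channel histogram, which discards spatial layout.
- Shape (stand-in): a binary edge map from grayscale gradients, which discards color.
- Texture (stand-in): a histogram of 8-neighbour local binary pattern (LBP) codes.

The function names, bin count, and edge threshold are hypothetical choices.

```python
import math


def color_histogram(img, bins=4):
    """Color-only feature: normalized per-channel histogram; pixel positions are discarded."""
    hist = [0] * (3 * bins)
    count = 0
    for row in img:
        for pixel in row:
            for ch, value in enumerate(pixel):  # value in 0..255
                hist[ch * bins + value * bins // 256] += 1
            count += 1
    return [h / count for h in hist]


def grayscale(img):
    """Luma from RGB using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def edge_map(img, threshold=50.0):
    """Shape-only stand-in: binary edge map from forward differences of the grayscale image."""
    gray = grayscale(img)
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for i in range(h - 1):
        for j in range(w - 1):
            gx = gray[i][j + 1] - gray[i][j]
            gy = gray[i + 1][j] - gray[i][j]
            edges[i][j] = int(math.hypot(gx, gy) > threshold)
    return edges


# Clockwise 8-neighbourhood offsets (row, col), starting at the top-left pixel.
LBP_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img):
    """Texture-only stand-in: normalized histogram of 8-neighbour LBP codes (interior pixels)."""
    gray = grayscale(img)
    hist = [0] * 256
    count = 0
    for i in range(1, len(gray) - 1):
        for j in range(1, len(gray[0]) - 1):
            code = 0
            for bit, (di, dj) in enumerate(LBP_OFFSETS):
                if gray[i + di][j + dj] >= gray[i][j]:
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [v / count for v in hist] if count else hist


if __name__ == "__main__":
    # Toy 4x4 image: left half black, right half white.
    img = [[(0, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]
    print(color_histogram(img))
    print(edge_map(img))
    print(max(range(256), key=lbp_histogram(img).__getitem__))
```

Each function returns a vector or 2-D map that could be flattened and fed to a classifier on its own, or concatenated with the others. This loosely mirrors the single-feature and combined-feature experiments described in the abstract.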

Volume 36, Issue 8, pp. 1626-1642

Citations: 0