Neural Computation: Latest Articles

Bounded Rational Decision Networks With Belief Propagation

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-12-12 DOI: 10.1162/neco_a_01719

Gerrit Schmid;Sebastian Gottwald;Daniel A. Braun

Complex information processing systems that are capable of a wide variety of tasks, such as the human brain, are composed of specialized units that collaborate and communicate with each other. An important property of such information processing networks is locality: there is no single global unit controlling the modules, but information is exchanged locally. Here, we consider a decision-theoretic approach to study networks of bounded rational decision makers that are allowed to specialize and communicate with each other. In contrast to previous work that has focused on feedforward communication between decision-making agents, we consider cyclical information processing paths allowing for back-and-forth communication. We adapt message-passing algorithms to suit this purpose, essentially allowing for local information flow between units and thus enabling circular dependency structures. We provide examples that show how repeated communication can increase performance given that each unit’s information processing capability is limited and that decision-making systems with too few or too many connections and feedback loops achieve suboptimal utility.

Neural Computation, volume 37, issue 1, pages 76-127

Citations: 0

Computation With Sequences of Assemblies in a Model of the Brain

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-12-12 DOI: 10.1162/neco_a_01720

Max Dabagia;Christos H. Papadimitriou;Santosh S. Vempala

Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain’s learning capabilities remain unmatched. How cognition arises from neural activity is the central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou et al. (2020) and has been subsequently shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, navigation, to list a few). Here we show that in the same model, sequential precedence can be captured naturally through synaptic weights and plasticity, and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. If the stimulus sequence is presented to two brain areas simultaneously, a scaffolded representation is created, resulting in more efficient memorization and recall, in agreement with cognitive experiments. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences. Through an extension of this mechanism, the model can be shown to be capable of universal computation. Taken together, these results provide a concrete hypothesis for the basis of the brain’s remarkable abilities to compute and learn, with sequences playing a vital role.

Neural Computation, volume 37, issue 1, pages 193-233

Citations: 0

Computing With Residue Numbers in High-Dimensional Representation

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-12-12 DOI: 10.1162/neco_a_01723

Christopher J. Kymn;Denis Kleyko;E. Paxon Frady;Connor Bybee;Pentti Kanerva;Friedrich T. Sommer;Bruno A. Olshausen

We introduce residue hyperdimensional computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using resources that scale only logarithmically with the range, a vast improvement over previous methods. It also exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.
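
The abstract does not spell out the encoding, so the following is only a minimal sketch under stated assumptions: each residue modulo m is encoded as a vector of random m-th roots of unity raised to that residue, addition is carried out by component-wise multiplication, and decoding compares the result against every candidate residue. The names `MODULI`, `DIM`, `make_bases`, `encode`, `add`, and `decode` are hypothetical, and keeping one separate vector per modulus is a simplification, not necessarily the authors' construction.

```python
import cmath
import math
import random

MODULI = (3, 5, 7)  # assumed pairwise-coprime moduli; values 0..104 are representable
DIM = 256           # vector dimension (illustrative choice)


def make_bases(rng):
    """One random base vector per modulus m, entries drawn from the m-th roots of unity."""
    return [[cmath.exp(2j * math.pi * rng.randrange(m) / m) for _ in range(DIM)]
            for m in MODULI]


def encode(x, bases):
    """Encode integer x as one vector per modulus: component-wise power base ** (x mod m)."""
    return [[z ** (x % m) for z in base] for m, base in zip(MODULI, bases)]


def add(u, v):
    """Addition of the encoded numbers is component-wise multiplication of the vectors."""
    return [[a * b for a, b in zip(uc, vc)] for uc, vc in zip(u, v)]


def decode(vecs, bases):
    """Pick, per modulus, the residue whose codeword is most similar; then combine residues."""
    residues = []
    for m, base, vec in zip(MODULI, bases, vecs):
        def score(r):
            # Real part of the inner product between vec and the codeword for residue r.
            return sum((a * (z ** r).conjugate()).real for a, z in zip(vec, base))
        residues.append(max(range(m), key=score))
    total = math.prod(MODULI)
    # Brute-force Chinese-remainder reconstruction; fine for a tiny range like this one.
    return next(x for x in range(total)
                if all(x % m == r for m, r in zip(MODULI, residues)))


BASES = make_bases(random.Random(0))
total = decode(add(encode(58, BASES), encode(64, BASES)), BASES)
print(total)  # the sum 58 + 64 reduced modulo 3 * 5 * 7
```

Three short moduli cover a range of 105 values here, which hints at why the resources needed can grow much more slowly than the range itself.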

Citations: 0

Selective Inference for Change Point Detection by Recurrent Neural Network

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-12-12 DOI: 10.1162/neco_a_01724

Tomohiro Shiraishi;Daiki Miwa;Vo Nguyen Le Duy;Ichiro Takeuchi

In this study, we investigate the quantification of the statistical reliability of detected change points (CPs) in time series using a recurrent neural network (RNN). Thanks to its flexibility, RNN holds the potential to effectively identify CPs in time series characterized by complex dynamics. However, there is an increased risk of erroneously detecting random noise fluctuations as CPs. The primary goal of this study is to rigorously control the risk of false detections by providing theoretically valid p-values to the CPs detected by RNN. To achieve this, we introduce a novel method based on the framework of selective inference (SI). SI enables valid inferences by conditioning on the event of hypothesis selection, thus mitigating bias from generating and testing hypotheses on the same data. In this study, we apply an SI framework to RNN-based CP detection, where characterizing the complex process of RNN selecting CPs is our main technical challenge. We demonstrate the validity and effectiveness of the proposed method through artificial and real data experiments.

Citations: 0

Relating Human Error–Based Learning to Modern Deep RL Algorithms

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-12-12 DOI: 10.1162/neco_a_01721

Michele Garibbo;Casimir J. H. Ludwig;Nathan F. Lepora;Laurence Aitchison

In human error–based learning, the size and direction of a scalar error (i.e., the “directed error”) are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL.

Citations: 0

Realizing Synthetic Active Inference Agents, Part II: Variational Message Updates

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-12-12 DOI: 10.1162/neco_a_01713

Thijs van de Laar;Magnus Koudahl;Bert de Vries

The free energy principle (FEP) describes (biological) agents as minimizing a variational free energy (FE) with respect to a generative model of their environment. Active inference (AIF) is a corollary of the FEP that describes how agents explore and exploit their environment by minimizing an expected FE objective. In two related papers, we describe a scalable, epistemic approach to synthetic AIF by message passing on free-form Forney-style factor graphs (FFGs). A companion paper (part I of this article; Koudahl et al., 2023) introduces a constrained FFG (CFFG) notation that visually represents (generalized) FE objectives for AIF. This article (part II) derives message-passing algorithms that minimize (generalized) FE objectives on a CFFG by variational calculus. A comparison between simulated Bethe and generalized FE agents illustrates how the message-passing approach to synthetic AIF induces epistemic behavior on a T-maze navigation task. Extension of the T-maze simulation to learning goal statistics and a multiagent bargaining setting illustrate how this approach encourages reuse of nodes and updates in alternative settings. With a full message-passing account of synthetic AIF agents, it becomes possible to derive and reuse message updates across models and move closer to industrial applications of synthetic AIF.

Neural Computation, volume 37, issue 1, pages 38-75

Citations: 0

A Fast Algorithm for All-Pairs-Shortest-Paths Suitable for Neural Networks

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-11-19 DOI: 10.1162/neco_a_01716

Zeyu Jing;Markus Meister

Given a directed graph of nodes and edges connecting them, a common problem is to find the shortest path between any two nodes. Here we show that the shortest path distances can be found by a simple matrix inversion: if the edges are given by the adjacency matrix $A_{ij}$, then with a suitably small value of $\gamma$, the shortest path distances are $D_{ij} = \lceil \log_\gamma [(I - \gamma A)^{-1}]_{ij} \rceil$. We derive several graph-theoretic bounds on the value of $\gamma$ and explore its useful range with numerics on different graph types. Even when the distance function is not globally accurate across the entire graph, it still works locally to instruct pursuit of the shortest path. In this mode, it also extends to weighted graphs with positive edge weights. For a wide range of dense graphs, this distance function is computationally faster than the best available alternative. Finally, we show that this method leads naturally to a neural network solution of the all-pairs-shortest-path problem.
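
The formula in the abstract can be tried on a toy graph. The sketch below is an illustration only, not the authors' code: the function names, the five-edge example graph, and the choice γ = 0.05 are all assumptions, and the matrix inverse is computed with a plain Gauss-Jordan routine to keep the example dependency-free.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (pure Python, small matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def shortest_path_distances(adj, gamma):
    """D_ij = ceil(log_gamma [(I - gamma * A)^-1]_ij), with adj[i][j] = 1 for an edge i -> j."""
    n = len(adj)
    m = [[(1.0 if i == j else 0.0) - gamma * adj[i][j] for j in range(n)]
         for i in range(n)]
    inv = invert(m)
    log_gamma = math.log(gamma)
    # Crude handling of unreachable pairs: a non-positive entry is reported as infinity.
    return [[math.ceil(math.log(x) / log_gamma) if x > 0 else math.inf for x in row]
            for row in inv]


# Hypothetical example graph with edges 0->1, 0->2, 1->2, 2->3, 3->0.
ADJ = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
DIST = shortest_path_distances(ADJ, 0.05)
for row in DIST:
    print(row)
```

The inverse equals the series of powers of γA, so entry (i, j) is dominated by γ raised to the shortest path length. The ceiling recovers that length as long as γ is small enough that the contribution of multiple and longer walks stays below a factor of 1/γ.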

Citations: 0

Orthogonal Gated Recurrent Unit With Neumann-Cayley Transformation

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-11-19 DOI: 10.1162/neco_a_01710

Vasily Zadorozhnyy;Edison Mucllari;Cole Pospisil;Duc Nguyen;Qiang Ye

In recent years, using orthogonal matrices has been shown to be a promising approach to improving recurrent neural networks (RNNs) with training, stability, and convergence, particularly to control gradients. While gated recurrent unit (GRU) and long short-term memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent exploding gradient problems and enhance long-term memory. We study where to use orthogonal matrices and propose a Neumann series–based scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley orthogonal GRU (NC-GRU). We present detailed experiments of our model on several synthetic and real-world tasks, which show that NC-GRU significantly outperforms GRU and several other RNNs.
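
To make the idea concrete, here is a minimal sketch of a Cayley transform whose matrix inverse is replaced by a truncated Neumann series. It assumes the common form W = (I + A)^-1 (I - A) for a skew-symmetric A and omits the diagonal scaling matrix; the function names, the example matrix, and the number of series terms are hypothetical and not taken from the paper.

```python
def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate (I + a)^-1 by I - a + a^2 - ...; requires spectral radius of a below 1."""
    n = len(a)
    result = identity(n)
    power = identity(n)
    for k in range(1, terms):
        power = matmul(power, a)
        sign = -1.0 if k % 2 else 1.0
        result = [[r + sign * p for r, p in zip(r_row, p_row)]
                  for r_row, p_row in zip(result, power)]
    return result


def cayley(a, terms=30):
    """Cayley transform (I + a)^-1 (I - a); orthogonal when a is skew-symmetric."""
    n = len(a)
    i_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    return matmul(neumann_inverse(a, terms), i_minus_a)


# Small skew-symmetric matrix (A^T = -A) with spectral radius of about 0.37.
A = [[0.0, 0.2, -0.1],
     [-0.2, 0.0, 0.3],
     [0.1, -0.3, 0.0]]
W = cayley(A)
WtW = matmul([list(col) for col in zip(*W)], W)
ortho_error = max(abs(WtW[i][j] - (1.0 if i == j else 0.0))
                  for i in range(3) for j in range(3))
print(ortho_error)  # deviation of W^T W from the identity
```

Because an orthogonal matrix preserves vector norms, repeated multiplication by W neither shrinks nor inflates a signal, which is the property used to keep gradients under control.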

Citations: 0

Sparse-Coding Variational Autoencoders

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-11-19 DOI: 10.1162/neco_a_01715

Victor Geadah;Gabriel Barello;Daniel Greenidge;Adam S. Charles;Jonathan W. Pillow

The sparse coding model posits that the visual system has evolved to efficiently code natural stimuli using a sparse set of features from an overcomplete dictionary. The original sparse coding model suffered from two key limitations, however: (1) computing the neural response to an image patch required minimizing a nonlinear objective function via recurrent dynamics and (2) fitting relied on approximate inference methods that ignored uncertainty. Although subsequent work has developed several methods to overcome these obstacles, we propose a novel solution inspired by the variational autoencoder (VAE) framework. We introduce the sparse coding variational autoencoder (SVAE), which augments the sparse coding model with a probabilistic recognition model parameterized by a deep neural network. This recognition model provides a neurally plausible feedforward implementation for the mapping from image patches to neural activities and enables a principled method for fitting the sparse coding model to data via maximization of the evidence lower bound (ELBO). The SVAE differs from standard VAEs in three key respects: the latent representation is overcomplete (there are more latent dimensions than image pixels), the prior is sparse or heavy-tailed instead of gaussian, and the decoder network is a linear projection instead of a deep network. We fit the SVAE to natural image data under different assumed prior distributions and show that it obtains higher test performance than previous fitting methods. Finally, we examine the response properties of the recognition network and show that it captures important nonlinear properties of neurons in the early visual pathway.
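
For reference, the evidence lower bound mentioned in the abstract has the standard form below. The symbols are assumptions introduced here for illustration: $x \in \mathbb{R}^N$ is an image patch, $z \in \mathbb{R}^K$ the latent code with $K > N$ (overcomplete), $q_\phi$ the recognition network, $\Phi$ the linear decoder (dictionary), and $p(z)$ a sparse prior. The gaussian observation noise with variance $\sigma^2$ is likewise an assumption, not a detail stated in the abstract.

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  \;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big),
\qquad
p_\theta(x \mid z) = \mathcal{N}\big(x;\ \Phi z,\ \sigma^2 I\big).
```

The first term rewards accurate reconstruction through the linear decoder, and the second keeps the recognition model's output close to the sparse prior.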

Neural Computation, volume 36, issue 12, pages 2571-2601

Citations: 0

Fine Granularity Is Critical for Intelligent Neural Network Pruning

IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-11-19 DOI: 10.1162/neco_a_01717

Alex Heyman;Joel Zylberberg

Neural network pruning is a popular approach to reducing the computational costs of training and/or deploying a network and aims to do so while minimizing accuracy loss. Pruning methods that remove individual weights (fine granularity) can remove more total network parameters before reaching a given degree of accuracy loss, while methods that preserve some or all of a network’s structure (coarser granularity, such as pruning channels from a CNN) take better advantage of hardware and software optimized for dense matrix computations. We compare intelligent iterative pruning using several different criteria sampled from the literature against random pruning at initialization across multiple granularities on two different architectures and three image classification tasks. Our work is the first direct and comprehensive investigation of the relationship between granularity and the efficacy of intelligent pruning relative to a random-pruning baseline. We find that the accuracy advantage of intelligent over random pruning decreases dramatically as granularity becomes coarser, with minimal advantage for intelligent pruning at granularity coarse enough to fully preserve network structure. For instance, at pruning rates where random pruning leaves ResNet-20 at 85.0% test accuracy on CIFAR-10 after 30,000 training iterations, intelligent weight pruning with the best-in-context criterion leaves it at about 90.0% accuracy (on par with the unpruned network), kernel pruning leaves it at about 86.5%, and channel pruning leaves it at about 85.5%. Our results suggest that compared to coarse pruning, fine pruning combined with efficient implementation of the resulting networks is a more promising direction for easing the trade-off between high accuracy and low computational cost.
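
The difference between fine and coarse granularity can be sketched on a single weight matrix. This is a hedged illustration only: weight magnitude is assumed as the pruning criterion (the paper samples several criteria), rows stand in for channels, and the retained L1 mass is a crude proxy, not test accuracy. All names and the 8 × 8 random matrix are hypothetical.

```python
import random


def prune_fine(weights, fraction):
    """Fine granularity: zero the individual weights with the smallest magnitudes."""
    cells = sorted((abs(w), i, j)
                   for i, row in enumerate(weights) for j, w in enumerate(row))
    drop = {(i, j) for _, i, j in cells[:int(len(cells) * fraction)]}
    return [[0.0 if (i, j) in drop else w for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


def prune_rows(weights, fraction):
    """Coarse granularity: zero whole rows (a stand-in for channels) with the smallest L1 norm."""
    order = sorted(range(len(weights)), key=lambda i: sum(abs(w) for w in weights[i]))
    drop = set(order[:int(len(weights) * fraction)])
    return [[0.0] * len(row) if i in drop else list(row)
            for i, row in enumerate(weights)]


def l1_mass(weights):
    """Total absolute weight that survives pruning."""
    return sum(abs(w) for row in weights for w in row)


rng = random.Random(0)
WEIGHTS = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(8)]
fine = prune_fine(WEIGHTS, 0.5)     # removes 32 individual weights
coarse = prune_rows(WEIGHTS, 0.5)   # removes 4 full rows, also 32 weights
print(l1_mass(fine), l1_mass(coarse))
```

At the same parameter budget, the fine-grained rule can always keep at least as much weight magnitude as the row-level rule, because it is free to pick the 32 smallest entries anywhere in the matrix. The row-pruned matrix stays dense in the surviving rows, which is what makes coarse pruning easier to accelerate.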

Neural Computation, volume 36, issue 12, pages 2677-2709

Citations: 0
