Neural Computation: Latest Articles, Page 8

Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01636

Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang

Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging compared to the coordinated ZO estimator, it requires only O(1) computation, which is significantly less than the O(d) computation of the coordinated ZO estimator, with d being the dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. With the application of two reduction frameworks on our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $O\left(\min\left\{\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{O}\left(\frac{n+d}{\epsilon^{2}}\right)$ under $d > n^{1/2}$ for nonconvex problems, and from $O\left(\frac{d}{\epsilon^{2}}\right)$ to $\tilde{O}\left(n \log\frac{1}{\epsilon} + \frac{d}{\epsilon}\right)$ for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
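
To make the O(1) versus O(d) contrast in this abstract concrete, here is a minimal Python sketch of the two kinds of ZO gradient estimator. It assumes the textbook forms (a central difference along each coordinate, and a single forward difference along a random unit direction); the paper's exact estimators, smoothing parameter, and scaling may differ. The names `CountingOracle`, `coordinate_zo_gradient`, `random_zo_gradient`, and `quadratic` are hypothetical and chosen for illustration only.

```python
import math
import random


class CountingOracle:
    """Wraps an objective and counts how many times it is queried."""

    def __init__(self, f):
        self.f = f
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.f(x)


def coordinate_zo_gradient(f, x, mu=1e-4):
    """Central difference along every coordinate: 2*d function queries."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += mu
        x_minus[i] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2 * mu))
    return grad


def random_zo_gradient(f, x, mu=1e-4, rng=random):
    """Forward difference along one random unit direction: 2 function queries."""
    d = len(x)
    # Normalising a Gaussian vector gives a uniform direction on the unit sphere.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_pert) - f(x)) / mu
    return [scale * ui for ui in u]


def quadratic(x):
    """Toy objective (an assumption for the demo); its true gradient is 2*x."""
    return sum(xi * xi for xi in x)


if __name__ == "__main__":
    point = [1.0, -2.0, 3.0]
    coord_oracle = CountingOracle(quadratic)
    coordinate_zo_gradient(coord_oracle, point)
    rand_oracle = CountingOracle(quadratic)
    random_zo_gradient(rand_oracle, point)
    print(coord_oracle.queries, rand_oracle.queries)  # 6 2
```

The query counters show the trade-off described above: the coordinate version costs a number of queries proportional to d per gradient estimate, while the random version costs a constant number at the price of a noisier estimate.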

Neural Computation, Volume 36, Issue 5, pp. 897-935. Journal article.

Citations: 0

An Overview of the Free Energy Principle and Related Research

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01642

Zhengquan Zhang;Feng Xu

The free energy principle (FEP) and its corollary, the active inference framework, serve as theoretical foundations in the domain of neuroscience, explaining the genesis of intelligent behavior. This principle states that the processes of perception, learning, and decision making within an agent are all driven by the objective of “minimizing free energy,” evincing the following behaviors: learning and employing a generative model of the environment to interpret observations, thereby achieving perception, and selecting actions to maintain a stable preferred state and minimize the uncertainty about the environment, thereby achieving decision making. This fundamental principle can be used to explain how the brain processes perceptual information, learns about the environment, and selects actions. Two pivotal tenets are that the agent employs a generative model for perception and planning and that interaction with the world (and other agents) enhances the performance of the generative model and augments perception. With the evolution of control theory and deep learning tools, agents based on the FEP have been instantiated in various ways across different domains, guiding the design of a multitude of generative models and decision-making algorithms. This letter first introduces the basic concepts of the FEP, followed by its historical development and connections with other theories of intelligence, and then delves into the specific application of the FEP to perception and decision making, encompassing both low-dimensional simple situations and high-dimensional complex situations. It compares the FEP with model-based reinforcement learning to show that the FEP provides a better objective function. We illustrate this using numerical studies of Dreamer3 by adding expected information gain into the standard objective function. In a complementary fashion, existing reinforcement learning and deep learning algorithms can also help implement FEP-based agents. Finally, we discuss the various capabilities that agents need to possess in complex environments and state that the FEP can aid agents in acquiring these capabilities.
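
As a small illustration of what "minimizing free energy" means for perception, the sketch below computes the variational free energy of a belief q(s) over a discrete hidden state, using the standard definition F = Σ_s q(s) [ln q(s) − ln p(o|s) − ln p(s)]. The two-state model and its numbers are made up for the example, and the function names are hypothetical; nothing here is taken from the paper's experiments.

```python
import math


def variational_free_energy(q, prior, likelihood_o):
    """F = sum_s q(s) * (ln q(s) - ln p(o|s) - ln p(s)) for one fixed observation o."""
    return sum(
        qs * (math.log(qs) - math.log(ls) - math.log(ps))
        for qs, ps, ls in zip(q, prior, likelihood_o)
        if qs > 0
    )


def exact_posterior(prior, likelihood_o):
    """Bayes' rule; returns (posterior over states, model evidence p(o))."""
    joint = [ps * ls for ps, ls in zip(prior, likelihood_o)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence


if __name__ == "__main__":
    prior = [0.5, 0.5]          # hypothetical p(s)
    likelihood_o = [0.9, 0.2]   # hypothetical p(o | s) for the observed o
    posterior, evidence = exact_posterior(prior, likelihood_o)
    print(variational_free_energy(prior, prior, likelihood_o))
    print(variational_free_energy(posterior, prior, likelihood_o), -math.log(evidence))
```

Free energy is an upper bound on surprise, −ln p(o), and the bound is tight exactly when q equals the Bayesian posterior, which is why the last two printed values agree while the first (using the prior as the belief) is larger.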

Neural Computation, Volume 36, Issue 5, pp. 963-1021. Journal article.

Citations: 0

Approximating Nonlinear Functions With Latent Boundaries in Low-Rank Excitatory-Inhibitory Spiking Networks

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01658

William F. Podlaski;Christian K. Machens

Deep feedforward and recurrent neural networks have become successful functional models of the brain, but they neglect obvious biological details such as spikes and Dale's law. Here we argue that these details are crucial in order to understand how real neural circuits operate. Towards this aim, we put forth a new framework for spike-based computation in low-rank excitatory-inhibitory spiking networks. By considering populations with rank-1 connectivity, we cast each neuron's spiking threshold as a boundary in a low-dimensional input-output space. We then show how the combined thresholds of a population of inhibitory neurons form a stable boundary in this space, and those of a population of excitatory neurons form an unstable boundary. Combining the two boundaries results in a rank-2 excitatory-inhibitory (EI) network with inhibition-stabilized dynamics at the intersection of the two boundaries. The computation of the resulting networks can be understood as the difference of two convex functions and is thereby capable of approximating arbitrary non-linear input-output mappings. We demonstrate several properties of these networks, including noise suppression and amplification, irregular activity and synaptic balance, as well as how they relate to rate network dynamics in the limit that the boundary becomes soft. Finally, while our work focuses on small networks (5-50 neurons), we discuss potential avenues for scaling up to much larger networks. Overall, our work proposes a new perspective on spiking networks that may serve as a starting point for a mechanistic understanding of biological spike-based computation.
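
The statement that the networks compute "the difference of two convex functions" can be illustrated at the purely mathematical level. The sketch below builds two convex piecewise-linear functions as maxima of affine pieces and subtracts them to obtain a nonconvex tent-shaped function. It is only a loose analogy: treating each affine piece as one neuron's threshold is an assumption made here for intuition, and no spiking dynamics are modelled. The names `max_affine`, `dc_function`, `CONVEX_G`, `CONVEX_H`, and `tent` are hypothetical.

```python
def max_affine(pieces, x):
    """Convex piecewise-linear function: the maximum over affine pieces (slope, intercept)."""
    return max(slope * x + intercept for slope, intercept in pieces)


def dc_function(convex_a, convex_b, x):
    """Difference of two convex piecewise-linear functions; generally nonconvex."""
    return max_affine(convex_a, x) - max_affine(convex_b, x)


# g(x) = max(-x, 1, x) and h(x) = max(-x, x) = |x|; both are convex.
CONVEX_G = [(-1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
CONVEX_H = [(-1.0, 0.0), (1.0, 0.0)]


def tent(x):
    """g(x) - h(x) equals max(0, 1 - |x|), a nonconvex 'bump'."""
    return dc_function(CONVEX_G, CONVEX_H, x)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(x, tent(x))
```

Adding more affine pieces to either convex function refines the shape, which is the sense in which differences of convex functions can approximate a broad class of nonlinear input-output mappings.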

Neural Computation, Volume 36, Issue 5, pp. 803-857. Journal article. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10535068

Citations: 0

Toward Improving the Generation Quality of Autoregressive Slot VAEs

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01635

Patrick Emami;Pan He;Sanjay Ranka;Anand Rangarajan

Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (“slots”) from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multiobject relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multiobject environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.
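
The alignment step in ordered slot inference (matching an unordered set of inferred slots to the ordered slots produced by the prior) can be sketched as a minimum-cost matching. The abstract does not say how the alignment is computed, so everything below is an assumption: slots are plain lists of floats, the cost is squared Euclidean distance, and the search is exhaustive over permutations, which is only practical for a handful of slots. The function names are hypothetical.

```python
from itertools import permutations


def squared_distance(a, b):
    """Squared Euclidean distance between two slot vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def align_slots(unordered, ordered):
    """Reorder `unordered` so that position i best matches ordered[i] in total cost."""
    k = len(ordered)

    def total_cost(perm):
        return sum(squared_distance(unordered[perm[i]], ordered[i]) for i in range(k))

    # Exhaustive search over all k! assignments (fine for small k only).
    best_perm = min(permutations(range(k)), key=total_cost)
    return [unordered[j] for j in best_perm]


if __name__ == "__main__":
    ordered = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]      # hypothetical prior samples
    unordered = [[5.1, 4.9], [0.1, 0.0], [0.9, 1.2]]    # hypothetical inferred slots
    print(align_slots(unordered, ordered))
```

For larger slot counts a polynomial-time assignment solver would replace the exhaustive search; the matching objective stays the same.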

Neural Computation, Volume 36, Issue 5, pp. 858-896. Journal article.

Citations: 0

Synaptic Information Storage Capacity Measured With Information Theory

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01659

Mohammad Samavat;Thomas M. Bartol;Kristen M. Harris;Terrence J. Sejnowski

Variation in the strength of synapses can be quantified by measuring the anatomical properties of synapses. Quantifying precision of synaptic plasticity is fundamental to understanding information storage and retrieval in neural circuits. Synapses from the same axon onto the same dendrite have a common history of coactivation, making them ideal candidates for determining the precision of synaptic plasticity based on the similarity of their physical dimensions. Here, the precision and amount of information stored in synapse dimensions were quantified with Shannon information theory, expanding prior analysis that used signal detection theory (Bartol et al., 2015). The two methods were compared using dendritic spine head volumes in the middle of the stratum radiatum of hippocampal area CA1 as well-defined measures of synaptic strength. Information theory delineated the number of distinguishable synaptic strengths based on nonoverlapping bins of dendritic spine head volumes. Shannon entropy was applied to measure synaptic information storage capacity (SISC) and resulted in a lower bound of 4.1 bits and upper bound of 4.59 bits of information based on 24 distinguishable sizes. We further compared the distribution of distinguishable sizes and a uniform distribution using Kullback-Leibler divergence and discovered that there was a nearly uniform distribution of spine head volumes across the sizes, suggesting optimal use of the distinguishable values. Thus, SISC provides a new analytical measure that can be generalized to probe synaptic strengths and capacity for plasticity in different brain regions of different species and among animals raised in different conditions or during learning. How brain diseases and disorders affect the precision of synaptic plasticity can also be probed.
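
The two quantities named in this abstract, Shannon entropy and Kullback-Leibler divergence from a uniform distribution, are short to compute. The sketch below shows both for a distribution over distinguishable size bins. With 24 equally likely bins the entropy is log2(24) ≈ 4.585 bits, which appears to correspond to the upper bound of about 4.59 bits quoted above; the skewed counts in the demo are invented for illustration and are not the paper's data. The function names are hypothetical.

```python
import math


def shannon_entropy_bits(probs):
    """H(p) = -sum_i p_i * log2(p_i), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def kl_to_uniform_bits(probs):
    """KL(p || uniform) = sum_i p_i * log2(p_i * n), in bits; zero iff p is uniform."""
    n = len(probs)
    return sum(p * math.log2(p * n) for p in probs if p > 0)


if __name__ == "__main__":
    uniform = [1.0 / 24] * 24
    print(shannon_entropy_bits(uniform))   # log2(24), the maximum for 24 bins
    print(kl_to_uniform_bits(uniform))

    counts = [3, 1, 2, 2]                  # made-up bin counts
    probs = [c / sum(counts) for c in counts]
    print(shannon_entropy_bits(probs), kl_to_uniform_bits(probs))
```

For any distribution over n bins the two values add up to log2(n), so a KL divergence near zero means the bins are used almost uniformly and the entropy is close to its maximum.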

Neural Computation, Volume 36, Issue 5, pp. 781-802. Journal article.

Citations: 0

Heterogeneous Forgetting Rates and Greedy Allocation in Slot-Based Memory Networks Promotes Signal Retention

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01655

BethAnna Jones;Lawrence Snyder;ShiNung Ching

A key question in the neuroscience of memory encoding pertains to the mechanisms by which afferent stimuli are allocated within memory networks. This issue is especially pronounced in the domain of working memory, where capacity is finite. Presumably the brain must embed some “policy” by which to allocate these mnemonic resources in an online manner in order to maximally represent and store afferent information for as long as possible and without interference from subsequent stimuli. Here, we engage this question through a top-down theoretical modeling framework. We formally optimize a gating mechanism that projects afferent stimuli onto a finite number of memory slots within a recurrent network architecture. In the absence of external input, the activity in each slot attenuates over time (i.e., a process of gradual forgetting). It turns out that the optimal gating policy consists of a direct projection from sensory activity to memory slots, alongside an activity-dependent lateral inhibition. Interestingly, allocating resources myopically (greedily with respect to the current stimulus) leads to efficient utilization of slots over time. In other words, later-arriving stimuli are distributed across slots in such a way that the network state is minimally shifted and so prior signals are minimally “overwritten.” Further, networks with heterogeneity in the timescales of their forgetting rates retain stimuli better than those that are more homogeneous. Our results suggest how online, recurrent networks working on temporally localized objectives without high-level supervision can nonetheless implement efficient allocation of memory resources over time.

Neural Computation, Volume 36, Issue 5, pp. 1022-1040. Journal article.

Citations: 0

Instance-Specific Model Perturbation Improves Generalized Zero-Shot Learning

IF 2.9 | Zone 4: Computer Science | Q3: Computer Science, Artificial Intelligence

Neural Computation

Pub Date : 2024-04-23 DOI: 10.1162/neco_a_01639

Guanyu Yang;Kaizhu Huang;Rui Zhang;Xi Yang

Zero-shot learning (ZSL) refers to the design of predictive functions on new classes (unseen classes) of data that have never been seen during training. In a more practical scenario, generalized zero-shot learning (GZSL) requires predicting both seen and unseen classes accurately. In the absence of target samples, many GZSL models may overfit training data and are inclined to predict individuals as categories that have been seen in training. To alleviate this problem, we develop a parameter-wise adversarial training process that promotes robust recognition of seen classes while designing during the test a novel model perturbation mechanism to ensure sufficient sensitivity to unseen classes. Concretely, adversarial perturbation is conducted on the model to obtain instance-specific parameters so that predictions can be biased to unseen classes in the test. Meanwhile, the robust training encourages the model robustness, leading to nearly unaffected prediction for seen classes. Moreover, perturbations in the parameter space, computed from multiple individuals simultaneously, can be used to avoid the effect of perturbations that are too extreme and ruin the predictions. Comparison results on four benchmark ZSL data sets show the effective improvement that the proposed framework made on zero-shot methods with learned metrics.

Volume 36, Issue 5, Pages 936-962 · Journal Article

Citations: 0

CA3 Circuit Model Compressing Sequential Information in Theta Oscillation and Replay

IF 2.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-21 DOI: 10.1162/neco_a_01641

Satoshi Kuroki;Kenji Mizuseki

The hippocampus plays a critical role in the compression and retrieval of sequential information. During wakefulness, it achieves this through theta phase precession and theta sequences. Subsequently, during periods of sleep or rest, the compressed information reactivates through sharp-wave ripple events, manifesting as memory replay. However, how these sequential neuronal activities are generated and how they store information about the external environment remain unknown. We developed a hippocampal cornu ammonis 3 (CA3) computational model based on anatomical and electrophysiological evidence from the biological CA3 circuit to address these questions. The model comprises theta rhythm inhibition, place input, and CA3-CA3 plastic recurrent connection. The model can compress the sequence of the external inputs, reproduce theta phase precession and replay, learn additional sequences, and reorganize previously learned sequences. A gradual increase in synaptic inputs, controlled by interactions between theta-paced inhibition and place inputs, explained the mechanism of sequence acquisition. This model highlights the crucial role of plasticity in the CA3 recurrent connection and theta oscillational dynamics and hypothesizes how the CA3 circuit acquires, compresses, and replays sequential information.

Volume 36, Issue 4, Pages 501-548 · Journal Article · Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10535082

Citations: 0

Column Row Convolutional Neural Network: Reducing Parameters for Efficient Image Processing

IF 2.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-21 DOI: 10.1162/neco_a_01653

Seongil Im;Jae-Seung Jeong;Junseo Lee;Changhwan Shin;Jeong Ho Cho;Hyunsu Ju

Recent advancements in deep learning have achieved significant progress by increasing the number of parameters in a given model. However, this comes at the cost of computing resources, prompting researchers to explore model compression techniques that reduce the number of parameters while maintaining or even improving performance. Convolutional neural networks (CNNs) have been recognized as more efficient and effective than fully connected (FC) networks. In this letter, we propose a column row convolutional neural network (CRCNN) that applies 1D convolution to image data, significantly reducing the number of learning parameters and operational steps. The CRCNN uses column and row local receptive fields to perform data abstraction, concatenating each direction's features before connecting them to an FC layer. Experimental results demonstrate that the CRCNN maintains comparable accuracy while reducing the number of parameters compared to prior work. Moreover, the CRCNN is employed for one-class anomaly detection, demonstrating its feasibility for various applications.
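
The column and row idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `column_row_features`, the single filter per direction, and the absence of nonlinearities, pooling, and the final FC layer are all assumptions made for illustration. It shows only the core step of running one 1D filter down every column and another across every row, then concatenating the two feature sets.

```python
import numpy as np


def column_row_features(image, col_kernel, row_kernel):
    """Hypothetical sketch: 1D 'valid' convolutions along columns and rows.

    image      : 2D array of shape (H, W)
    col_kernel : 1D filter applied down each column
    row_kernel : 1D filter applied across each row
    Returns a flat vector that a fully connected layer could consume.
    """
    height, width = image.shape
    # Convolve every column with the column filter: shape (H - k + 1, W).
    col_feat = np.stack(
        [np.convolve(image[:, j], col_kernel, mode="valid") for j in range(width)],
        axis=1,
    )
    # Convolve every row with the row filter: shape (H, W - k + 1).
    row_feat = np.stack(
        [np.convolve(image[i, :], row_kernel, mode="valid") for i in range(height)],
        axis=0,
    )
    # Concatenate the features from both directions.
    return np.concatenate([col_feat.ravel(), row_feat.ravel()])


image = np.ones((28, 28))
features = column_row_features(image, np.ones(3), np.ones(3))
print(features.shape)
```

In this sketch, two length-3 filters hold 6 weights in total, whereas a single 3×3 2D filter holds 9; this is the kind of saving that 1D filtering offers per filter, although the parameter counts reported in the paper depend on architectural details not given in the abstract.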

Citations: 0

Frequency Propagation: Multimechanism Learning in Nonlinear Physical Networks

IF 2.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-21 DOI: 10.1162/neco_a_01648

Vidyesh Rao Anisetti;Ananth Kandala;Benjamin Scellier;J. M. Schwarz

We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an activation signal and an error signal whose coefficients can be read in different frequencies of the frequency domain. Each conductance is updated proportionally to the product of the two coefficients. The learning rule is local and proved to perform gradient descent on a loss function. We argue that frequency propagation is an instance of a multimechanism learning strategy for physical networks, be it resistive, elastic, or flow networks. Multimechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals in the training process. Locally available information about these two signals is then used to update the trainable parameters to perform gradient descent. We demonstrate how earlier work implementing learning via chemical signaling in flow networks (Anisetti, Scellier, et al., 2023) also falls under the rubric of multimechanism learning.
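
The local update described in the abstract (read two coefficients at two frequencies, then change the conductance in proportion to their product) can be sketched for a single resistor as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: the function names, the cosine-only signal model, and the proportionality constant `rate` (both its sign and its magnitude) are hypothetical, and no circuit is actually simulated.

```python
import numpy as np


def fourier_coefficient(signal, t, omega):
    """Amplitude of the cos(omega * t) component of a sampled signal.

    Assumes t covers a whole number of periods with uniform sampling,
    so that projecting onto the cosine isolates that component.
    """
    return 2.0 * np.mean(signal * np.cos(omega * t))


def conductance_update(voltage_drop, t, omega_act, omega_err, rate=0.1):
    """Hypothetical local rule: change proportional to the product of coefficients.

    `rate` is an assumed proportionality constant; the abstract does not
    specify its sign or magnitude.
    """
    activation = fourier_coefficient(voltage_drop, t, omega_act)
    error = fourier_coefficient(voltage_drop, t, omega_err)
    return rate * activation * error, activation, error


# Synthetic voltage drop: superposition of an activation and an error signal.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
omega_act, omega_err = 2 * np.pi * 5, 2 * np.pi * 8
voltage = 0.7 * np.cos(omega_act * t) + 0.2 * np.cos(omega_err * t)

delta_g, a, e = conductance_update(voltage, t, omega_act, omega_err)
print(a, e, delta_g)
```

Because the two signals occupy different frequencies, each coefficient can be recovered from the same measured voltage without separating the signals in time, which is what makes the rule local to each resistor.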

Volume 36, Issue 4, Pages 596-620 · Journal Article

Citations: 0