Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang
Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging compared to the coordinated ZO estimator, it requires only O(1) computation, which is significantly less than the O(d) computation of the coordinated ZO estimator, with d being the dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from O(min{dn^{1/2}ε^{-2}, dε^{-3}}) to Õ(n + dε^{-2}) under d > n^{1/2} for nonconvex problems, and from O(dε^{-2}) to Õ(n log(1/ε) + dε^{-1}) for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
{"title":"Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms","authors":"Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang","doi":"10.1162/neco_a_01636","DOIUrl":"10.1162/neco_a_01636","url":null,"abstract":"Zeroth-order (ZO) optimization is one key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance, reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator against the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging compared to coordinated ZO estimator, it requires only O(1) computation, which is significantly less than O(d) computation of the coordinated ZO estimator, with d being dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. With the application of two reduction frameworks on our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from Omindn1/2ε2,dε3 to O˜n+dε2 under d>n12 for nonconvex problems, and from Odε2 to O˜nlog1ε+dε for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"897-935"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The free energy principle (FEP) and its corollary, the active inference framework, serve as theoretical foundations in the domain of neuroscience, explaining the genesis of intelligent behavior. This principle states that the processes of perception, learning, and decision making—within an agent—are all driven by the objective of “minimizing free energy,” evincing the following behaviors: learning and employing a generative model of the environment to interpret observations, thereby achieving perception, and selecting actions to maintain a stable preferred state and minimize the uncertainty about the environment, thereby achieving decision making. This fundamental principle can be used to explain how the brain processes perceptual information, learns about the environment, and selects actions. Two pivotal tenets are that the agent employs a generative model for perception and planning and that interaction with the world (and other agents) enhances the performance of the generative model and augments perception. With the evolution of control theory and deep learning tools, agents based on the FEP have been instantiated in various ways across different domains, guiding the design of a multitude of generative models and decision-making algorithms. This letter first introduces the basic concepts of the FEP, followed by its historical development and connections with other theories of intelligence, and then delves into the specific application of the FEP to perception and decision making, encompassing both low-dimensional simple situations and high-dimensional complex situations. It compares the FEP with model-based reinforcement learning to show that the FEP provides a better objective function. We illustrate this using numerical studies of Dreamer3 by adding expected information gain into the standard objective function. In a complementary fashion, existing reinforcement learning and deep learning algorithms can also help implement FEP-based agents. Finally, we discuss the various capabilities that agents need to possess in complex environments and state that the FEP can aid agents in acquiring these capabilities.
{"title":"An Overview of the Free Energy Principle and Related Research","authors":"Zhengquan Zhang;Feng Xu","doi":"10.1162/neco_a_01642","DOIUrl":"10.1162/neco_a_01642","url":null,"abstract":"The free energy principle and its corollary, the active inference framework, serve as theoretical foundations in the domain of neuroscience, explaining the genesis of intelligent behavior. This principle states that the processes of perception, learning, and decision making—within an agent—are all driven by the objective of “minimizing free energy,” evincing the following behaviors: learning and employing a generative model of the environment to interpret observations, thereby achieving perception, and selecting actions to maintain a stable preferred state and minimize the uncertainty about the environment, thereby achieving decision making. This fundamental principle can be used to explain how the brain processes perceptual information, learns about the environment, and selects actions. Two pivotal tenets are that the agent employs a generative model for perception and planning and that interaction with the world (and other agents) enhances the performance of the generative model and augments perception. With the evolution of control theory and deep learning tools, agents based on the FEP have been instantiated in various ways across different domains, guiding the design of a multitude of generative models and decision-making algorithms. This letter first introduces the basic concepts of the FEP, followed by its historical development and connections with other theories of intelligence, and then delves into the specific application of the FEP to perception and decision making, encompassing both low-dimensional simple situations and high-dimensional complex situations. It compares the FEP with model-based reinforcement learning to show that the FEP provides a better objective function. We illustrate this using numerical studies of Dreamer3 by adding expected information gain into the standard objective function. In a complementary fashion, existing reinforcement learning, and deep learning algorithms can also help implement the FEP-based agents. Finally, we discuss the various capabilities that agents need to possess in complex environments and state that the FEP can aid agents in acquiring these capabilities.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"963-1021"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep feedforward and recurrent neural networks have become successful functional models of the brain, but they neglect obvious biological details such as spikes and Dale's law. Here we argue that these details are crucial to understanding how real neural circuits operate. Toward this aim, we put forth a new framework for spike-based computation in low-rank excitatory-inhibitory spiking networks. By considering populations with rank-1 connectivity, we cast each neuron's spiking threshold as a boundary in a low-dimensional input-output space. We then show how the combined thresholds of a population of inhibitory neurons form a stable boundary in this space, and those of a population of excitatory neurons form an unstable boundary. Combining the two boundaries results in a rank-2 excitatory-inhibitory (EI) network with inhibition-stabilized dynamics at the intersection of the two boundaries. The computation of the resulting networks can be understood as the difference of two convex functions and is thereby capable of approximating arbitrary nonlinear input-output mappings. We demonstrate several properties of these networks, including noise suppression and amplification, irregular activity and synaptic balance, as well as how they relate to rate network dynamics in the limit that the boundary becomes soft. Finally, while our work focuses on small networks (5-50 neurons), we discuss potential avenues for scaling up to much larger networks. Overall, our work proposes a new perspective on spiking networks that may serve as a starting point for a mechanistic understanding of biological spike-based computation.
{"title":"Approximating Nonlinear Functions With Latent Boundaries in Low-Rank Excitatory-Inhibitory Spiking Networks","authors":"William F. Podlaski;Christian K. Machens","doi":"10.1162/neco_a_01658","DOIUrl":"10.1162/neco_a_01658","url":null,"abstract":"Deep feedforward and recurrent neural networks have become successful functional models of the brain, but they neglect obvious biological details such as spikes and Dale's law. Here we argue that these details are crucial in order to understand how real neural circuits operate. Towards this aim, we put forth a new framework for spike-based computation in low-rank excitatory-inhibitory spiking networks. By considering populations with rank-1 connectivity, we cast each neuron's spiking threshold as a boundary in a low-dimensional input-output space. We then show how the combined thresholds of a population of inhibitory neurons form a stable boundary in this space, and those of a population of excitatory neurons form an unstable boundary. Combining the two boundaries results in a rank-2 excitatory-inhibitory (EI) network with inhibition-stabilized dynamics at the intersection of the two boundaries. The computation of the resulting networks can be understood as the difference of two convex functions and is thereby capable of approximating arbitrary non-linear input-output mappings. We demonstrate several properties of these networks, including noise suppression and amplification, irregular activity and synaptic balance, as well as how they relate to rate network dynamics in the limit that the boundary becomes soft. Finally, while our work focuses on small networks (5-50 neurons), we discuss potential avenues for scaling up to much larger networks. Overall, our work proposes a new perspective on spiking networks that may serve as a starting point for a mechanistic understanding of biological spike-based computation.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"803-857"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10535068","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140805834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patrick Emami;Pan He;Sanjay Ranka;Anand Rangarajan
Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (“slots”) from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multiobject relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior that sequentially generates scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multiobject environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.
{"title":"Toward Improving the Generation Quality of Autoregressive Slot VAEs","authors":"Patrick Emami;Pan He;Sanjay Ranka;Anand Rangarajan","doi":"10.1162/neco_a_01635","DOIUrl":"10.1162/neco_a_01635","url":null,"abstract":"Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (“slots”) from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multiobject relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multiobject environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"858-896"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Samavat;Thomas M. Bartol;Kristen M. Harris;Terrence J. Sejnowski
Variation in the strength of synapses can be quantified by measuring the anatomical properties of synapses. Quantifying precision of synaptic plasticity is fundamental to understanding information storage and retrieval in neural circuits. Synapses from the same axon onto the same dendrite have a common history of coactivation, making them ideal candidates for determining the precision of synaptic plasticity based on the similarity of their physical dimensions. Here, the precision and amount of information stored in synapse dimensions were quantified with Shannon information theory, expanding prior analysis that used signal detection theory (Bartol et al., 2015). The two methods were compared using dendritic spine head volumes in the middle of the stratum radiatum of hippocampal area CA1 as well-defined measures of synaptic strength. Information theory delineated the number of distinguishable synaptic strengths based on nonoverlapping bins of dendritic spine head volumes. Shannon entropy was applied to measure synaptic information storage capacity (SISC) and resulted in a lower bound of 4.1 bits and an upper bound of 4.59 bits of information based on 24 distinguishable sizes. We further compared the distribution of distinguishable sizes and a uniform distribution using Kullback-Leibler divergence and discovered that there was a nearly uniform distribution of spine head volumes across the sizes, suggesting optimal use of the distinguishable values. Thus, SISC provides a new analytical measure that can be generalized to probe synaptic strengths and capacity for plasticity in different brain regions of different species and among animals raised in different conditions or during learning. How brain diseases and disorders affect the precision of synaptic plasticity can also be probed.
{"title":"Synaptic Information Storage Capacity Measured With Information Theory","authors":"Mohammad Samavat;Thomas M. Bartol;Kristen M. Harris;Terrence J. Sejnowski","doi":"10.1162/neco_a_01659","DOIUrl":"10.1162/neco_a_01659","url":null,"abstract":"Variation in the strength of synapses can be quantified by measuring the anatomical properties of synapses. Quantifying precision of synaptic plasticity is fundamental to understanding information storage and retrieval in neural circuits. Synapses from the same axon onto the same dendrite have a common history of coactivation, making them ideal candidates for determining the precision of synaptic plasticity based on the similarity of their physical dimensions. Here, the precision and amount of information stored in synapse dimensions were quantified with Shannon information theory, expanding prior analysis that used signal detection theory (Bartol et al., 2015). The two methods were compared using dendritic spine head volumes in the middle of the stratum radiatum of hippocampal area CA1 as well-defined measures of synaptic strength. Information theory delineated the number of distinguishable synaptic strengths based on nonoverlapping bins of dendritic spine head volumes. Shannon entropy was applied to measure synaptic information storage capacity (SISC) and resulted in a lower bound of 4.1 bits and upper bound of 4.59 bits of information based on 24 distinguishable sizes. We further compared the distribution of distinguishable sizes and a uniform distribution using Kullback-Leibler divergence and discovered that there was a nearly uniform distribution of spine head volumes across the sizes, suggesting optimal use of the distinguishable values. Thus, SISC provides a new analytical measure that can be generalized to probe synaptic strengths and capacity for plasticity in different brain regions of different species and among animals raised in different conditions or during learning. How brain diseases and disorders affect the precision of synaptic plasticity can also be probed.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"781-802"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140779632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A key question in the neuroscience of memory encoding pertains to the mechanisms by which afferent stimuli are allocated within memory networks. This issue is especially pronounced in the domain of working memory, where capacity is finite. Presumably the brain must embed some “policy” by which to allocate these mnemonic resources in an online manner in order to maximally represent and store afferent information for as long as possible and without interference from subsequent stimuli. Here, we engage this question through a top-down theoretical modeling framework. We formally optimize a gating mechanism that projects afferent stimuli onto a finite number of memory slots within a recurrent network architecture. In the absence of external input, the activity in each slot attenuates over time (i.e., a process of gradual forgetting). It turns out that the optimal gating policy consists of a direct projection from sensory activity to memory slots, alongside an activity-dependent lateral inhibition. Interestingly, allocating resources myopically (greedily with respect to the current stimulus) leads to efficient utilization of slots over time. In other words, later-arriving stimuli are distributed across slots in such a way that the network state is minimally shifted and so prior signals are minimally “overwritten.” Further, networks with heterogeneity in the timescales of their forgetting rates retain stimuli better than those that are more homogeneous. Our results suggest how online, recurrent networks working on temporally localized objectives without high-level supervision can nonetheless implement efficient allocation of memory resources over time.
{"title":"Heterogeneous Forgetting Rates and Greedy Allocation in Slot-Based Memory Networks Promotes Signal Retention","authors":"BethAnna Jones;Lawrence Snyder;ShiNung Ching","doi":"10.1162/neco_a_01655","DOIUrl":"10.1162/neco_a_01655","url":null,"abstract":"A key question in the neuroscience of memory encoding pertains to the mechanisms by which afferent stimuli are allocated within memory networks. This issue is especially pronounced in the domain of working memory, where capacity is finite. Presumably the brain must embed some “policy” by which to allocate these mnemonic resources in an online manner in order to maximally represent and store afferent information for as long as possible and without interference from subsequent stimuli. Here, we engage this question through a top-down theoretical modeling framework. We formally optimize a gating mechanism that projects afferent stimuli onto a finite number of memory slots within a recurrent network architecture. In the absence of external input, the activity in each slot attenuates over time (i.e., a process of gradual forgetting). It turns out that the optimal gating policy consists of a direct projection from sensory activity to memory slots, alongside an activity-dependent lateral inhibition. Interestingly, allocating resources myopically (greedily with respect to the current stimulus) leads to efficient utilization of slots over time. In other words, later-arriving stimuli are distributed across slots in such a way that the network state is minimally shifted and so prior signals are minimally “overwritten.” Further, networks with heterogeneity in the timescales of their forgetting rates retain stimuli better than those that are more homogeneous. Our results suggest how online, recurrent networks working on temporally localized objectives without high-level supervision can nonetheless implement efficient allocation of memory resources over time.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"1022-1040"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140772905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-shot learning (ZSL) refers to the design of predictive functions on new classes (unseen classes) of data that have never been seen during training. In a more practical scenario, generalized zero-shot learning (GZSL) requires predicting both seen and unseen classes accurately. In the absence of target samples, many GZSL models may overfit training data and are inclined to predict individuals as categories that have been seen in training. To alleviate this problem, we develop a parameter-wise adversarial training process that promotes robust recognition of seen classes, together with a novel model perturbation mechanism applied at test time to ensure sufficient sensitivity to unseen classes. Concretely, adversarial perturbation is conducted on the model to obtain instance-specific parameters so that predictions can be biased to unseen classes in the test. Meanwhile, the robust training encourages the model robustness, leading to nearly unaffected prediction for seen classes. Moreover, perturbations in the parameter space, computed from multiple individuals simultaneously, can be used to avoid the effect of perturbations that are too extreme and ruin the predictions. Comparison results on four benchmark ZSL data sets show the effective improvement that the proposed framework achieves over zero-shot methods with learned metrics.
{"title":"Instance-Specific Model Perturbation Improves Generalized Zero-Shot Learning","authors":"Guanyu Yang;Kaizhu Huang;Rui Zhang;Xi Yang","doi":"10.1162/neco_a_01639","DOIUrl":"10.1162/neco_a_01639","url":null,"abstract":"Zero-shot learning (ZSL) refers to the design of predictive functions on new classes (unseen classes) of data that have never been seen during training. In a more practical scenario, generalized zero-shot learning (GZSL) requires predicting both seen and unseen classes accurately. In the absence of target samples, many GZSL models may overfit training data and are inclined to predict individuals as categories that have been seen in training. To alleviate this problem, we develop a parameter-wise adversarial training process that promotes robust recognition of seen classes while designing during the test a novel model perturbation mechanism to ensure sufficient sensitivity to unseen classes. Concretely, adversarial perturbation is conducted on the model to obtain instance-specific parameters so that predictions can be biased to unseen classes in the test. Meanwhile, the robust training encourages the model robustness, leading to nearly unaffected prediction for seen classes. Moreover, perturbations in the parameter space, computed from multiple individuals simultaneously, can be used to avoid the effect of perturbations that are too extreme and ruin the predictions. Comparison results on four benchmark ZSL data sets show the effective improvement that the proposed framework made on zero-shot methods with learned metrics.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 5","pages":"936-962"},"PeriodicalIF":2.9,"publicationDate":"2024-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The hippocampus plays a critical role in the compression and retrieval of sequential information. During wakefulness, it achieves this through theta phase precession and theta sequences. Subsequently, during periods of sleep or rest, the compressed information reactivates through sharp-wave ripple events, manifesting as memory replay. However, how these sequential neuronal activities are generated and how they store information about the external environment remain unknown. We developed a hippocampal cornu ammonis 3 (CA3) computational model based on anatomical and electrophysiological evidence from the biological CA3 circuit to address these questions. The model comprises theta rhythm inhibition, place input, and plastic CA3-CA3 recurrent connections. The model can compress the sequence of the external inputs, reproduce theta phase precession and replay, learn additional sequences, and reorganize previously learned sequences. A gradual increase in synaptic inputs, controlled by interactions between theta-paced inhibition and place inputs, explained the mechanism of sequence acquisition. This model highlights the crucial role of plasticity in the CA3 recurrent connections and theta oscillatory dynamics and hypothesizes how the CA3 circuit acquires, compresses, and replays sequential information.
{"title":"CA3 Circuit Model Compressing Sequential Information in Theta Oscillation and Replay","authors":"Satoshi Kuroki;Kenji Mizuseki","doi":"10.1162/neco_a_01641","DOIUrl":"10.1162/neco_a_01641","url":null,"abstract":"The hippocampus plays a critical role in the compression and retrieval of sequential information. During wakefulness, it achieves this through theta phase precession and theta sequences. Subsequently, during periods of sleep or rest, the compressed information reactivates through sharp-wave ripple events, manifesting as memory replay. However, how these sequential neuronal activities are generated and how they store information about the external environment remain unknown. We developed a hippocampal cornu ammonis 3 (CA3) computational model based on anatomical and electrophysiological evidence from the biological CA3 circuit to address these questions. The model comprises theta rhythm inhibition, place input, and CA3-CA3 plastic recurrent connection. The model can compress the sequence of the external inputs, reproduce theta phase precession and replay, learn additional sequences, and reorganize previously learned sequences. A gradual increase in synaptic inputs, controlled by interactions between theta-paced inhibition and place inputs, explained the mechanism of sequence acquisition. This model highlights the crucial role of plasticity in the CA3 recurrent connection and theta oscillational dynamics and hypothesizes how the CA3 circuit acquires, compresses, and replays sequential information.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"501-548"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10535082","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seongil Im;Jae-Seung Jeong;Junseo Lee;Changhwan Shin;Jeong Ho Cho;Hyunsu Ju
Recent advancements in deep learning have achieved significant progress by increasing the number of parameters in a given model. However, this comes at the cost of computing resources, prompting researchers to explore model compression techniques that reduce the number of parameters while maintaining or even improving performance. Convolutional neural networks (CNNs) have been recognized as more efficient and effective than fully connected (FC) networks. We propose a column row convolutional neural network (CRCNN) in this letter that applies 1D convolution to image data, significantly reducing the number of learning parameters and operational steps. The CRCNN uses column and row local receptive fields to perform data abstraction, concatenating each direction's feature before connecting it to an FC layer. Experimental results demonstrate that the CRCNN maintains comparable accuracy while reducing the number of parameters compared to prior work. Moreover, the CRCNN is employed for one-class anomaly detection, demonstrating its feasibility for various applications.
{"title":"Column Row Convolutional Neural Network: Reducing Parameters for Efficient Image Processing","authors":"Seongil Im;Jae-Seung Jeong;Junseo Lee;Changhwan Shin;Jeong Ho Cho;Hyunsu Ju","doi":"10.1162/neco_a_01653","DOIUrl":"10.1162/neco_a_01653","url":null,"abstract":"Recent advancements in deep learning have achieved significant progress by increasing the number of parameters in a given model. However, this comes at the cost of computing resources, prompting researchers to explore model compression techniques that reduce the number of parameters while maintaining or even improving performance. Convolutional neural networks (CNN) have been recognized as more efficient and effective than fully connected (FC) networks. We propose a column row convolutional neural network (CRCNN) in this letter that applies 1D convolution to image data, significantly reducing the number of learning parameters and operational steps. The CRCNN uses column and row local receptive fields to perform data abstraction, concatenating each direction's feature before connecting it to an FC layer. Experimental results demonstrate that the CRCNN maintains comparable accuracy while reducing the number of parameters and compared to prior work. Moreover, the CRCNN is employed for one-class anomaly detection, demonstrating its feasibility for various applications.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"744-758"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vidyesh Rao Anisetti;Ananth Kandala;Benjamin Scellier;J. M. Schwarz
We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an activation signal and an error signal whose coefficients can be read in different frequencies of the frequency domain. Each conductance is updated proportionally to the product of the two coefficients. The learning rule is local and proved to perform gradient descent on a loss function. We argue that frequency propagation is an instance of a multimechanism learning strategy for physical networks, be it resistive, elastic, or flow networks. Multimechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals in the training process. Locally available information about these two signals is then used to update the trainable parameters to perform gradient descent. We demonstrate how earlier work implementing learning via chemical signaling in flow networks (Anisetti, Scellier, et al., 2023) also falls under the rubric of multimechanism learning.
{"title":"Frequency Propagation: Multimechanism Learning in Nonlinear Physical Networks","authors":"Vidyesh Rao Anisetti;Ananth Kandala;Benjamin Scellier;J. M. Schwarz","doi":"10.1162/neco_a_01648","DOIUrl":"10.1162/neco_a_01648","url":null,"abstract":"We introduce frequency propagation, a learning algorithm for nonlinear physical networks. In a resistive electrical circuit with variable resistors, an activation current is applied at a set of input nodes at one frequency and an error current is applied at a set of output nodes at another frequency. The voltage response of the circuit to these boundary currents is the superposition of an activation signal and an error signal whose coefficients can be read in different frequencies of the frequency domain. Each conductance is updated proportionally to the product of the two coefficients. The learning rule is local and proved to perform gradient descent on a loss function. We argue that frequency propagation is an instance of a multimechanism learning strategy for physical networks, be it resistive, elastic, or flow networks. Multimechanism learning strategies incorporate at least two physical quantities, potentially governed by independent physical mechanisms, to act as activation and error signals in the training process. Locally available information about these two signals is then used to update the trainable parameters to perform gradient descent. We demonstrate how earlier work implementing learning via chemical signaling in flow networks (Anisetti, Scellier, et al., 2023) also falls under the rubric of multimechanism learning.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 4","pages":"596-620"},"PeriodicalIF":2.9,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140066264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}