Rajesh P. N. Rao;Dimitrios C. Gklezakos;Vishwas Sathish
There is growing interest in predictive coding as a model of how the brain learns through predictions and prediction errors. Predictive coding models have traditionally focused on sensory coding and perception. Here we introduce active predictive coding (APC) as a unifying model for perception, action, and cognition. The APC model addresses important open problems in cognitive science and AI, including (1) how we learn compositional representations (e.g., part-whole hierarchies for equivariant vision) and (2) how we solve large-scale planning problems, which are hard for traditional reinforcement learning, by composing complex state dynamics and abstract actions from simpler dynamics and primitive actions. By using hypernetworks, self-supervised learning, and reinforcement learning, APC learns hierarchical world models by combining task-invariant state transition networks and task-dependent policy networks at multiple abstraction levels. We illustrate the applicability of the APC model to active visual perception and hierarchical planning. Our results represent, to our knowledge, the first proof-of-concept demonstration of a unified approach to addressing the part-whole learning problem in vision, the nested reference frames learning problem in cognition, and the integrated state-action hierarchy learning problem in reinforcement learning.
Title: Active Predictive Coding: A Unifying Neural Model for Active Perception, Compositional Learning, and Hierarchical Planning | Authors: Rajesh P. N. Rao, Dimitrios C. Gklezakos, Vishwas Sathish | Journal: Neural Computation | Published: 2023-12-12 | DOI: 10.1162/neco_a_01627
Prior applications of the cerebellar adaptive filter model have included a range of tasks within simulated and robotic systems. However, these applications have been limited to systems driven by continuous signals. Here, the adaptive filter model of the cerebellum is applied to the control of a system driven by spiking inputs, using the control of muscle force as the test problem. The performance of the standard adaptive filter algorithm is compared with that of a variant whose modified learning rule minimizes inputs, and with that of a simple proportional-integral-derivative (PID) controller. Control performance is evaluated in terms of the number of spikes, the accuracy of spike input locations, and the accuracy of muscle force output. Results show that the cerebellar adaptive filter model can be applied without change to the control of systems driven by spiking inputs. The cerebellar algorithm produces good agreement between input spikes and force outputs and significantly outperforms the PID controller. Input minimization can reduce the number of spike inputs, but at the expense of reduced accuracy in spike input location and force output. This work extends the applications of the cerebellar algorithm and demonstrates the potential of the adaptive filter model to improve functional electrical stimulation muscle control.
Title: Adaptive Filter Model of Cerebellum for Biological Muscle Control With Spike Train Inputs | Authors: Emma Wilson | Journal: Neural Computation | Published: 2023-11-07 | DOI: 10.1162/neco_a_01617
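The weight-learning rule at the core of an adaptive filter can be sketched in a few lines. This is a hypothetical system-identification toy, not the paper's controller: a Bernoulli spike train drives an assumed "twitch" impulse response (`TWITCH`, invented for illustration) that stands in for muscle force, tapped delay lines serve as the filter's basis signals (an assumption; cerebellar models often use other basis filters), and the weights are adapted with an LMS-style decorrelation rule, `dw_i = -beta * e * p_i`.

```python
import random


def spike_train(n_steps, rate, seed=0):
    """Hypothetical Bernoulli spike train: 1.0 = spike, 0.0 = no spike."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < rate else 0.0 for _ in range(n_steps)]


def delayed(signal, t, n_taps):
    """Tapped-delay-line basis signals p_i(t) = signal(t - i), zero before t = 0."""
    return [signal[t - i] if t - i >= 0 else 0.0 for i in range(n_taps)]


def train_adaptive_filter(spikes, target, n_taps, beta=0.05, epochs=20):
    """LMS / decorrelation rule: dw_i = -beta * e(t) * p_i(t), with e = output - target."""
    w = [0.0] * n_taps
    for _ in range(epochs):
        for t in range(len(spikes)):
            p = delayed(spikes, t, n_taps)
            out = sum(wk * pk for wk, pk in zip(w, p))   # filter output = weighted basis sum
            err = out - target[t]
            w = [wk - beta * err * pk for wk, pk in zip(w, p)]
    return w


# Assumed impulse response standing in for the muscle's force response to one spike.
TWITCH = [0.0, 0.4, 0.8, 0.6, 0.3, 0.1]

spikes = spike_train(500, rate=0.2)
force = [sum(k * s for k, s in zip(TWITCH, delayed(spikes, t, len(TWITCH))))
         for t in range(len(spikes))]
w = train_adaptive_filter(spikes, force, n_taps=len(TWITCH))
print([round(v, 3) for v in w])
```

Because this toy target is noise-free and exactly realizable by the filter, the learned weights should approach `TWITCH`. In a closed control loop the error signal would instead come from force tracking, which this sketch does not model.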
Laura Smets;Werner Van Leekwijck;Ing Jyh Tsang;Steven Latré
Hyperdimensional computing (HDC) has become popular for lightweight and energy-efficient machine learning, making it suitable for wearable Internet-of-Things devices and near-sensor or on-device processing. HDC is computationally less complex than traditional deep learning algorithms and achieves moderate to good classification performance. This letter proposes extending the HDC training procedure to take into account not only misclassified samples but also samples that the HDC model classifies correctly but with low confidence. We introduce a confidence threshold that can be tuned for each data set to achieve the best classification accuracy. The proposed training procedure is tested on the UCIHAR, CTG, ISOLET, and HAND data sets, on which performance consistently improves over the baseline across a range of confidence threshold values. The extended training procedure also shifts the confidence values of correctly classified samples upward, making the classifier not only more accurate but also more confident in its predictions.
Title: Training a Hyperdimensional Computing Classifier Using a Threshold on Its Confidence | Authors: Laura Smets, Werner Van Leekwijck, Ing Jyh Tsang, Steven Latré | Journal: Neural Computation | Published: 2023-11-07 | DOI: 10.1162/neco_a_01618
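The extended retraining loop can be sketched as follows, under stated assumptions. Encoding is a random-projection sign code, class prototypes are bundled (summed) hypervectors, and, as an assumption since the abstract does not define it, "confidence" is taken to be the margin between the top two cosine similarities. Misclassified samples get the usual HDC update (add to the true class, subtract from the predicted class); correctly classified samples whose margin falls below `threshold` are also updated, against the runner-up class. The data set is a synthetic, hypothetical three-cluster toy.

```python
import math
import random


def encode(x, proj):
    """Random-projection encoding: the sign of P x gives a bipolar (+1/-1) hypervector."""
    return [1 if sum(p * xi for p, xi in zip(row, x)) >= 0 else -1 for row in proj]


def cosine(a, b):
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return dot / (na * nb) if na and nb else 0.0


def class_scores(h, protos):
    return [cosine(h, c) for c in protos]


def train_hdc(X, y, n_classes, dim=500, epochs=10, threshold=0.1, seed=0):
    """Retraining that also updates correct-but-low-confidence samples (needs n_classes >= 2).
    ASSUMPTION: confidence = margin between the top two cosine scores."""
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in X[0]] for _ in range(dim)]
    H = [encode(x, proj) for x in X]
    protos = [[0] * dim for _ in range(n_classes)]
    for h, label in zip(H, y):                      # initial single-pass bundling
        protos[label] = [c + v for c, v in zip(protos[label], h)]
    for _ in range(epochs):
        for h, label in zip(H, y):
            s = class_scores(h, protos)
            ranked = sorted(range(n_classes), key=lambda k: s[k], reverse=True)
            if ranked[0] != label:
                other = ranked[0]                   # misclassified: standard HDC update
            elif s[ranked[0]] - s[ranked[1]] < threshold:
                other = ranked[1]                   # correct but low confidence: extended update
            else:
                continue
            protos[label] = [c + v for c, v in zip(protos[label], h)]
            protos[other] = [c - v for c, v in zip(protos[other], h)]
    return proj, protos


def predict(x, proj, protos):
    s = class_scores(encode(x, proj), protos)
    return max(range(len(protos)), key=lambda k: s[k])


# Hypothetical toy data: three 2-D Gaussian clusters in different directions.
rng = random.Random(1)
centers = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(20):
        X.append([cx + rng.gauss(0.0, 0.5), cy + rng.gauss(0.0, 0.5)])
        y.append(label)

proj, protos = train_hdc(X, y, n_classes=3, threshold=0.1)
acc = sum(predict(x, proj, protos) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

Setting `threshold=0.0` recovers plain error-driven retraining, which makes it easy to compare the two procedures on the same data.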
Hojin Jang;Syed Suleman Abbas Zaidi;Xavier Boix;Neeraj Prasad;Sharon Gilad-Gutnick;Shlomit Ben-Ami;Pawan Sinha
Deep convolutional neural networks (DCNNs) have demonstrated impressive robustness in recognizing objects under transformations (e.g., blur or noise) when these transformations are included in the training set. One hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, the extent to which this hypothesis holds is an open question, as robustness to transformations could be achieved through properties other than invariance; for example, parts of the network could specialize in recognizing either transformed or nontransformed images. This article investigates the conditions under which invariant neural representations emerge by leveraging the fact that they facilitate robustness to transformations beyond the training distribution. Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. Invariance emerges only as the number of transformed categories in the training set increases. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than with geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which they spontaneously emerge.
Title: Robustness to Transformations Across Categories: Is Robustness Driven by Invariant Neural Representations? | Authors: Hojin Jang, Syed Suleman Abbas Zaidi, Xavier Boix, Neeraj Prasad, Sharon Gilad-Gutnick, Shlomit Ben-Ami, Pawan Sinha | Journal: Neural Computation | Published: 2023-11-07 | DOI: 10.1162/neco_a_01621
Backpropagation has rapidly become the workhorse credit assignment algorithm for modern deep learning methods. Recently, modified forms of predictive coding (PC), an algorithm with origins in computational neuroscience, have been shown to produce parameter updates approximately or exactly equal to those of backpropagation. Because of this connection, it has been suggested that PC can act as an alternative to backpropagation, with desirable properties that may facilitate implementation in neuromorphic systems. Here, we examine these claims using the different contemporary PC variants proposed in the literature. We derive time complexity bounds for these PC variants and show that their complexity is lower-bounded by that of backpropagation. We also present key properties of these variants that have implications for their neurobiological plausibility and interpretation, particularly from the perspective of standard PC as a variational Bayes algorithm for latent probabilistic models. Our findings shed new light on the connection between the two learning frameworks and suggest that, in its current forms, PC may have more limited potential as a direct replacement for backpropagation than previously envisioned.
Title: Predictive Coding as a Neuromorphic Alternative to Backpropagation: A Critical Evaluation | Authors: Umais Zahid, Qinghai Guo, Zafeirios Fountas | Journal: Neural Computation | Published: 2023-11-07 | DOI: 10.1162/neco_a_01620
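A toy calculation helps show why the PC/backpropagation correspondence is "approximate or exact" depending on the variant, and why PC costs more time. The sketch below uses a hypothetical scalar two-layer linear chain (`x1 = w1*x0`, `x2 = w2*x1`) and standard, unmodified PC: the output is clamped to the target and the hidden value is relaxed by gradient descent on the energy. It is not any of the modified variants analyzed in the paper. At equilibrium the hidden prediction error equals the backprop error divided by `1 + w2**2`, and reaching it takes many relaxation steps rather than a single backward pass.

```python
def backprop_hidden_delta(w1, w2, x0, y):
    """Backprop error at the hidden unit for the loss 0.5 * (y - w2 * w1 * x0) ** 2."""
    return w2 * (y - w2 * w1 * x0)


def pc_hidden_error(w1, w2, x0, y, lr=0.1, n_steps=200):
    """Standard PC inference with the output clamped to the target y.

    Relaxes x1 by gradient descent on F = 0.5 * eps1**2 + 0.5 * eps2**2,
    where eps1 = x1 - w1 * x0 and eps2 = y - w2 * x1.
    Returns the hidden prediction error eps1 after n_steps updates.
    """
    x1 = w1 * x0                          # start at the feedforward prediction
    for _ in range(n_steps):
        eps1 = x1 - w1 * x0
        eps2 = y - w2 * x1
        x1 -= lr * (eps1 - w2 * eps2)     # dF/dx1 = eps1 - w2 * eps2
    return x1 - w1 * x0


w1, w2, x0, y = 0.8, 0.5, 1.0, 1.0        # hypothetical weights, input, and target
delta = backprop_hidden_delta(w1, w2, x0, y)   # one backward step
eps1 = pc_hidden_error(w1, w2, x0, y)          # 200 relaxation steps
print(delta, eps1, delta / (1 + w2 ** 2))
```

Setting `eps1 = w2 * eps2` at the fixed point gives `eps1 = w2 * e / (1 + w2**2)` with `e = y - w2 * w1 * x0`, i.e. the backprop error scaled by `1 / (1 + w2**2)`.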
In this study, we develop an incremental machine learning (ML) method that efficiently obtains the optimal model when a small number of instances or features are added or removed. This problem is of practical importance in model selection tasks such as cross-validation (CV) and feature selection. For the class of ML methods known as linear estimators, an efficient model update framework, the low-rank update, can effectively handle changes to a small number of rows and columns of the data matrix. However, for ML methods beyond linear estimators, there is currently no comprehensive framework for obtaining information about the updated solution within a specified computational complexity. To address this, we introduce the generalized low-rank update (GLRU) method, which extends the low-rank update framework of linear estimators to ML methods formulated as a certain class of regularized empirical risk minimization problems, including commonly used methods such as support vector machines and logistic regression. The GLRU method not only broadens the range of applicability of low-rank updates but also provides bounds on the updated model parameters at a computational cost proportional to the number of data set changes. To demonstrate its effectiveness, we conduct experiments showing its efficiency in performing cross-validation and feature selection compared with other baseline methods.
Title: Generalized Low-Rank Update: Model Parameter Bounds for Low-Rank Training Data Modifications | Authors: Hiroyuki Hanada, Noriaki Hashimoto, Kouichi Taji, Ichiro Takeuchi | Journal: Neural Computation | Published: 2023-11-07 | DOI: 10.1162/neco_a_01619
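For context, the classic low-rank update for a linear estimator (the setting GLRU generalizes) can be sketched with ridge regression. When one instance `(x, t)` is added, `A = X^T X + lam*I` changes by the rank-one term `x x^T`, so its inverse can be refreshed with the Sherman-Morrison formula in O(d^2) instead of being recomputed in O(d^3). This is only the linear-estimator baseline; GLRU itself, which yields parameter bounds for methods such as SVMs and logistic regression, is not reproduced here. The data values are hypothetical.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (fine for small, well-conditioned M)."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [row[n:] for row in A]


def ridge_state(X, y, lam):
    """Return (A^{-1}, b) with A = X^T X + lam * I and b = X^T y; weights are A^{-1} b."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x[i] * t for x, t in zip(X, y)) for i in range(d)]
    return inverse(A), b


def add_instance(A_inv, b, x, t):
    """Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)."""
    Ax = mat_vec(A_inv, x)                     # A^{-1} x (A^{-1} is symmetric)
    denom = 1.0 + sum(xi * ai for xi, ai in zip(x, Ax))
    corr = outer(Ax, Ax)
    A_inv_new = [[a - c / denom for a, c in zip(ra, rc)] for ra, rc in zip(A_inv, corr)]
    b_new = [bi + t * xi for bi, xi in zip(b, x)]
    return A_inv_new, b_new


X = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
y = [1.0, 0.0, 2.0]
lam = 0.1

A_inv, b = add_instance(*ridge_state(X, y, lam), [1.0, 1.0], 1.5)   # cheap update
w_fast = mat_vec(A_inv, b)
w_full = mat_vec(*ridge_state(X + [[1.0, 1.0]], y + [1.5], lam))    # refit from scratch
print(w_fast, w_full)
```

Removing an instance works the same way with the sign of the correction flipped and denominator `1 - x^T A^{-1} x`, which is what makes leave-one-out CV cheap for linear estimators.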
Neural activity in the brain exhibits correlated fluctuations that may strongly influence the properties of neural population coding. However, how such correlated fluctuations arise from intrinsic neural circuit dynamics and subsequently affect the computational properties of neural population activity remains poorly understood. The main difficulty lies in resolving the nonlinear coupling between correlated fluctuations and the overall dynamics of the system. In this study, we investigate the emergence of synergistic neural population codes from the intrinsic dynamics of correlated neural fluctuations in a neural circuit model that captures realistic nonlinear noise coupling of spiking neurons. We show that a rich repertoire of spatial correlation patterns naturally emerges in a bump attractor network, and we further reveal the dynamical regime under which the interplay between differential and noise correlations leads to synergistic codes. Moreover, we find that negative correlations may induce stable bound states between two bumps, a phenomenon not previously observed in firing rate models. These noise-induced effects in bump attractors lead to several computational advantages, including enhanced working memory capacity and efficient spatiotemporal multiplexing, and can account for a range of cognitive and behavioral phenomena related to working memory. This study offers a dynamical approach to investigating realistic correlated neural fluctuations and provides insights into their roles in cortical computations.
Title: Self-Organization of Nonlinearly Coupled Neural Fluctuations Into Synergistic Population Codes | Authors: Hengyuan Ma, Yang Qi, Pulin Gong, Jie Zhang, Wen-lian Lu, Jianfeng Feng | Journal: Neural Computation | Published: 2023-10-10 | DOI: 10.1162/neco_a_01612
Mikail Khona;Sarthak Chandra;Joy J. Ma;Ila R. Fiete
Recurrent neural networks (RNNs) are often used to model circuits in the brain and can solve a variety of difficult computational problems requiring memory, error correction, or selection (Hopfield, 1982; Maass et al., 2002; Maass, 2011). However, fully connected RNNs contrast structurally with their biological counterparts, which are extremely sparse (about 0.1%). Motivated by the neocortex, where neural connectivity is constrained by physical distance along cortical sheets and other synaptic wiring costs, we introduce locality masked RNNs (LM-RNNs) that use task-agnostic predetermined graphs with sparsity as low as 4%. We study LM-RNNs in a multitask learning setting relevant to cognitive systems neuroscience with a commonly used set of tasks, 20-Cog-tasks (Yang et al., 2019). We show through reductio ad absurdum that 20-Cog-tasks can be solved by a small pool of separated autapses that we can mechanistically analyze and understand. Thus, these tasks fall short of the goal of inducing complex recurrent dynamics and modular structure in RNNs. We next contribute a new cognitive multitask battery, Mod-Cog, consisting of up to 132 tasks, which expands the number of tasks and the task complexity of 20-Cog-tasks by about seven-fold. Importantly, while autapses can solve the simple 20-Cog-tasks, the expanded task set requires richer neural architectures and continuous attractor dynamics. On these tasks, we show that LM-RNNs with optimal sparsity train faster and are more data-efficient than fully connected networks.
Title: Winning the Lottery With Neural Connectivity Constraints: Faster Learning Across Cognitive Tasks With Spatially Constrained Sparse RNNs | Authors: Mikail Khona, Sarthak Chandra, Joy J. Ma, Ila R. Fiete | Journal: Neural Computation | Published: 2023-10-10 | DOI: 10.1162/neco_a_01613
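A locality mask of the kind described can be sketched as a fixed binary matrix multiplied into the recurrent weights after every update, so that pruned connections stay at zero. For brevity, the neurons here are assumed to sit on a 1-D ring, with neuron i allowed to connect to j only when their ring distance is at most `radius`. The abstract does not specify the exact graph construction (a 2-D cortical sheet is the stated motivation), so the layout, `radius`, and `masked_sgd_step` are illustrative assumptions.

```python
import random


def ring_distance(i, j, n):
    d = abs(i - j)
    return min(d, n - d)


def locality_mask(n, radius):
    """1 where neurons i and j are within `radius` positions on a ring (self-loops included)."""
    return [[1 if ring_distance(i, j, n) <= radius else 0 for j in range(n)] for i in range(n)]


def density(mask):
    """Fraction of recurrent connections the mask allows."""
    n = len(mask)
    return sum(map(sum, mask)) / (n * n)


def masked_sgd_step(W, grad, mask, lr=0.01):
    """Hypothetical gradient step followed by re-applying the fixed mask."""
    return [[(w - lr * g) * m for w, g, m in zip(rw, rg, rm)]
            for rw, rg, rm in zip(W, grad, mask)]


n, radius = 125, 2
mask = locality_mask(n, radius)
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) * m for m in row] for row in mask]        # masked initialization
grad = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]  # stand-in gradient
W = masked_sgd_step(W, grad, mask)
print(f"connection density: {density(mask):.3f}")
```

With 125 neurons and radius 2, each neuron keeps 5 of its 125 possible incoming weights, a density of 4%. Because the mask is predetermined and task-agnostic, it is built once and reused for every task.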
Catastrophic forgetting remains an outstanding challenge in continual learning. Recently, brain-inspired methods such as continual representation learning and memory replay have been used to combat catastrophic forgetting. Associative learning (retaining associations between inputs and outputs, even after good representations are learned) is important in the brain; however, its role in continual learning has not been carefully studied. Here, we identify a two-layer neural circuit in the fruit fly olfactory system that performs continual associative learning between odors and their associated valences. In the first layer, inputs (odors) are encoded using sparse, high-dimensional representations, which reduce memory interference by activating nonoverlapping populations of neurons for different odors. In the second layer, only the synapses between odor-activated neurons and the odor's associated output neuron are modified during learning; the remaining weights are frozen to prevent unrelated memories from being overwritten. We prove theoretically that, under continual learning, these two perceptron-like layers help reduce catastrophic forgetting compared with the original perceptron algorithm. We then show empirically on benchmark data sets that this simple, lightweight architecture outperforms other popular neurally inspired algorithms when they are also restricted to a two-layer feedforward architecture. Overall, fruit flies evolved an efficient continual associative learning algorithm, and circuit mechanisms from neuroscience can be translated to improve machine computation.
{"title":"Reducing Catastrophic Forgetting With Associative Learning: A Lesson From Fruit Flies","authors":"Yang Shen;Sanjoy Dasgupta;Saket Navlakha","doi":"10.1162/neco_a_01615","DOIUrl":"10.1162/neco_a_01615","url":null,"abstract":"Catastrophic forgetting remains an outstanding challenge in continual learning. Recently, methods inspired by the brain, such as continual representation learning and memory replay, have been used to combat catastrophic forgetting. Associative learning (retaining associations between inputs and outputs, even after good representations are learned) plays an important function in the brain; however, its role in continual learning has not been carefully studied. Here, we identified a two-layer neural circuit in the fruit fly olfactory system that performs continual associative learning between odors and their associated valences. In the first layer, inputs (odors) are encoded using sparse, high-dimensional representations, which reduces memory interference by activating nonoverlapping populations of neurons for different odors. In the second layer, only the synapses between odor-activated neurons and the odor’s associated output neuron are modified during learning; the rest of the weights are frozen to prevent unrelated memories from being overwritten. We prove theoretically that these two perceptron-like layers help reduce catastrophic forgetting compared to the original perceptron algorithm, under continual learning. We then show empirically on benchmark data sets that this simple and lightweight architecture outperforms other popular neural-inspired algorithms when also using a two-layer feedforward architecture. Overall, fruit flies evolved an efficient continual associative learning algorithm, and circuit mechanisms from neuroscience can be translated to improve machine computation.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41151453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
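As a minimal, stdlib-only Python sketch of the two layers described in the abstract, the code below shows the idea. It is a simplified illustration, not the authors' published algorithm. The class name `FlySketch`, its parameters, the fixed fan-in random projection, the k-winners-take-all sparsification, and the unit-increment learning rule are all assumptions. The first layer maps each odor to a sparse set of active units in a larger population. The second layer strengthens only the weights from those active units to the odor's labeled output neuron, leaving every other weight frozen.

```python
import random


class FlySketch:
    """Hypothetical two-layer associative learner (illustration only).

    Layer 1: fixed sparse random projection followed by k-winners-take-all,
    giving a sparse, high-dimensional code for each input.
    Layer 2: weights from coding units to one output neuron per class;
    training updates only the weights from active units to the labeled
    output neuron, and all other weights stay frozen.
    """

    def __init__(self, n_inputs, n_units, n_classes, k, fan_in, seed=0):
        rng = random.Random(seed)
        # Each coding unit sums a small random subset of inputs (fixed, never trained).
        self.proj = [rng.sample(range(n_inputs), fan_in) for _ in range(n_units)]
        self.k = k
        # One row of weights per output neuron (class), initialized to zero.
        self.w = [[0.0] * n_units for _ in range(n_classes)]

    def encode(self, x):
        """Return the indices of the k most active coding units."""
        acts = [sum(x[i] for i in idx) for idx in self.proj]
        ranked = sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)
        return set(ranked[: self.k])

    def train(self, x, label, lr=1.0):
        """Strengthen only the synapses from active units to the labeled output."""
        for j in self.encode(x):
            self.w[label][j] += lr

    def predict(self, x):
        """Return the class whose output neuron receives the most input."""
        active = self.encode(x)
        scores = [sum(row[j] for j in active) for row in self.w]
        return max(range(len(scores)), key=lambda c: scores[c])
```

In this sketch, training on a new class writes only to that class's row of weights, and only at the columns of the currently active units. Learning a later task therefore cannot overwrite the weights stored for earlier classes. Interference appears only at prediction time, through overlap between the active sets of different odors, and the sparse, high-dimensional code keeps that overlap small.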
Markov chains are a class of probabilistic models that have found widespread application in the quantitative sciences. This is partly due to their versatility and partly due to the ease with which they can be studied analytically. This tutorial provides an in-depth introduction to Markov chains and explores their connection to graphs and random walks. We use tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on the properties of the eigenvalues and eigenvectors of these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this text presents a collection of known results, together with some new concepts. Moreover, the tutorial emphasizes intuition rather than formal rigor and assumes only basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.
{"title":"A Tutorial on the Spectral Theory of Markov Chains","authors":"Eddie Seabrook;Laurenz Wiskott","doi":"10.1162/neco_a_01611","DOIUrl":"10.1162/neco_a_01611","url":null,"abstract":"Markov chains are a class of probabilistic models that have achieved widespread application in the quantitative sciences. This is in part due to their versatility, but is compounded by the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains and explores their connection to graphs and random walks. We use tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on exploring properties of the eigenvalues and eigenvectors corresponding to these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this text presents a collection of known results, together with some new concepts. Moreover, the tutorial focuses on offering intuition to readers rather than formal understanding and only assumes basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":null,"pages":null},"PeriodicalIF":2.9,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41174469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
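A central object in the spectral view of Markov chains is the stationary distribution π. It is a left eigenvector of the transition matrix P with eigenvalue 1, normalized to sum to 1. The small stdlib-only Python sketch below is our own illustration, not code from the tutorial, and all function names are hypothetical. It builds the random-walk matrix P = D^{-1}A of an undirected graph and makes the walk lazy to guarantee aperiodicity. It then finds π with πP = π by power iteration. For a connected undirected graph, the standard result π_i = deg(i) / (2|E|) provides a check on the output.

```python
def random_walk_matrix(adj):
    """Row-stochastic transition matrix P = D^{-1} A of a simple random walk."""
    n = len(adj)
    P = []
    for i in range(n):
        degree = sum(adj[i])
        P.append([adj[i][j] / degree for j in range(n)])
    return P


def make_lazy(P):
    """Lazy walk (I + P) / 2: same stationary distribution, but aperiodic."""
    n = len(P)
    return [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def stationary_distribution(P, max_iters=10_000, tol=1e-12):
    """Power iteration for the left eigenvector pi with pi P = pi, sum(pi) = 1.

    Assumes the chain is irreducible and aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iters):
        # Row vector times matrix: new[j] = sum_i pi[i] * P[i][j].
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

On the path graph 0-1-2-3, the degrees are 1, 2, 2, 1, so the iteration should return approximately [1/6, 1/3, 1/3, 1/6]. The lazy step is a deliberate design choice. A random walk on a bipartite graph such as a path has period 2, so power iteration on P itself is not guaranteed to converge. The lazy walk has eigenvalues (1 + λ)/2, which are all nonnegative. Its iteration therefore converges at a rate set by its second-largest eigenvalue.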