Attractor-Based Models for Sequences and Pattern Generation in Neural Circuits
Juliana Londono Alvarez, Katherine Morrison, Carina Curto
Neural circuits in the brain perform a variety of essential functions, including input classification, pattern completion, and the generation of rhythms and oscillations that support behaviors such as breathing and locomotion. There is also substantial evidence that the brain encodes memories and processes information via sequences of neural activity. Traditionally, rhythmic activity and pattern generation have been modeled using coupled oscillators, whereas input classification and pattern completion have been modeled using attractor neural networks. Here, we present a theoretical framework that demonstrates how attractor-based networks can also generate diverse rhythmic patterns, such as those of central pattern generator circuits (CPGs). Additionally, we propose a mechanism for transitioning between patterns. Specifically, we construct a network that can step through a sequence of five different quadruped gaits. It is composed of two dynamically distinct modules: a counter network that can count the number of external inputs it receives via a sequence of fixed points, and a locomotion network that encodes five different quadruped gaits as limit cycles. A sequence of locomotive gaits is obtained by connecting the counter network with the locomotion network. To do so, we introduce a new architecture for layering networks that produces fusion attractors, binding pairs of attractors from individual layers. All of this is accomplished within a unified framework of attractor-based models using threshold-linear networks.
{"title":"Attractor-Based Models for Sequences and Pattern Generation in Neural Circuits.","authors":"Juliana Londono Alvarez, Katherine Morrison, Carina Curto","doi":"10.1162/NECO.a.1492","DOIUrl":"10.1162/NECO.a.1492","url":null,"abstract":"<p><p>Neural circuits in the brain perform a variety of essential functions, including input classification, pattern completion, and the generation of rhythms and oscillations that support functions such as breathing and locomotion. There is also substantial evidence that the brain encodes memories and processes information via sequences of neural activity. Traditionally, rhythmic activity and pattern generation have been modeled using coupled oscillators, whereas input classification and pattern completion have been modeled using at-tractor neural networks. Here, we present a theoretical framework that demonstrates how attractor-based networks can also generate diverse rhythmic patterns, such as those of central pattern generator circuits (CPGs). Additionally, we propose a mechanism for transitioning between patterns. Specifically, we construct a network that can step through a sequence of five different quadruped gaits. It is composed of two dynamically distinct modules: a counter network that can count the number of external inputs it receives via a sequence of fixed points and a locomotion network that encodes five different quadruped gaits as limit cycles. A sequence of locomotive gaits is obtained by connecting the counter network with the locomotion network. Specifically, we introduce a new architecture for layering networks that produces fusion attractors, binding pairs of attractors from individual layers. All of this is accomplished within a unified framework of attractor-based models using threshold-linear networks.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-35"},"PeriodicalIF":2.1,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local Glutamate-Glutamine Cycling Underlies Presynaptic ATP Homeostasis
Reinoud Maex
Presynaptic axon terminals maintain in their cytosol an almost constant level of adenosine triphosphate (ATP) to safeguard neurotransmission during varying workloads. In the study reported in this letter, it is argued that the vesicular release of neurotransmitter and the recycling of transmitter via astrocytes may themselves be a mechanism of ATP homeostasis. In a minimal metabolic model of a presynaptic axon bouton, the accumulation of glutamate into vesicles and the activity-dependent supply of its precursor glutamine by astrocytes generated a steady-state level of ATP that was independent of the workload. When the workload increased, an enhanced supply of glutamine raised the rate of ATP production through the conversion of glutamate to the Krebs cycle intermediate α-ketoglutarate. The accumulation and release of glutamate, on the other hand, acted as a leak that diminished ATP production when the workload decreased. The fraction of ATP that the axon spent on the release and recycling of glutamate was small (4.7%), irrespective of the workload. Increasing this fraction enhanced the speed of ATP homeostasis and reduced the futile production of ATP. The model can be extended to axons releasing other, or coreleasing multiple, transmitters. Hence, the activity-dependent formation and release of neurotransmitter may be a universal mechanism of ATP homeostasis.
{"title":"Local Glutamate-Glutamine Cycling Underlies Presynaptic ATP Homeostasis.","authors":"Reinoud Maex","doi":"10.1162/NECO.a.1490","DOIUrl":"https://doi.org/10.1162/NECO.a.1490","url":null,"abstract":"<p><p>Presynaptic axon terminals maintain in their cytosol an almost constant level of adenosine triphosphate (ATP) to safeguard neurotransmission during varying workloads. In the study reported in this letter, it is argued that the vesicular release of neurotransmitter and the recycling of transmitter via astrocytes may itself be a mechanism of ATP homeostasis. In a minimal metabolic model of a presynaptic axon bouton, the accumulation of glutamate into vesicles and the activity-dependent supply of its precursor glutamine by astrocytes generated a steady-state level of ATP that was independent of the workload. When the workload increased, an enhanced supply of glutamine raised the rate of ATP production through the conversion of glutamate to the Krebs cycle intermediate α-ketoglutarate. The accumulation and release of glutamate, on the other hand, acted as a leak that diminished ATP production when the workload decreased. The fraction of ATP that the axon spent on the release and recycling of glutamate was small (4.7%), irrespective of the workload. Increasing this fraction enhanced the speed of ATP homeostasis and reduced the futile production of ATP. The model can be extended to axons releasing other, or coreleasing multiple, transmitters. Hence, the activity-dependent formation and release of neurotransmitter may be a universal mechanism of ATP homeostasis.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-36"},"PeriodicalIF":2.1,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reframing the Expected Free Energy: Four Formulations and a Unification
Théophile Champion, Howard Bowman, Dimitrije Marković, Marek Grześ
Active inference is a process theory of perception, learning, and decision making that is applied to a range of research fields, including neuroscience, robotics, psychology, and machine learning. Active inference rests on an objective function called the expected free energy, which can be justified by the intuitive plausibility of its formulations, for example, the risk plus ambiguity and information gain/pragmatic value formulations. This letter seeks to formalize the problem of deriving these formulations from a single root expected free energy definition: the unification problem. Then we analyze two approaches to defining expected free energy. More precisely, the expected free energy is either defined as (1) the risk over observations plus ambiguity or (2) the risk over states plus ambiguity. In the first setting, no rigorous mathematical justification for the expected free energy has been proposed to date, but all the formulations can be recovered from it by assuming that the likelihood of the target distribution T(o|s) is the likelihood of the generative model P(o|s). Importantly, under this likelihood constraint, if the likelihood is lossless, then prior preferences over observations can be defined arbitrarily. However, in the more general case of partially observable Markov decision processes (POMDPs), we demonstrate that the likelihood constraint effectively restricts the set of valid prior preferences over observations. Indeed, only a limited class of prior preferences over observations is compatible with the likelihood mapping of the generative model. In the second setting, a justification of the root expected free energy definition exists, but this setting only accounts for two formulations: the risk over states plus ambiguity and entropy plus expected energy formulations. We conclude with a discussion of the conditions under which a unification of expected free energy formulations has been proposed in the literature by appeal to the free energy principle in the specific context of systems without random fluctuations.
{"title":"Reframing the Expected Free Energy: Four Formulations and a Unification.","authors":"Théophile Champion, Howard Bowman, Dimitrije Marković, Marek Grześ","doi":"10.1162/NECO.a.1491","DOIUrl":"https://doi.org/10.1162/NECO.a.1491","url":null,"abstract":"<p><p>Active inference is a process theory of perception, learning, and decision making that is applied to a range of research fields, including neuroscience, robotics, psychology, and machine learning. Active inference rests on an objective function called the expected free energy, which can be justified by the intuitive plausibility of its formulations-for example, the risk plus ambiguity and information gain/pragmatic value formulations. This letter seeks to formalize the problem of deriving these formulations from a single root expected free energy definition-the unification problem. Then we analyze two approaches to defining expected free energy. More precisely, the expected free energy is either defined as (1) the risk over observations plus ambiguity or (2) the risk over states plus ambiguity. In the first setting, no rigorous mathematical justification for the expected free energy has been proposed to date, but all the formulations can be recovered from it by assuming that the likelihood of target distribution T(o|s) is the likelihood of the generative model P(o|s). Importantly, under this likelihood constraint, if the likelihood is lossless,1 then prior preferences over observations can be defined arbitrarily. However, in the more general case of partially observable Markov decision processes (POMDPs), we demonstrate that the likelihood constraint effectively restricts the set of valid prior preferences over observations. Indeed, only a limited class of prior preferences over observations is compatible with the likelihood mapping of the generative model. In the second setting, a justification of the root expected free energy definition exists, but this setting only accounts for two formulations: the risk over states plus ambiguity and entropy plus expected energy formulations. We conclude with a discussion of the conditions under which a unification of expected free energy formulations has been proposed in the literature by appeal to the free energy principle in the specific context of systems without random fluctuations.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-31"},"PeriodicalIF":2.1,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the Interplay between BOLD Signal Variability, Complexity, Static and Dynamic Functional Brain Network Features During Movie Viewing
Amir Hossein Ghaderi, Hongye Wang, Andrea B Protzner
Exploring the dynamics and complexity of brain signals is critical to advancing our understanding of brain function. Recent fMRI studies have revealed links between BOLD signal variability or complexity and static/dynamic features of functional brain networks (FBN). However, the association between variability/complexity and regional centrality is still understudied. Here we investigate the association between variability/complexity and static/dynamic nodal features of FBN using graph theory analysis with fMRI BOLD data acquired during naturalistic movie watching. We found that variability positively correlated with fine-scale complexity but negatively correlated with coarse-scale complexity. Specifically, regions with high centrality and clustering coefficient showed less variable but more complex signals. Similar relationships persisted for dynamic FBN, but the associations with certain aspects (e.g., eigenvector centrality) of regional centrality dynamics became insignificant. Our findings demonstrate that the relationship of BOLD signal variability and static/dynamic FBN features with BOLD signal complexity depends on the temporal scale at which complexity is measured, and that time-varying FBN features capture how BOLD signal variability/complexity coevolve with dynamic FBN.
{"title":"Exploring the Interplay between BOLD Signal Variability, Complexity, Static and Dynamic Functional Brain Network Features During Movie Viewing.","authors":"Amir Hossein Ghaderi, Hongye Wang, Andrea B Protzner","doi":"10.1162/NECO.a.1488","DOIUrl":"https://doi.org/10.1162/NECO.a.1488","url":null,"abstract":"<p><p>Exploring the dynamics and complexity of brain signal is critical to advancing our understanding of brain function. Recent fMRI studies have revealed links between BOLD signal variability or complexity with static/dynamics features of functional brain networks (FBN). However, the association between variability/complexity and regional centrality is still understudied. Here we investigate the association between variability/complexity and static/dynamic nodal features of FBN using graph theory analysis with fMRI BOLD data acquired during naturalistic movie watching. We found that variability positively correlated with fine-scale complexity but negatively correlated with coarse-scale complexity. Specifically, regions with high centrality and clustering coefficient were related to less variable but more complex signal. Similar relationships persisted for dynamic FBN, but the associations with certain aspects (e.g., eigenvector centrality) of regional centrality dynamics became insignificant. Our findings demonstrate that the relationship between BOLD signal variability and static/dynamic FBN with BOLD signal complexity depends on the temporal scale of signal complexity and that time-varying features of FBN reflect the complexities of how BOLD signal variability/complexity coevolve with dynamic FBN.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-30"},"PeriodicalIF":2.1,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neuromodulators Generate Multiple Context-Relevant Behaviors in Recurrent Neural Networks
Ben Tsuda, Stefan C Pate, Kay M Tye, Hava T Siegelmann, Terrence J Sejnowski
Neuromodulators are critical controllers of neural states, with dysfunctions linked to various neuropsychiatric disorders. Although many biological aspects of neuromodulation have been studied, the computational principles underlying how neuromodulation of distributed neural populations controls brain states remain unclear. In contrast to external contextual inputs, neuromodulation can act as a single scalar signal that is broadcast to a vast population of neurons. We model the modulation of synaptic weight in a recurrent neural network model and show that neuromodulators can dramatically alter the function of a network, even when highly simplified. We find that under structural constraints like those in brains, this provides a fundamental mechanism that can increase the computational capability and flexibility of a neural network. Diffuse synaptic weight modulation enables storage of multiple memories using a common set of synapses that are able to generate diverse, even diametrically opposed, behaviors. Our findings help explain how neuromodulators unlock specific behaviors by creating task-specific hyperchannels in neural activity space and motivate more flexible, compact and capable machine learning architectures.
{"title":"Neuromodulators Generate Multiple Context-Relevant Behaviors in Recurrent Neural Networks.","authors":"Ben Tsuda, Stefan C Pate, Kay M Tye, Hava T Siegelmann, Terrence J Sejnowski","doi":"10.1162/NECO.a.1489","DOIUrl":"https://doi.org/10.1162/NECO.a.1489","url":null,"abstract":"<p><p>Neuromodulators are critical controllers of neural states, with dysfunctions linked to various neuropsychiatric disorders. Although many biological aspects of neuromodulation have been studied, the computational principles underlying how neuromodulation of distributed neural populations controls brain states remain unclear. In contrast to external contextual inputs, neuromodulation can act as a single scalar signal that is broadcast to a vast population of neurons. We model the modulation of synaptic weight in a recurrent neural network model and show that neuromodulators can dramatically alter the function of a network, even when highly simplified. We find that under structural constraints like those in brains, this provides a fundamental mechanism that can increase the computational capability and flexibility of a neural network. Diffuse synaptic weight modulation enables storage of multiple memories using a common set of synapses that are able to generate diverse, even diametrically opposed, behaviors. Our findings help explain how neuromodulators unlock specific behaviors by creating task-specific hyperchannels in neural activity space and motivate more flexible, compact and capable machine learning architectures.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-36"},"PeriodicalIF":2.1,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object Detection, Recognition, Deep Learning, and the Universal Law of Generalization
Faris B Rustom, Rohan Sharma, Haluk Öğmen, Arash Yazdanbakhsh
Object detection and recognition are fundamental functions that play a significant role in the success of species. Because the appearance of an object exhibits large variability, the brain has to group these different stimuli under the same object identity, a process of generalization. Does the process of generalization follow some general principles, or is it an ad hoc bag of tricks? The universal law of generalization (ULoG) provides evidence that generalization follows similar properties across a variety of species and tasks. Here, we tested the hypothesis derived from ULoG that the internal representations underlying generalization reflect the natural properties of object detection and recognition in our environment rather than the specifics of the system solving these problems. Neural networks with universal-approximation capability have been successful in many object detection and recognition tasks; however, how these networks reach their decisions remains opaque. To provide a strong test for ecological validity, we used natural camouflage, which is nature's test bed for object detection and recognition. We trained a deep neural network with natural images of "clear" and "camouflaged" animals and examined the emerging internal representations. We extended ULoG to a realistic learning regime, with multiple consequential stimuli, and developed two methods to determine category prototypes. Our results show that with a proper choice of category prototypes, the generalization functions are monotone decreasing, similar to the generalization functions of biological systems. Critically, we show that camouflaged inputs are not represented randomly but rather systematically appear at the tail of the monotone decreasing functions. Our results support the hypothesis that the internal representations underlying generalization in object detection and recognition are shaped mainly by the properties of the ecological environment, even though different biological and artificial systems may generate these internal representations through drastically different learning and adaptation processes. Furthermore, the extended version of ULoG provides a tool to analyze how the system organizes its internal representations during learning as well as how it makes its decisions.
{"title":"Object Detection, Recognition, Deep Learning, and the Universal Law of Generalization.","authors":"Faris B Rustom, Rohan Sharma, Haluk Öğmen, Arash Yazdanbakhsh","doi":"10.1162/NECO.a.1483","DOIUrl":"https://doi.org/10.1162/NECO.a.1483","url":null,"abstract":"<p><p>Object detection and recognition are fundamental functions that play a significant role in the success of species. Because the appearance of an object exhibits large variability, the brain has to group these different stimuli under the same object identity, a process of generalization. Does the process of generalization follow some general principles, or is it an ad hoc bag of tricks? The universal law of generalization (ULoG) provides evidence that generalization follows similar properties across a variety of species and tasks. Here, we tested the hypothesis derived from ULoG that the internal representations underlying generalization reflect the natural properties of object detection and recognition in our environment rather than the specifics of the system solving these problems. Neural networks with universal-approximation capability have been successful in many object detection and recognition tasks; however, how these networks reach their decisions remains opaque. To provide a strong test for ecological validity, we used natural camouflage, which is nature's test bed for object detection and recognition. We trained a deep neural network with natural images of \"clear\" and \"camouflaged\" animals and examined the emerging internal representations. We extended ULoG to a realistic learning regime, with multiple consequential stimuli, and developed two methods to determine category prototypes. Our results show that with a proper choice of category prototypes, the generalization functions are monotone decreasing, similar to the generalization functions of biological systems. Critically, we show that camouflaged inputs are not represented randomly but rather systematically appear at the tail of the monotone decreasing functions. Our results support the hypothesis that the internal representations underlying generalization in object detection and recognition are shaped mainly by the properties of the ecological environment, even though different biological and artificial systems may generate these internal representations through drastically different learning and adaptation processes. Furthermore, the extended version of ULoG provides a tool to analyze how the system organizes its internal representations during learning as well as how it makes its decisions.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"1-45"},"PeriodicalIF":2.1,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulated Complex Cells Contribute to Object Recognition Through Representational Untangling
Mitchell B Slapik, Harel Z Shouval
The visual system performs a remarkable feat: it takes complex retinal activation patterns and decodes them for object recognition. This operation, termed "representational untangling," organizes neural representations by clustering similar objects together while separating different categories of objects. While representational untangling is usually associated with higher-order visual areas like the inferior temporal cortex, it remains unclear how the early visual system contributes to this process, whether through highly selective neurons or high-dimensional population codes. This article investigates how a computational model of early vision contributes to representational untangling. Using a computational visual hierarchy and two different data sets consisting of numerals and objects, we demonstrate that simulated complex cells significantly contribute to representational untangling for object recognition. Our findings challenge prior theories by showing that untangling does not depend on skewed, sparse, or high-dimensional representations. Instead, simulated complex cells reformat visual information into a low-dimensional, yet more separable, neural code, striking a balance between representational untangling and computational efficiency.
{"title":"Simulated Complex Cells Contribute to Object Recognition Through Representational Untangling.","authors":"Mitchell B Slapik, Harel Z Shouval","doi":"10.1162/NECO.a.1480","DOIUrl":"10.1162/NECO.a.1480","url":null,"abstract":"<p><p>The visual system performs a remarkable feat: it takes complex retinal activation patterns and decodes them for object recognition. This operation, termed \"representational untangling,\" organizes neural representations by clustering similar objects together while separating different categories of objects. While representational untangling is usually associated with higher-order visual areas like the inferior temporal cortex, it remains unclear how the early visual system contributes to this process-whether through highly selective neurons or high-dimensional population codes. This article investigates how a computational model of early vision contributes to representational untangling. Using a computational visual hierarchy and two different data sets consisting of numerals and objects, we demonstrate that simulated complex cells significantly contribute to representational untangling for object recognition. Our findings challenge prior theories by showing that untangling does not depend on skewed, sparse, or high-dimensional representations. Instead, simulated complex cells reformat visual information into a low-dimensional, yet more separable, neural code, striking a balance between representational untangling and computational efficiency.</p>","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"145-164"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848683/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised Learning in Echo State Networks for Input Reconstruction
Taiki Yamada, Yuichi Katori, Kantaro Fujiwara
Echo state networks (ESNs) are a class of recurrent neural networks in which only the readout layer is trainable, while the recurrent and input layers are fixed. This architectural constraint enables computationally efficient processing of time-series data. Traditionally, the readout layer in ESNs is trained using supervised learning with target outputs. In this study, we focus on input reconstruction (IR), where the readout layer is trained to reconstruct the input time series fed into the ESN. We show that IR can be achieved through unsupervised learning (UL), without access to supervised targets, provided that the ESN parameters are known a priori and satisfy invertibility conditions. This formulation allows applications relying on IR, such as dynamical system replication and noise filtering, to be reformulated within the UL framework via straightforward integration with existing algorithms. Our results suggest that prior knowledge of ESN parameters can reduce reliance on supervision, thereby establishing a new principle—not only by fixing part of the network parameters but also by exploiting their specific values. Furthermore, our UL-based algorithms for input reconstruction and related tasks are suitable for autonomous processing, offering insights into how analogous computational mechanisms might operate in the brain in principle. These findings contribute to a deeper understanding of the mathematical foundations of ESNs and their relevance to models in computational neuroscience.
{"title":"Unsupervised Learning in Echo State Networks for Input Reconstruction","authors":"Taiki Yamada;Yuichi Katori;Kantaro Fujiwara","doi":"10.1162/NECO.a.38","DOIUrl":"10.1162/NECO.a.38","url":null,"abstract":"Echo state networks (ESNs) are a class of recurrent neural networks in which only the readout layer is trainable, while the recurrent and input layers are fixed. This architectural constraint enables computationally efficient processing of time-series data. Traditionally, the readout layer in ESNs is trained using supervised learning with target outputs. In this study, we focus on input reconstruction (IR), where the readout layer is trained to reconstruct the input time series fed into the ESN. We show that IR can be achieved through unsupervised learning (UL), without access to supervised targets, provided that the ESN parameters are known a priori and satisfy invertibility conditions. This formulation allows applications relying on IR, such as dynamical system replication and noise filtering, to be reformulated within the UL framework via straightforward integration with existing algorithms. Our results suggest that prior knowledge of ESN parameters can reduce reliance on supervision, thereby establishing a new principle—not only by fixing part of the network parameters but also by exploiting their specific values. Furthermore, our UL-based algorithms for input reconstruction and related tasks are suitable for autonomous processing, offering insights into how analogous computational mechanisms might operate in the brain in principle. These findings contribute to a deeper understanding of the mathematical foundations of ESNs and their relevance to models in computational neuroscience.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 2","pages":"198-227"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145403093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sum-of-Norms Regularized Nonnegative Matrix Factorization
Andersen Ang, Waqas Bin Hamed, Hans De Sterck
When applying nonnegative matrix factorization (NMF), the rank parameter is generally unknown. This rank, called the nonnegative rank, is usually estimated heuristically since computing its exact value is NP-hard. In this work, we propose an approximation method to estimate the rank on the fly while solving NMF. We use the sum-of-norms (SON) penalty, a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix when the initial rank is overestimated. On various data sets, SON-NMF can reveal the correct nonnegative rank of the data without prior knowledge or parameter tuning. SON-NMF is a nonconvex, nonsmooth, nonseparable, and nonproximable problem, making it nontrivial to solve. First, since rank estimation in NMF is NP-hard, the proposed approach does not benefit from lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of SON-NMF is essentially irreducible. Second, the per-iteration cost of algorithms for SON-NMF can be high. This motivates us to propose a first-order block coordinate descent (BCD) algorithm that approximately solves SON-NMF with low per-iteration cost via the proximal average operator. SON-NMF exhibits favorable features for applications. Besides the ability to automatically estimate the rank from data, SON-NMF can handle rank-deficient data matrices and detect weak components with little energy. Furthermore, in hyperspectral imaging, SON-NMF naturally addresses the issue of spectral variability.
{"title":"Sum-of-Norms Regularized Nonnegative Matrix Factorization","authors":"Andersen Ang;Waqas Bin Hamed;Hans De Sterck","doi":"10.1162/NECO.a.1482","DOIUrl":"10.1162/NECO.a.1482","url":null,"abstract":"When applying nonnegative matrix factorization (NMF), the rank parameter is generally unknown. This rank, called the nonnegative rank, is usually estimated heuristically since computing its exact value is NP-hard. In this work, we propose an approximation method to estimate the rank on the fly while solving NMF. We use the sum-of-norm (SON), a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix when the initial rank is overestimated. On various data sets, SON-NMF can reveal the correct nonnegative rank of the data without prior knowledge or parameter tuning. SON-NMF is a nonconvex, nonsmooth, nonseparable, and nonproximable problem, making it nontrivial to solve. First, since rank estimation in NMF is NP-hard, the proposed approach does not benefit from lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of SON NMF is essentially irreducible. Second, the per iteration cost of algorithms for SON-NMF can be high. This motivates us to propose a first-order BCD algorithm that approximately solves SON-NMF with low per iteration cost via the proximal average operator. SON-NMF exhibits favorable features for applications. Besides the ability to automatically estimate the rank from data, SON-NMF can handle rank-deficient data matrices and detect weak components with little energy. Furthermore, in hyperspectral imaging, SON-NMF naturally addresses the issue of spectral variability.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 2","pages":"228-255"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approximation Rates in Fréchet Metrics: Barron Spaces, Paley-Wiener Spaces, and Fourier Multipliers
Ahmed Abdeljawad, Thomas Dittrich
Operator learning is a recent development in the simulation of partial differential equations by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an approximate mapping in infinite-dimensional spaces that is capable of (approximately) simulating the solution operator governed by the partial differential equation. In our work, we study some general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider the approximation with respect to a topology that is induced by a sequence of seminorms. In that sense, we measure the approximation error in terms of a Fréchet metric, and our main result identifies sufficient conditions for achieving a predefined approximation error. We then focus on a natural extension of our main theorem, in which we reduce the assumptions on the sequence of seminorms. Based on existing approximation results for the exponential spectral Barron space, we then present a concrete example of symbols that can be approximated well.
{"title":"Approximation Rates in Fréchet Metrics: Barron Spaces, Paley-Wiener Spaces, and Fourier Multipliers","authors":"Ahmed Abdeljawad;Thomas Dittrich","doi":"10.1162/NECO.a.1481","DOIUrl":"10.1162/NECO.a.1481","url":null,"abstract":"Operator learning is a recent development in the simulation of partial differential equations by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an approximate mapping in infinite-dimensional spaces that is capable of (approximately) simulating the solution operator governed by the partial differential equation. In our work, we study some general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider the approximation with respect to a topology that is induced by a sequence of semi-norms. In that sense, we measure the approximation error in terms of a Fréchet metric, and our main result identifies sufficient conditions for achieving a predefined approximation error. We then focus on a natural extension of our main theorem, in which we reduce the assumptions on the sequence of seminorms. Based on existing approximation results for the exponential spectral Barron space, we then present a concrete example of symbols that can be approximated well.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 2","pages":"165-197"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}