Amir Hossein Ghaderi;Hongye Wang;Andrea B. Protzner
Exploring the dynamics and complexity of brain signals is critical to advancing our understanding of brain function. Recent fMRI studies have revealed links between BOLD signal variability or complexity and static/dynamic features of functional brain networks (FBN). However, the association between variability/complexity and regional centrality is still understudied. Here we investigate the association between variability/complexity and static/dynamic nodal features of FBN using graph theory analysis with fMRI BOLD data acquired during naturalistic movie watching. We found that variability positively correlated with fine-scale complexity but negatively correlated with coarse-scale complexity. Specifically, regions with high centrality and clustering coefficient were related to less variable but more complex signals. Similar relationships persisted for dynamic FBN, but the associations with certain aspects (e.g., eigenvector centrality) of regional centrality dynamics were no longer significant. Our findings demonstrate that the relationships of BOLD signal variability and static/dynamic FBN features with BOLD signal complexity depend on the temporal scale of signal complexity and that time-varying features of FBN reflect the complexities of how BOLD signal variability/complexity coevolve with dynamic FBN.
{"title":"Exploring the Interplay Between BOLD Signal Variability, Complexity, Static and Dynamic Functional Brain Network Features During Movie Viewing","authors":"Amir Hossein Ghaderi;Hongye Wang;Andrea B. Protzner","doi":"10.1162/NECO.a.1488","DOIUrl":"10.1162/NECO.a.1488","url":null,"abstract":"Exploring the dynamics and complexity of brain signal is critical to advancing our understanding of brain function. Recent fMRI studies have revealed links between BOLD signal variability or complexity with static/dynamics features of functional brain networks (FBN). However, the association between variability/complexity and regional centrality is still understudied. Here we investigate the association between variability/complexity and static/dynamic nodal features of FBN using graph theory analysis with fMRI BOLD data acquired during naturalistic movie watching. We found that variability positively correlated with fine-scale complexity but negatively correlated with coarse-scale complexity. Specifically, regions with high centrality and clustering coefficient were related to less variable but more complex signal. Similar relationships persisted for dynamic FBN, but the associations with certain aspects (e.g., eigenvector centrality) of regional centrality dynamics became insignificant. 
Our findings demonstrate that the relationship between BOLD signal variability and static/dynamic FBN with BOLD signal complexity depends on the temporal scale of signal complexity and that time-varying features of FBN reflect the complexities of how BOLD signal variability/complexity coevolve with dynamic FBN.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 3","pages":"373-402"},"PeriodicalIF":2.1,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Faris B. Rustom;Rohan Sharma;Haluk Öğmen;Arash Yazdanbakhsh
Object detection and recognition are fundamental functions that play a significant role in the success of species. Because the appearance of an object exhibits large variability, the brain has to group these different stimuli under the same object identity, a process of generalization. Does the process of generalization follow some general principles, or is it an ad hoc bag of tricks? The universal law of generalization (ULoG) provides evidence that generalization follows similar properties across a variety of species and tasks. Here, we tested the hypothesis derived from ULoG that the internal representations underlying generalization reflect the natural properties of object detection and recognition in our environment rather than the specifics of the system solving these problems. Neural networks with universal-approximation capability have been successful in many object detection and recognition tasks; however, how these networks reach their decisions remains opaque. To provide a strong test for ecological validity, we used natural camouflage, which is nature's test bed for object detection and recognition. We trained a deep neural network with natural images of “clear” and “camouflaged” animals and examined the emerging internal representations. We extended ULoG to a realistic learning regime, with multiple consequential stimuli, and developed two methods to determine category prototypes. Our results show that with a proper choice of category prototypes, the generalization functions are monotone decreasing, similar to the generalization functions of biological systems. Critically, we show that camouflaged inputs are not represented randomly but rather systematically appear at the tail of the monotone decreasing functions. 
Our results support the hypothesis that the internal representations underlying generalization in object detection and recognition are shaped mainly by the properties of the ecological environment, even though different biological and artificial systems may generate these internal representations through drastically different learning and adaptation processes. Furthermore, the extended version of ULoG provides a tool to analyze how the system organizes its internal representations during learning as well as how it makes its decisions.
{"title":"Object Detection, Recognition, Deep Learning, and the Universal Law of Generalization","authors":"Faris B. Rustom;Rohan Sharma;Haluk Öğmen;Arash Yazdanbakhsh","doi":"10.1162/NECO.a.1483","DOIUrl":"10.1162/NECO.a.1483","url":null,"abstract":"Object detection and recognition are fundamental functions that play a significant role in the success of species. Because the appearance of an object exhibits large variability, the brain has to group these different stimuli under the same object identity, a process of generalization. Does the process of generalization follow some general principles, or is it an ad hoc bag of tricks? The universal law of generalization (ULoG) provides evidence that generalization follows similar properties across a variety of species and tasks. Here, we tested the hypothesis derived from ULoG that the internal representations underlying generalization reflect the natural properties of object detection and recognition in our environment rather than the specifics of the system solving these problems. Neural networks with universal-approximation capability have been successful in many object detection and recognition tasks; however, how these networks reach their decisions remains opaque. To provide a strong test for ecological validity, we used natural camouflage, which is nature's test bed for object detection and recognition. We trained a deep neural network with natural images of “clear” and “camouflaged” animals and examined the emerging internal representations. We extended ULoG to a realistic learning regime, with multiple consequential stimuli, and developed two methods to determine category prototypes. Our results show that with a proper choice of category prototypes, the generalization functions are monotone decreasing, similar to the generalization functions of biological systems. 
Critically, we show that camouflaged inputs are not represented randomly but rather systematically appear at the tail of the monotone decreasing functions. Our results support the hypothesis that the internal representations underlying generalization in object detection and recognition are shaped mainly by the properties of the ecological environment, even though different biological and artificial systems may generate these internal representations through drastically different learning and adaptation processes. Furthermore, the extended version of ULoG provides a tool to analyze how the system organizes its internal representations during learning as well as how it makes its decisions.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 3","pages":"328-372"},"PeriodicalIF":2.1,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ben Tsuda;Stefan C. Pate;Kay M. Tye;Hava T. Siegelmann;Terrence J. Sejnowski
Neuromodulators are critical controllers of neural states, with dysfunctions linked to various neuropsychiatric disorders. Although many biological aspects of neuromodulation have been studied, the computational principles underlying how neuromodulation of distributed neural populations controls brain states remain unclear. In contrast to external contextual inputs, neuromodulation can act as a single scalar signal that is broadcast to a vast population of neurons. We model the modulation of synaptic weight in a recurrent neural network model and show that neuromodulators can dramatically alter the function of a network, even when highly simplified. We find that under structural constraints like those in brains, this provides a fundamental mechanism that can increase the computational capability and flexibility of a neural network. Diffuse synaptic weight modulation enables storage of multiple memories using a common set of synapses that are able to generate diverse, even diametrically opposed, behaviors. Our findings help explain how neuromodulators unlock specific behaviors by creating task-specific hyperchannels in neural activity space and motivate more flexible, compact and capable machine learning architectures.
{"title":"Neuromodulators Generate Multiple Context-Relevant Behaviors in Recurrent Neural Networks","authors":"Ben Tsuda;Stefan C. Pate;Kay M. Tye;Hava T. Siegelmann;Terrence J. Sejnowski","doi":"10.1162/NECO.a.1489","DOIUrl":"10.1162/NECO.a.1489","url":null,"abstract":"Neuromodulators are critical controllers of neural states, with dysfunctions linked to various neuropsychiatric disorders. Although many biological aspects of neuromodulation have been studied, the computational principles underlying how neuromodulation of distributed neural populations controls brain states remain unclear. In contrast to external contextual inputs, neuromodulation can act as a single scalar signal that is broadcast to a vast population of neurons. We model the modulation of synaptic weight in a recurrent neural network model and show that neuromodulators can dramatically alter the function of a network, even when highly simplified. We find that under structural constraints like those in brains, this provides a fundamental mechanism that can increase the computational capability and flexibility of a neural network. Diffuse synaptic weight modulation enables storage of multiple memories using a common set of synapses that are able to generate diverse, even diametrically opposed, behaviors. 
Our findings help explain how neuromodulators unlock specific behaviors by creating task-specific hyperchannels in neural activity space and motivate more flexible, compact and capable machine learning architectures.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 3","pages":"292-327"},"PeriodicalIF":2.1,"publicationDate":"2026-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146121180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
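The idea in the abstract above, that a single scalar signal applied to fixed synaptic weights can change what a recurrent network does, can be illustrated with a deliberately tiny toy. The sketch below is not the authors' model: the one-unit network, the `gain` parameter, and the chosen values are assumptions made only for illustration.

```python
import math


def run_unit(gain: float, weight: float = 1.0, x0: float = 0.1, steps: int = 200) -> float:
    """Iterate x <- tanh(gain * weight * x) and return the final activity."""
    x = x0
    for _ in range(steps):
        x = math.tanh(gain * weight * x)
    return x


# Same synaptic weight, two hypothetical neuromodulator levels.
low = run_unit(gain=0.5)   # effective weight 0.5: activity decays toward 0
high = run_unit(gain=2.0)  # effective weight 2.0: activity settles at a nonzero fixed point

print(f"low gain  -> {low:.6f}")
print(f"high gain -> {high:.6f}")
```

With the stored weight unchanged, the scalar gain alone decides whether the unit forgets its initial state or holds a persistent nonzero activity, which is the kind of qualitative switch the abstract attributes to weight modulation.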
The visual system performs a remarkable feat: it takes complex retinal activation patterns and decodes them for object recognition. This operation, termed "representational untangling," organizes neural representations by clustering similar objects together while separating different categories of objects. While representational untangling is usually associated with higher-order visual areas like the inferior temporal cortex, it remains unclear how the early visual system contributes to this process, whether through highly selective neurons or high-dimensional population codes. This article investigates how a computational model of early vision contributes to representational untangling. Using a computational visual hierarchy and two different data sets consisting of numerals and objects, we demonstrate that simulated complex cells significantly contribute to representational untangling for object recognition. Our findings challenge prior theories by showing that untangling does not depend on skewed, sparse, or high-dimensional representations. Instead, simulated complex cells reformat visual information into a low-dimensional, yet more separable, neural code, striking a balance between representational untangling and computational efficiency.
{"title":"Simulated Complex Cells Contribute to Object Recognition Through Representational Untangling.","authors":"Mitchell B Slapik, Harel Z Shouval","doi":"10.1162/NECO.a.1480","DOIUrl":"10.1162/NECO.a.1480","url":null,"abstract":"The visual system performs a remarkable feat: it takes complex retinal activation patterns and decodes them for object recognition. This operation, termed \"representational untangling,\" organizes neural representations by clustering similar objects together while separating different categories of objects. While representational untangling is usually associated with higher-order visual areas like the inferior temporal cortex, it remains unclear how the early visual system contributes to this process, whether through highly selective neurons or high-dimensional population codes. This article investigates how a computational model of early vision contributes to representational untangling. Using a computational visual hierarchy and two different data sets consisting of numerals and objects, we demonstrate that simulated complex cells significantly contribute to representational untangling for object recognition. Our findings challenge prior theories by showing that untangling does not depend on skewed, sparse, or high-dimensional representations. 
Instead, simulated complex cells reformat visual information into a low-dimensional, yet more separable, neural code, striking a balance between representational untangling and computational efficiency.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":" ","pages":"145-164"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848683/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Echo state networks (ESNs) are a class of recurrent neural networks in which only the readout layer is trainable, while the recurrent and input layers are fixed. This architectural constraint enables computationally efficient processing of time-series data. Traditionally, the readout layer in ESNs is trained using supervised learning with target outputs. In this study, we focus on input reconstruction (IR), where the readout layer is trained to reconstruct the input time series fed into the ESN. We show that IR can be achieved through unsupervised learning (UL), without access to supervised targets, provided that the ESN parameters are known a priori and satisfy invertibility conditions. This formulation allows applications relying on IR, such as dynamical system replication and noise filtering, to be reformulated within the UL framework via straightforward integration with existing algorithms. Our results suggest that prior knowledge of ESN parameters can reduce reliance on supervision, thereby establishing a new principle—not only by fixing part of the network parameters but also by exploiting their specific values. Furthermore, our UL-based algorithms for input reconstruction and related tasks are suitable for autonomous processing, offering insights into how analogous computational mechanisms might operate in the brain in principle. These findings contribute to a deeper understanding of the mathematical foundations of ESNs and their relevance to models in computational neuroscience.
{"title":"Unsupervised Learning in Echo State Networks for Input Reconstruction","authors":"Taiki Yamada;Yuichi Katori;Kantaro Fujiwara","doi":"10.1162/NECO.a.38","DOIUrl":"10.1162/NECO.a.38","url":null,"abstract":"Echo state networks (ESNs) are a class of recurrent neural networks in which only the readout layer is trainable, while the recurrent and input layers are fixed. This architectural constraint enables computationally efficient processing of time-series data. Traditionally, the readout layer in ESNs is trained using supervised learning with target outputs. In this study, we focus on input reconstruction (IR), where the readout layer is trained to reconstruct the input time series fed into the ESN. We show that IR can be achieved through unsupervised learning (UL), without access to supervised targets, provided that the ESN parameters are known a priori and satisfy invertibility conditions. This formulation allows applications relying on IR, such as dynamical system replication and noise filtering, to be reformulated within the UL framework via straightforward integration with existing algorithms. Our results suggest that prior knowledge of ESN parameters can reduce reliance on supervision, thereby establishing a new principle—not only by fixing part of the network parameters but also by exploiting their specific values. Furthermore, our UL-based algorithms for input reconstruction and related tasks are suitable for autonomous processing, offering insights into how analogous computational mechanisms might operate in the brain in principle. 
These findings contribute to a deeper understanding of the mathematical foundations of ESNs and their relevance to models in computational neuroscience.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 2","pages":"198-227"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145403093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
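To make concrete why known ESN parameters and an invertibility condition matter for input reconstruction, the sketch below inverts one reservoir update algebraically. It is not the authors' unsupervised learning algorithm: the update rule x_next = tanh(W x + W_IN u), the scalar input, and the matrices `W` and `W_IN` are assumptions chosen for illustration.

```python
import math

# Hypothetical, known ESN parameters (3 reservoir units, scalar input).
W = [[0.2, -0.1, 0.0],
     [0.0, 0.3, 0.1],
     [-0.2, 0.0, 0.1]]
W_IN = [0.5, -0.4, 0.3]


def step(x, u):
    """One reservoir update: x_next = tanh(W x + W_IN * u)."""
    return [math.tanh(sum(W[i][j] * x[j] for j in range(3)) + W_IN[i] * u)
            for i in range(3)]


def reconstruct_input(x_prev, x_next):
    """Recover u from two consecutive states using only the known parameters."""
    # Undo the nonlinearity and subtract the recurrent drive: r = W_IN * u.
    r = [math.atanh(x_next[i]) - sum(W[i][j] * x_prev[j] for j in range(3))
         for i in range(3)]
    # Least-squares solution for a scalar u; requires W_IN to be nonzero.
    return sum(W_IN[i] * r[i] for i in range(3)) / sum(w * w for w in W_IN)


inputs = [math.sin(0.3 * t) for t in range(20)]
x = [0.0, 0.0, 0.0]
recovered = []
for u in inputs:
    x_next = step(x, u)
    recovered.append(reconstruct_input(x, x_next))
    x = x_next

max_error = max(abs(a - b) for a, b in zip(inputs, recovered))
print(f"max reconstruction error: {max_error:.2e}")
```

No target signal is used anywhere in the reconstruction: the input is recovered from reservoir states and parameter values alone, which is possible here only because tanh is invertible and `W_IN` is nonzero.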
When applying nonnegative matrix factorization (NMF), the rank parameter is generally unknown. This rank, called the nonnegative rank, is usually estimated heuristically since computing its exact value is NP-hard. In this work, we propose an approximation method to estimate the rank on the fly while solving NMF. We use the sum-of-norms (SON), a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix when the initial rank is overestimated. On various data sets, SON-NMF can reveal the correct nonnegative rank of the data without prior knowledge or parameter tuning. SON-NMF is a nonconvex, nonsmooth, nonseparable, and nonproximable problem, making it nontrivial to solve. First, since rank estimation in NMF is NP-hard, the proposed approach does not benefit from lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of SON-NMF is essentially irreducible. Second, the per-iteration cost of algorithms for SON-NMF can be high. This motivates us to propose a first-order BCD algorithm that approximately solves SON-NMF with low per-iteration cost via the proximal average operator. SON-NMF exhibits favorable features for applications. Besides the ability to automatically estimate the rank from data, SON-NMF can handle rank-deficient data matrices and detect weak components with little energy. Furthermore, in hyperspectral imaging, SON-NMF naturally addresses the issue of spectral variability.
{"title":"Sum-of-Norms Regularized Nonnegative Matrix Factorization","authors":"Andersen Ang;Waqas Bin Hamed;Hans De Sterck","doi":"10.1162/NECO.a.1482","DOIUrl":"10.1162/NECO.a.1482","url":null,"abstract":"When applying nonnegative matrix factorization (NMF), the rank parameter is generally unknown. This rank, called the nonnegative rank, is usually estimated heuristically since computing its exact value is NP-hard. In this work, we propose an approximation method to estimate the rank on the fly while solving NMF. We use the sum-of-norm (SON), a group-lasso structure that encourages pairwise similarity, to reduce the rank of a factor matrix when the initial rank is overestimated. On various data sets, SON-NMF can reveal the correct nonnegative rank of the data without prior knowledge or parameter tuning. SON-NMF is a nonconvex, nonsmooth, nonseparable, and nonproximable problem, making it nontrivial to solve. First, since rank estimation in NMF is NP-hard, the proposed approach does not benefit from lower computational complexity. Using a graph-theoretic argument, we prove that the complexity of SON NMF is essentially irreducible. Second, the per iteration cost of algorithms for SON-NMF can be high. This motivates us to propose a first-order BCD algorithm that approximately solves SON-NMF with low per iteration cost via the proximal average operator. SON-NMF exhibits favorable features for applications. Besides the ability to automatically estimate the rank from data, SON-NMF can handle rank-deficient data matrices and detect weak components with little energy. 
Furthermore, in hyperspectral imaging, SON-NMF naturally addresses the issue of spectral variability.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 2","pages":"228-255"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
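The rank-reducing effect of a sum-of-norms term can be seen on a small example. The sketch below assumes the regularizer is the sum of Euclidean distances over all pairs of columns of one factor, which is a common form of SON but is not confirmed by the abstract; the helper names and the example columns are hypothetical, and no NMF solver is included.

```python
import math
from itertools import combinations


def son_penalty(columns):
    """Sum of Euclidean distances over all pairs of columns."""
    return sum(math.dist(a, b) for a, b in combinations(columns, 2))


def count_distinct(columns, tol=1e-8):
    """Number of columns that differ from every column kept so far by more than tol."""
    kept = []
    for col in columns:
        if all(math.dist(col, k) > tol for k in kept):
            kept.append(col)
    return len(kept)


# Hypothetical factors with the rank overestimated as 4.
merged = [(1.0, 0.0), (1.0, 0.0), (0.0, 2.0), (0.0, 2.0)]  # columns collapsed in pairs
spread = [(1.0, 0.0), (0.8, 0.3), (0.0, 2.0), (0.4, 1.5)]  # all columns distinct

print(son_penalty(merged), count_distinct(merged))
print(son_penalty(spread), count_distinct(spread))
```

Pairs of identical columns contribute exactly zero to the penalty, so minimizing it pulls redundant columns together; counting the distinct columns that remain then gives an estimate of the rank.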
Operator learning is a recent development in the simulation of partial differential equations by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an approximate mapping in infinite-dimensional spaces that is capable of (approximately) simulating the solution operator governed by the partial differential equation. In our work, we study some general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider the approximation with respect to a topology that is induced by a sequence of seminorms. In that sense, we measure the approximation error in terms of a Fréchet metric, and our main result identifies sufficient conditions for achieving a predefined approximation error. We then focus on a natural extension of our main theorem, in which we reduce the assumptions on the sequence of seminorms. Based on existing approximation results for the exponential spectral Barron space, we then present a concrete example of symbols that can be approximated well.
{"title":"Approximation Rates in Fréchet Metrics: Barron Spaces, Paley-Wiener Spaces, and Fourier Multipliers","authors":"Ahmed Abdeljawad;Thomas Dittrich","doi":"10.1162/NECO.a.1481","DOIUrl":"10.1162/NECO.a.1481","url":null,"abstract":"Operator learning is a recent development in the simulation of partial differential equations by means of neural networks. The idea behind this approach is to learn the behavior of an operator, such that the resulting neural network is an approximate mapping in infinite-dimensional spaces that is capable of (approximately) simulating the solution operator governed by the partial differential equation. In our work, we study some general approximation capabilities for linear differential operators by approximating the corresponding symbol in the Fourier domain. Analogous to the structure of the class of Hörmander symbols, we consider the approximation with respect to a topology that is induced by a sequence of semi-norms. In that sense, we measure the approximation error in terms of a Fréchet metric, and our main result identifies sufficient conditions for achieving a predefined approximation error. We then focus on a natural extension of our main theorem, in which we reduce the assumptions on the sequence of seminorms. 
Based on existing approximation results for the exponential spectral Barron space, we then present a concrete example of symbols that can be approximated well.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 2","pages":"165-197"},"PeriodicalIF":2.1,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
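For readers unfamiliar with symbols, the sketch below shows what it means to apply a linear differential operator through its symbol in the Fourier domain: transform, multiply each frequency by the symbol, transform back. It uses the exact symbol of d/dx on a periodic grid rather than any learned approximation, and the function names and grid size are assumptions for illustration.

```python
import cmath
import math


def dft(values, sign):
    """Naive discrete Fourier transform; sign=-1 is forward, sign=+1 is the unnormalized inverse."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * m / n) for m, v in enumerate(values))
            for k in range(n)]


def apply_multiplier(samples, symbol):
    """Apply the operator with the given symbol to samples on a periodic grid over [0, 2*pi)."""
    n = len(samples)
    spectrum = dft(samples, -1)
    for k in range(n):
        freq = k if k <= n // 2 else k - n  # signed integer frequency
        spectrum[k] *= symbol(freq)
    return [(v / n).real for v in dft(spectrum, +1)]


n = 17  # odd grid size, so there is no ambiguous Nyquist mode
grid = [2 * math.pi * m / n for m in range(n)]

# The symbol of d/dx is i * xi, so sin(x) should map to cos(x).
derivative = apply_multiplier([math.sin(x) for x in grid], lambda xi: 1j * xi)
max_error = max(abs(d - math.cos(x)) for d, x in zip(derivative, grid))
print(f"max error against cos(x): {max_error:.2e}")
```

Replacing the lambda with an approximation of the symbol changes only the multiplication step, which is why the quality of a symbol approximation translates into the quality of the approximated operator.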
This simulation study shows how a set of working memory tasks can be acquired simultaneously through interaction between a stacked recurrent neural network (RNN) and multiple working memories. In these tasks, temporal patterns are provided, followed by linguistically specified task goals. Training is performed in a supervised manner by minimizing the free energy, and goal-directed tasks are performed using the active inference (AIF) framework. Our simulation results show that the best task performance is obtained when two working memory modules are used instead of one or none and when self-directed inner speech is incorporated during task execution. Detailed analysis indicates that a temporal hierarchy develops in the stacked RNN module under these optimal conditions. We argue that the model’s capacity for generalization across novel task configurations is supported by the structured interplay between working memory and the generation of self-directed language outputs during task execution. This interplay promotes internal representations that reflect task structure, which in turn support generalization by enabling a functional separation between content encoding and control dynamics within the memory architecture.
{"title":"Working Memory and Self-Directed Inner Speech Enhance Multitask Generalization in Active Inference","authors":"Jeffrey Frederic Queißer;Jun Tani","doi":"10.1162/NECO.a.36","DOIUrl":"10.1162/NECO.a.36","url":null,"abstract":"This simulation study shows how a set of working memory tasks can be acquired simultaneously through interaction between a stacked recurrent neural network (RNN) and multiple working memories. In these tasks, temporal patterns are provided, followed by linguistically specified task goals. Training is performed in a supervised manner by minimizing the free energy, and goal-directed tasks are performed using the active inference (AIF) framework. Our simulation results show that the best task performance is obtained when two working memory modules are used instead of one or none and when self-directed inner speech is incorporated during task execution. Detailed analysis indicates that a temporal hierarchy develops in the stacked RNN module under these optimal conditions. We argue that the model’s capacity for generalization across novel task configurations is supported by the structured interplay between working memory and the generation of self-directed language outputs during task execution. 
This interplay promotes internal representations that reflect task structure, which in turn support generalization by enabling a functional separation between content encoding and control dynamics within the memory architecture.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 1","pages":"28-70"},"PeriodicalIF":2.1,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145403074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We establish that a broad class of effective learning rules, those that improve a scalar performance measure over a given time window, can be expressed as natural gradient descent with respect to an appropriately defined metric. Specifically, parameter updates in this class can always be written as the product of a symmetric positive-definite matrix and the negative gradient of a loss function encoding the task. Given the high level of generality, our findings formally support the idea that the gradient is a fundamental object underlying all learning processes. Our results are valid across a wide range of common settings, including continuous-time, discrete-time, stochastic, and higher-order learning rules, as well as loss functions with explicit time dependence. Beyond providing a unified framework for learning, our results also have practical implications for control as well as experimental neuroscience.
{"title":"Effective Learning Rules as Natural Gradient Descent","authors":"Lucas Shoji;Kenta Suzuki;Leo Kozachkov","doi":"10.1162/NECO.a.1474","DOIUrl":"10.1162/NECO.a.1474","url":null,"abstract":"We establish that a broad class of effective learning rules, those that improve a scalar performance measure over a given time window, can be expressed as natural gradient descent with respect to an appropriately defined metric. Specifically, parameter updates in this class can always be written as the product of a symmetric positive-definite matrix and the negative gradient of a loss function encoding the task. Given the high level of generality, our findings formally support the idea that the gradient is a fundamental object underlying all learning processes. Our results are valid across a wide range of common settings, including continuous-time, discrete-time, stochastic, and higher-order learning rules, as well as loss functions with explicit time dependence. Beyond providing a unified framework for learning, our results also have practical implications for control as well as experimental neuroscience.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 1","pages":"71-96"},"PeriodicalIF":2.1,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145524833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
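The central claim of the abstract above can be checked numerically in the simplest setting: any update that has positive overlap with the negative gradient can be written as a symmetric positive-definite matrix applied to that negative gradient. The construction below is one standard way to build such a matrix and is not necessarily the one used by the authors; the vectors `neg_grad` and `update` are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spd_metric(update, neg_grad):
    """Build a symmetric positive-definite M with M @ neg_grad == update.

    Requires dot(update, neg_grad) > 0, i.e. the update reduces the loss to first order.
    """
    s = dot(update, neg_grad)
    if s <= 0:
        raise ValueError("update must have positive overlap with the negative gradient")
    g2 = dot(neg_grad, neg_grad)
    n = len(update)
    # M = u u^T / (u . d)  +  (I - d d^T / (d . d)), with u = update and d = neg_grad.
    return [[update[i] * update[j] / s
             + (1.0 if i == j else 0.0)
             - neg_grad[i] * neg_grad[j] / g2
             for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in m]


neg_grad = [1.0, 0.0, -2.0]  # hypothetical negative gradient
update = [0.5, 1.0, -0.2]    # hypothetical update; its overlap with neg_grad is 0.9
M = spd_metric(update, neg_grad)
print(matvec(M, neg_grad))   # reproduces `update` up to rounding
```

The first term maps the negative gradient onto the update, and the second term is a projection that vanishes on the gradient direction while keeping the matrix positive definite on every other direction.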
Lancelot Da Costa;Tomáš Gavenčiak;David Hyland;Mandana Samiei;Cristian Dragos-Manta;Candice Pattisapu;Adeel Razi;Karl Friston
This paper offers a road map for the development of scalable aligned artificial intelligence (AI) from first principle descriptions of natural intelligence. In brief, a possible path toward scalable aligned AI rests on enabling artificial agents to learn a good model of the world that includes a good model of our preferences. For this, the main objective is creating agents that learn to represent the world and other agents’ world models, a problem that falls under structure learning (also known as causal representation learning or model discovery). We expose the structure learning and alignment problems with this goal in mind, as well as principles to guide us forward, synthesizing various ideas across mathematics, statistics, and cognitive science. We discuss the essential role of core knowledge, information geometry, and model reduction in structure learning and suggest core structural modules to learn a wide range of naturalistic worlds. We then outline a way toward aligned agents through structure learning and theory of mind. As an illustrative example, we mathematically sketch Asimov’s laws of robotics, which prescribe agents to act cautiously to minimize the ill-being of other agents. We supplement this example by proposing refined approaches to alignment. These observations may guide the development of artificial intelligence in helping to scale existing, or design new, aligned structure learning systems.
{"title":"Possible Principles for Aligned Structure Learning Agents","authors":"Lancelot Da Costa;Tomáš Gavenčiak;David Hyland;Mandana Samiei;Cristian Dragos-Manta;Candice Pattisapu;Adeel Razi;Karl Friston","doi":"10.1162/NECO.a.39","DOIUrl":"10.1162/NECO.a.39","url":null,"abstract":"This paper offers a road map for the development of scalable aligned artificial intelligence (AI) from first principle descriptions of natural intelligence. In brief, a possible path toward scalable aligned AI rests on enabling artificial agents to learn a good model of the world that includes a good model of our preferences. For this, the main objective is creating agents that learn to represent the world and other agents’ world models, a problem that falls under structure learning (also known as causal representation learning or model discovery). We expose the structure learning and alignment problems with this goal in mind, as well as principles to guide us forward, synthesizing various ideas across mathematics, statistics, and cognitive science. We discuss the essential role of core knowledge, information geometry, and model reduction in structure learning and suggest core structural modules to learn a wide range of naturalistic worlds. We then outline a way toward aligned agents through structure learning and theory of mind. As an illustrative example, we mathematically sketch Asimov’s laws of robotics, which prescribe agents to act cautiously to minimize the ill-being of other agents. We supplement this example by proposing refined approaches to alignment. These observations may guide the development of artificial intelligence in helping to scale existing, or design new, aligned structure learning systems.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"38 1","pages":"97-143"},"PeriodicalIF":2.1,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145524952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}