This simulation study shows how a set of working memory tasks can be acquired simultaneously through interaction between a stacked recurrent neural network (RNN) and multiple working memories. In these tasks, temporal patterns are provided, followed by linguistically specified task goals. Training is performed in a supervised manner by minimizing the free energy, and goal-directed tasks are performed using the active inference (AIF) framework. Our simulation results show that the best task performance is obtained when two working memory modules are used instead of one or none and when self-directed inner speech is incorporated during task execution. Detailed analysis indicates that a temporal hierarchy develops in the stacked RNN module under these optimal conditions. We argue that the model’s capacity for generalization across novel task configurations is supported by the structured interplay between working memory and the generation of self-directed language outputs during task execution. This interplay promotes internal representations that reflect task structure, which in turn support generalization by enabling a functional separation between content encoding and control dynamics within the memory architecture.
Working Memory and Self-Directed Inner Speech Enhance Multitask Generalization in Active Inference. Jeffrey Frederic Queißer; Jun Tani. Neural Computation 38(1): 28–70, December 22, 2025. DOI: 10.1162/NECO.a.36.
We establish that a broad class of effective learning rules—those that improve a scalar performance measure over a given time window—can be expressed as natural gradient descent with respect to an appropriately defined metric. Specifically, parameter updates in this class can always be written as the product of a symmetric positive-definite matrix and the negative gradient of a loss function encoding the task. Given the high level of generality, our findings formally support the idea that the gradient is a fundamental object underlying all learning processes. Our results are valid across a wide range of common settings, including continuous-time, discrete-time, stochastic, and higher-order learning rules, as well as loss functions with explicit time dependence. Beyond providing a unified framework for learning, our results also have practical implications for control as well as experimental neuroscience.
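The claim that an effective update factors as a symmetric positive-definite matrix times the negative gradient can be checked concretely. A minimal sketch on an assumed toy quadratic loss (not from the paper): the sign-based update rule is rewritten, step by step, as -M∇L with M = diag(1/|∇L_i|), which is symmetric positive-definite whenever no gradient component vanishes.

```python
import numpy as np

# Quadratic loss L(theta) = 0.5 * theta^T A theta, with gradient A @ theta.
A = np.diag([1.0, 4.0])

def loss(theta):
    return 0.5 * theta @ A @ theta

def grad(theta):
    return A @ theta

# An "effective" rule that is not plain gradient descent: sign-based updates.
# Each step can be rewritten as -eta * M @ grad with M symmetric positive-
# definite, here M = diag(1/|g_i|) (valid while no component of g is zero).
theta = np.array([2.05, -1.03])  # initial values chosen so g never hits 0
eta = 0.1
for _ in range(50):
    g = grad(theta)
    M = np.diag(1.0 / np.abs(g))       # per-step SPD metric
    step_sign = -eta * np.sign(g)      # the original sign-based rule
    step_natgrad = -eta * M @ g        # the same step in natural-gradient form
    assert np.allclose(step_sign, step_natgrad)
    theta = theta + step_sign

print(loss(theta))  # small residual loss near the minimum
```

The loop verifies the factorization at every step; the loss decreases from its initial value to a small residual set by the fixed step size.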
Effective Learning Rules as Natural Gradient Descent. Lucas Shoji; Kenta Suzuki; Leo Kozachkov. Neural Computation 38(1): 71–96, December 22, 2025. DOI: 10.1162/NECO.a.1474.
Lancelot Da Costa;Tomáš Gavenčiak;David Hyland;Mandana Samiei;Cristian Dragos-Manta;Candice Pattisapu;Adeel Razi;Karl Friston
This paper offers a road map for the development of scalable aligned artificial intelligence (AI) from first-principles descriptions of natural intelligence. In brief, a possible path toward scalable aligned AI rests on enabling artificial agents to learn a good model of the world that includes a good model of our preferences. For this, the main objective is creating agents that learn to represent the world and other agents’ world models, a problem that falls under structure learning (also known as causal representation learning or model discovery). We expose the structure learning and alignment problems with this goal in mind, as well as principles to guide us forward, synthesizing various ideas across mathematics, statistics, and cognitive science. We discuss the essential role of core knowledge, information geometry, and model reduction in structure learning and suggest core structural modules to learn a wide range of naturalistic worlds. We then outline a way toward aligned agents through structure learning and theory of mind. As an illustrative example, we mathematically sketch Asimov’s laws of robotics, which prescribe agents to act cautiously to minimize the ill-being of other agents. We supplement this example by proposing refined approaches to alignment. These observations may guide the development of artificial intelligence in helping to scale existing, or design new, aligned structure learning systems.
Possible Principles for Aligned Structure Learning Agents. Neural Computation 38(1): 97–143, December 22, 2025. DOI: 10.1162/NECO.a.39.
Pranav Mahajan;Mufeng Tang;T. Ed Li;Ioannis Havoutis;Ben Seymour
Modern robots face a challenge shared by biological systems: how to learn and adaptively express multiple sensorimotor skills. A key aspect of this is developing an internal model of expected sensorimotor experiences to detect and react to unexpected events, guiding self-preserving behaviors. Associative skill memories (ASMs) address this by linking movement primitives to sensory feedback, but existing implementations rely on hard-coded libraries of individual skills. A key unresolved problem is how a single neural network can learn a repertoire of skills while enabling integrated fault detection and context-aware execution. Here we introduce neural associative skill memories (neural ASMs), a framework that uses self-supervised temporal predictive coding to integrate skill learning and expression using biologically plausible local learning rules. Unlike traditional ASMs, which require explicit skill selection, neural ASMs implicitly recognize and express skills through contextual inference, enabling fault detection using “predictive surprise” across the entire learned repertoire. Compared to recurrent neural networks trained using backpropagation through time, our model achieves comparable qualitative performance in skill memory expression while using local learning rules and predicts a biologically relevant speed-versus-accuracy trade-off. By integrating fault detection, reactive control, and skill expression into a single energy-based architecture, neural ASMs contribute to safer, self-preserving robotics and provide a computational lens to study biological sensorimotor learning.
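The "predictive surprise" mechanism can be illustrated with a deliberately simplified stand-in: a scalar sensor whose internal model is assumed exact, so any sustained prediction error signals a fault. The sinusoidal skill trajectory, noise level, threshold, and fault time below are hypothetical choices for illustration, not the paper's predictive-coding network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Internal model of the expected sensory input for a known "skill"
# (assumed perfect here): a slow sinusoidal trajectory.
def model_prediction(t):
    return np.sin(0.1 * (t + 1))

surprise_log = []
fault_at = 60  # a collision perturbs the sensor from this step onward
for t in range(100):
    observed = np.sin(0.1 * (t + 1)) + 0.01 * rng.standard_normal()
    if t >= fault_at:
        observed += 1.5  # unexpected contact force
    # "Predictive surprise": discrepancy between model and observation
    surprise = abs(observed - model_prediction(t))
    surprise_log.append(surprise)

threshold = 0.1  # well above the nominal sensor noise
detected = next(t for t, s in enumerate(surprise_log) if s > threshold)
print(detected)
```

Under nominal conditions surprise stays at the noise floor; the fault is flagged on the first perturbed step, which is the reactive-control trigger the abstract describes.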
Neural Associative Skill Memories for Safer Robotics and Modeling Human Sensorimotor Repertoires. Neural Computation 38(1): 1–27, December 22, 2025. DOI: 10.1162/NECO.a.1475.
Humans (and many vertebrates) face the problem of fusing together multiple fixations of a scene in order to obtain a representation of the whole, where each fixation uses a high-resolution fovea and decreasing resolution in the periphery. In this letter, we explicitly represent the retinal transformation of a fixation as a linear downsampling of a high-resolution latent image of the scene, exploiting the known geometry. This linear transformation allows us to carry out exact inference for the latent variables in factor analysis (FA) and mixtures of FA models of the scene. This also allows us to formulate and solve the choice of where to look next as a Bayesian experimental design problem using the expected information gain criterion. Experiments on the Frey faces and MNIST data sets demonstrate the effectiveness of our models.
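The retinal transformation as a linear downsampling can be sketched directly. Below, a hypothetical 1-D, 16-pixel latent image is observed through a fixation whose 8 central pixels are foveal (full resolution) and whose periphery is averaged in 2-pixel blocks; the whole fixation is a single matrix W, and it is this linearity that makes exact inference in the FA models (and the expected-information-gain criterion) tractable.

```python
import numpy as np

# A 1-D "retina" over a 16-pixel latent image: the 8 central pixels are
# sampled at full resolution (fovea); each flank of 4 peripheral pixels is
# averaged into 2 coarse samples. The whole transform is one linear map W.
n = 16
rows = []
for start in (0, 2):                       # left periphery: 2-pixel averages
    r = np.zeros(n); r[start:start + 2] = 0.5; rows.append(r)
for i in range(4, 12):                     # fovea: identity over pixels 4..11
    r = np.zeros(n); r[i] = 1.0; rows.append(r)
for start in (12, 14):                     # right periphery
    r = np.zeros(n); r[start:start + 2] = 0.5; rows.append(r)
W = np.vstack(rows)                        # shape (12, 16): linear downsampling

x = np.arange(n, dtype=float)              # high-resolution latent image
y = W @ x                                  # one fixation's retinal observation
print(W.shape, y[:2])
```

Shifting the fovea to a different location is just a different W, so fusing multiple fixations amounts to stacking several such linear observations of the same latent x.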
Fusing Foveal Fixations Using Linear Retinal Transformations and Bayesian Experimental Design. Christopher K. I. Williams. Neural Computation 37(12): 2235–2256, November 18, 2025. DOI: 10.1162/neco.a.33.
Neural manifolds are an attractive theoretical framework for characterizing the complex behaviors of neural populations. However, many of the tools for identifying these low-dimensional subspaces are correlational and provide limited insight into the underlying dynamics. The ability to precisely control the latent activity of a circuit would allow researchers to investigate the structure and function of neural manifolds. We simulate controlling the latent dynamics of a neural population using closed-loop, dynamically generated sensory inputs. Using a spiking neural network (SNN) as a model of a neural circuit, we find low-dimensional representations of both the network activity (the neural manifold) and a set of salient visual stimuli. The fields of classical and optimal control offer a range of methods to choose from for controlling dynamics on the neural manifold, which differ in performance, computational cost, and ease of implementation. Here, we focus on two commonly used control methods: proportional-integral-derivative (PID) control and model predictive control (MPC). PID is a computationally lightweight controller that is simple to implement. In contrast, MPC is a model-based, anticipatory controller with a much higher computational cost and engineering overhead. We evaluate both methods on trajectory-following tasks in latent space, under partial observability and in the presence of unknown noise. While both controllers in some cases were able to successfully control the latent dynamics on the neural manifold, MPC consistently produced more accurate control and required less hyperparameter tuning. These results demonstrate how MPC can be applied on the neural manifold using data-driven dynamics models and provide a framework to experimentally test for causal relationships between manifold dynamics and external stimuli.
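As a point of reference for the lighter-weight baseline, here is a minimal PID loop tracking a reference trajectory in a one-dimensional "latent" state with assumed linear dynamics. The gains, dynamics, and reference are illustrative choices, not the paper's spiking-network setup.

```python
import numpy as np

# Assumed latent dynamics: z_{t+1} = 0.9 * z_t + u_t.
# A PID controller steers z along a slow reference sine trajectory.
dt = 1.0
Kp, Ki, Kd = 1.0, 0.1, 0.2          # hand-tuned gains (illustrative)
z, integ, prev_err = 0.0, 0.0, 0.0
errors = []
for t in range(200):
    ref = np.sin(0.05 * t)           # target latent trajectory
    err = ref - z
    integ += err * dt                # integral term removes steady-state bias
    deriv = (err - prev_err) / dt    # derivative term damps overshoot
    u = Kp * err + Ki * integ + Kd * deriv
    prev_err = err
    z = 0.9 * z + u                  # latent state responds to the input
    errors.append(abs(ref - z))

print(np.mean(errors[50:]))          # steady-state tracking error
```

PID needs no model of the dynamics, which is its appeal; MPC instead optimizes inputs against a (data-driven) model over a horizon, trading computation for anticipation, consistent with the accuracy gap the abstract reports.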
Model Predictive Control on the Neural Manifold. Christof Fehrman; C. Daniel Meliza. Neural Computation 37(12): 2125–2157, November 18, 2025. DOI: 10.1162/neco.a.37.
Active inference, grounded in the free energy principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo tree search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the cross-entropy method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning, without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both stand-alone CEM and MCTS with random rollouts.
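The CEM step at the root node can be sketched on a toy problem. The model, horizon, and purely extrinsic cost below are assumptions for illustration; the paper's planner additionally scores candidates with an epistemic (information-gain) term and hands proposals to the MCTS expansion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy deterministic model: the state is the running sum of the actions.
# CEM searches a horizon-5 action sequence driving the state from 0 to 3.
horizon, n_samples, n_elite = 5, 200, 20
mu, sigma = np.zeros(horizon), np.ones(horizon)

def cost(actions):
    # Extrinsic cost only; an active inference planner would add an
    # information-gain bonus here to value epistemic actions.
    return (np.sum(actions) - 3.0) ** 2

for _ in range(20):
    samples = rng.normal(mu, sigma, size=(n_samples, horizon))
    costs = np.array([cost(a) for a in samples])
    elite = samples[np.argsort(costs)[:n_elite]]   # keep the best 10%
    # Refit the Gaussian proposal to the elites (small floor keeps it alive)
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3

print(cost(mu))  # should be near zero after the proposal converges
```

Each iteration samples action sequences, keeps the lowest-cost elites, and refits the proposal distribution, which is exactly the root-node optimization role CEM plays in the hybrid planner.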
Boosting MCTS With Free Energy Minimization. Mawaba Pascal Dao; Adrian M. Peter. Neural Computation 37(12): 2205–2234, November 18, 2025. DOI: 10.1162/neco.a.31.
Simon Wilshin;Matthew D. Kvalheim;Clayton Scott;Shai Revzen
Oscillators are ubiquitous in nature and are usually associated with the existence of an asymptotic phase that governs the long-term dynamics of the oscillator. We show that the asymptotic phase can be estimated using a carefully chosen series expansion that directly computes the phase response curve (PRC) and provides an algorithm for estimating the coefficients of this series. Unlike previously available data-driven phase estimation methods, our algorithm can use observations that are much shorter than a cycle; has proven convergence rate bounds as a function of the properties of measurement noise and system noise; will recover phase within any forward invariant region for which sufficient data are available; recovers the PRCs that govern weak oscillator coupling; and recovers isochron curvature and other nonlinear features of isochron geometry. Our method may find application wherever models of oscillator dynamics need to be constructed from measured or simulated time series.
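For intuition about what an asymptotic phase estimate must recover, consider the simplest special case: a rotationally symmetric planar limit cycle, where the polar angle is the asymptotic phase and arctan2 suffices. The Stuart-Landau-style dynamics below are an illustrative assumption, not from the paper; the temporal 1-form method generalizes beyond this symmetric case (short observations, curved isochrons, unknown geometry).

```python
import numpy as np

rng = np.random.default_rng(2)

# Planar oscillator with a circular limit cycle: radial relaxation toward
# r = 1 plus rotation at constant angular velocity omega, plus small noise.
omega, dt = 1.0, 0.01
x = np.array([1.0, 0.0])
phases = []
for _ in range(2000):
    r = np.linalg.norm(x)
    drift = omega * np.array([-x[1], x[0]]) + (1.0 - r) * x
    x = x + dt * drift + 0.01 * np.sqrt(dt) * rng.standard_normal(2)
    phases.append(np.arctan2(x[1], x[0]))

# For this rotationally symmetric cycle, the polar angle IS the asymptotic
# phase, so its unwrapped increments should average to omega * dt.
dphi = np.diff(np.unwrap(phases))
print(dphi.mean() / dt)
```

When the isochrons are curved rather than radial, this naive angle no longer advances uniformly, which is the regime the series-expansion estimator is built for.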
Estimating Phase From Observed Trajectories Using the Temporal 1-Form. Neural Computation 37(12): 2158–2204, November 18, 2025. DOI: 10.1162/neco.a.32.
Number sense, the ability to rapidly estimate object quantities in a visual scene without precise counting, is a crucial cognitive capacity found in humans and many other animals. Recent studies have identified artificial neurons tuned to numbers of items in biologically inspired vision models, even before training, and proposed these artificial neural networks as candidate models for the emergence of number sense in the brain. But real-world numerosity perception requires abstraction from the properties of individual objects and their contexts, unlike the simplified dot patterns used in previous studies. Using novel synthetically generated photorealistic stimuli, we show that deep convolutional neural networks optimized for object recognition encode information on approximate numerosity across diverse objects and scene types, which could be linearly read out from distributed activity patterns of later convolutional layers of different network architectures tested. In contrast, untrained networks with random weights failed to represent numerosity with abstractness to other visual properties and instead captured mainly low-level visual features. Our findings emphasize the importance of using complex, naturalistic stimuli to investigate mechanisms of number sense in both biological and artificial systems, and they suggest that the capacity of untrained networks to account for early-life numerical abilities should be reassessed. They further point to a possible, so far underappreciated, contribution of the brain's ventral visual pathway to representing numerosity with abstractness to other high-level visual properties.
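The linear-readout analysis can be sketched with synthetic stand-in features (hypothetical, in place of real convolutional-layer activations): numerosity lies along one direction of the feature space, nuisance variation from object and scene identity fills the rest, and a closed-form ridge regression recovers the count.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for convolutional activations: each "image" with n objects yields
# a feature vector mixing numerosity with nuisance (object/scene) variation.
d = 50
w_num = rng.standard_normal(d)           # direction encoding numerosity
X, y = [], []
for _ in range(500):
    n = rng.integers(1, 11)              # numerosity 1..10
    nuisance = rng.standard_normal(d)    # object/scene variability
    X.append(n * w_num + 3.0 * nuisance)
    y.append(n)
X, y = np.array(X), np.array(y, dtype=float)

# Ridge-regression readout (closed form), i.e., a linear decoder
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
pred = X @ w
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

A high readout score despite the nuisance variation mirrors the paper's criterion for abstract numerosity coding; in the untrained-network condition the analogous features lack a consistent numerosity direction, and the readout fails.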
Encoding of Numerosity With Robustness to Object and Scene Identity in Biologically Inspired Object Recognition Networks. Thomas Chapalain; Bertrand Thirion; Evelyn Eger. Neural Computation 37(11): 1975–2010, October 10, 2025. DOI: 10.1162/neco.a.30. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11210824.
Neurons process sensory stimuli efficiently, showing sparse yet highly variable ensemble spiking activity involving structured higher-order interactions. Notably, while neural populations are mostly silent, they occasionally exhibit highly synchronous activity, resulting in sparse and heavy-tailed spike-count distributions. However, the mechanistic origin of such activity—specifically, which nonlinear properties of individual neurons induce these population-level patterns—remains unclear. In this study, we derive sufficient conditions under which the joint activity of homogeneous binary neurons generates sparse and widespread population firing rate distributions in infinitely large networks. We then propose a subclass of exponential family distributions that satisfy this condition. This class incorporates structured higher-order interactions with alternating signs and shrinking magnitudes, along with a base-measure function that offsets distributional concentration, giving rise to parameter-dependent sparsity and heavy-tailed population firing rate distributions. Analysis of recurrent neural networks that recapitulate these distributions reveals that individual neurons possess threshold-like nonlinearity, followed by supralinear activation that jointly facilitates sparse and synchronous population activity. These nonlinear features resemble those in modern Hopfield networks, suggesting a connection between widespread population activity and the network’s memory capacity. The theory establishes sparse and heavy-tailed distributions for binary patterns, forming a foundation for developing energy-efficient spike-based learning machines.
Modeling Higher-Order Interactions in Sparse and Heavy-Tailed Neural Population Activity. Ulises Rodríguez-Domínguez; Hideaki Shimazaki. Neural Computation 37(11): 2011–2078, October 10, 2025. DOI: 10.1162/neco.a.35. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11210441.