A Chimera Model for Motion Anticipation in the Retina and the Primary Visual Cortex
Jérôme Emonet; Selma Souihel; Frédéric Chavane; Alain Destexhe; Matteo di Volo; Bruno Cessac
We propose a mean field model of the primary visual cortex (V1), connected to a realistic retina model, to study the impact of the retina on motion anticipation. We first consider the case where the retina itself provides no anticipation, so that anticipation arises only from a cortical mechanism (“anticipation by latency”), and unravel the effects of the retinal input amplitude, of stimulus features such as speed and contrast, and of the size of cortical extensions and fiber conduction speed. We then explore how the cortical wave of anticipation changes when V1 is driven by retina-based anticipatory mechanisms: gain control and lateral inhibition by amacrine cells. We show how retinal and cortical anticipation combine into efficient processing in which the simulated cortical response runs ahead of the moving object that triggers it, compensating for the delays of visual processing.
{"title":"A Chimera Model for Motion Anticipation in the Retina and the Primary Visual Cortex","authors":"Jérôme Emonet;Selma Souihel;Frédéric Chavane;Alain Destexhe;Matteo di Volo;Bruno Cessac","doi":"10.1162/neco.a.34","DOIUrl":"10.1162/neco.a.34","url":null,"abstract":"We propose a mean field model of the primary visual cortex (V1), connected to a realistic retina model, to study the impact of the retina on motion anticipation. We first consider the case where the retina does not itself provide anticipation—which is then only triggered by a cortical mechanism, the “anticipation by latency”—and unravel the effects of the retinal input amplitude, of stimulus features such as speed and contrast and of the size of cortical extensions and fiber conduction speed. Then we explore the changes in the cortical wave of anticipation when V1 is triggered by retina-driven anticipatory mechanisms: gain control and lateral inhibition by amacrine cells. Here, we show how retinal and cortical anticipation combine to provide an efficient processing where the simulated cortical response is in advance over the moving object that triggers this response, compensating the delays in visual processing.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 11","pages":"1925-1974"},"PeriodicalIF":2.1,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145126255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature Normalization Prevents Collapse of Noncontrastive Learning Dynamics
Han Bao
Contrastive learning is a self-supervised representation learning framework in which two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force pushes them away from negative examples. Noncontrastive learning, represented by BYOL and SimSiam, gets rid of negative examples and improves computational efficiency. Although the learned representations might at first sight be expected to collapse into a single point for lack of a repulsive force, Tian et al. (2021) revealed through a learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account the commonly used feature normalization, a normalizer applied before measuring the similarity of representations, and hence in their model excessively strong regularization may still collapse the dynamics, an unnatural behavior in the presence of feature normalization. Therefore, we extend the previous theory, based on the L2 loss, by considering the cosine loss instead, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (whereas the L2 loss induces third-order dynamics), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions at the given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing collapse of the dynamics.
{"title":"Feature Normalization Prevents Collapse of Noncontrastive Learning Dynamics","authors":"Han Bao","doi":"10.1162/neco.a.27","DOIUrl":"10.1162/neco.a.27","url":null,"abstract":"Contrastive learning is a self-supervised representation learning framework where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Noncontrastive learning, represented by BYOL and SimSiam, gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may still collapse the dynamics, an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss instead, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 11","pages":"2079-2124"},"PeriodicalIF":2.1,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Firing Rate Models as Associative Memory: Synaptic Design for Robust Retrieval
Simone Betteti; Giacomo Baggio; Francesco Bullo; Sandro Zampieri
Firing rate models are dynamical systems widely used in applied and theoretical neuroscience to describe local cortical dynamics in neuronal populations. By providing a macroscopic perspective on neuronal activity, these models are essential for investigating oscillatory phenomena, chaotic behavior, and associative memory processes. Despite their widespread use, the application of firing rate models to associative memory networks has received limited mathematical exploration, and most existing studies focus on specific models. Conversely, well-established associative memory designs, such as Hopfield networks, lack key biologically relevant features intrinsic to firing rate models, including positivity and interpretable synaptic matrices that reflect the action of long-term potentiation and long-term depression. To address this gap, we propose a general framework that ensures the emergence of rescaled memory patterns as stable equilibria in the firing rate dynamics. Furthermore, we analyze the conditions under which the memories are locally and globally asymptotically stable, providing insights into constructing biologically plausible and robust systems for associative memory retrieval.
{"title":"Firing Rate Models as Associative Memory: Synaptic Design for Robust Retrieval","authors":"Simone Betteti;Giacomo Baggio;Francesco Bullo;Sandro Zampieri","doi":"10.1162/neco.a.28","DOIUrl":"10.1162/neco.a.28","url":null,"abstract":"Firing rate models are dynamical systems widely used in applied and theoretical neuroscience to describe local cortical dynamics in neuronal populations. By providing a macroscopic perspective of neuronal activity, these models are essential for investigating oscillatory phenomena, chaotic behavior, and associative memory processes. Despite their widespread use, the application of firing rate models to associative memory networks has received limited mathematical exploration, and most existing studies are focused on specific models. Conversely, well-established associative memory designs, such as Hopfield networks, lack key biologically relevant features intrinsic to firing rate models, including positivity and interpretable synaptic matrices reflecting the action of long-term potentiation and long-term depression. To address this gap, we propose a general framework that ensures the emergence of rescaled memory patterns as stable equilibria in the firing rate dynamics. Furthermore, we analyze the conditions under which the memories are locally and globally asymptotically stable, providing insights into constructing biologically plausible and robust systems for associative memory retrieval.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1807-1838"},"PeriodicalIF":2.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rapid Reweighting of Sensory Inputs and Predictions in Visual Perception
William Turner; Oh-Sang Kwon; Minwoo J.B. Kim; Hinze Hogendoorn
A striking perceptual phenomenon has recently been described wherein people report seeing abrupt jumps in the location of a smoothly moving object (“position resets”). Here, we show that this phenomenon can be understood within the framework of recursive Bayesian estimation as arising from transient gain changes, temporarily prioritizing sensory input over predictive beliefs. From this perspective, position resets reveal a capacity for rapid adaptive precision weighting in human visual perception and offer a possible test bed within which to study the timing and flexibility of sensory gain control.
{"title":"Rapid Reweighting of Sensory Inputs and Predictions in Visual Perception","authors":"William Turner;Oh-Sang Kwon;Minwoo J.B. Kim;Hinze Hogendoorn","doi":"10.1162/neco.a.26","DOIUrl":"10.1162/neco.a.26","url":null,"abstract":"A striking perceptual phenomenon has recently been described wherein people report seeing abrupt jumps in the location of a smoothly moving object (“position resets”). Here, we show that this phenomenon can be understood within the framework of recursive Bayesian estimation as arising from transient gain changes, temporarily prioritizing sensory input over predictive beliefs. From this perspective, position resets reveal a capacity for rapid adaptive precision weighting in human visual perception and offer a possible test bed within which to study the timing and flexibility of sensory gain control.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1853-1862"},"PeriodicalIF":2.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sequential Learning in the Dense Associative Memory
Hayden McAlister; Anthony Robins; Lech Szymanski
Sequential learning involves learning tasks in a sequence and proves challenging for most neural networks. Biological neural networks regularly succeed at the sequential learning challenge and are even capable of transferring knowledge both forward and backward between tasks. Artificial neural networks, in contrast, often fail to transfer performance between tasks and regularly suffer degraded performance or catastrophic forgetting on previous tasks. Models of associative memory, of which the Hopfield network is the most studied, have been used to investigate this discrepancy because of their biological ties and inspirations. The dense associative memory (DAM), or modern Hopfield network, generalizes the Hopfield network, allowing for greater capacities and prototype learning behaviors while retaining the associative memory structure. We give a substantial review of sequential learning, with particular attention to the Hopfield network and associative memories. We present the first published benchmarks of sequential learning in the DAM using various sequential learning techniques and analyze the results to demonstrate previously unseen transitions in the behavior of the DAM. This letter also discusses the departure from biological plausibility that may affect the utility of the DAM as a tool for studying biological neural networks. We present our findings, including the effectiveness of a range of state-of-the-art sequential learning methods when applied to the DAM, and use these methods to further the understanding of DAM properties and behaviors.
{"title":"Sequential Learning in the Dense Associative Memory","authors":"Hayden McAlister;Anthony Robins;Lech Szymanski","doi":"10.1162/neco.a.20","DOIUrl":"10.1162/neco.a.20","url":null,"abstract":"Sequential learning involves learning tasks in a sequence and proves challenging for most neural networks. Biological neural networks regularly succeed at the sequential learning challenge and are even capable of transferring knowledge both forward and backward between tasks. Artificial neural networks often totally fail to transfer performance between tasks and regularly suffer from degraded performance or catastrophic forgetting on previous tasks. Models of associative memory have been used to investigate the discrepancy between biological and artificial neural networks due to their biological ties and inspirations, of which the Hopfield network is the most studied model. The dense associative memory (DAM), or modern Hopfield network, generalizes the Hopfield network, allowing for greater capacities and prototype learning behaviors while still retaining the associative memory structure. We give a substantial review of the sequential learning space with particular respect to the Hopfield network and associative memories. We present the first published benchmarks of sequential learning in the DAM using various sequential learning techniques and analyze the results of the sequential learning to demonstrate previously unseen transitions in the behavior of the DAM. This letter also discusses the departure from biological plausibility that may affect the utility of the DAM as a tool for studying biological neural networks. We present our findings, including the effectiveness of a range of state-of-the-art sequential learning methods when applied to the DAM, and use these methods to further the understanding of DAM properties and behaviors.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1877-1924"},"PeriodicalIF":2.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transformer Models for Signal Processing: Scaled Dot-Product Attention Implements Constrained Filtering
Terence D. Sanger
The remarkable success of the transformer machine learning architecture for processing language sequences far exceeds the performance of classical signal processing methods. A unique component of transformer models is the scaled dot-product attention (SDPA) layer, which does not appear to have an analog in prior signal processing algorithms. Here, we show that SDPA operates using a novel principle that projects the current state estimate onto the space spanned by prior estimates. We show that SDPA, when used for causal recursive state estimation, implements constrained state estimation in circumstances where the constraint is unknown and may be time varying. Since constraints in high-dimensional space may represent the complex relationships that define nonlinear signals and models, this suggests that the SDPA layer and transformer models leverage constrained estimation to achieve their success. It also suggests that transformers and the SDPA layer could serve as a computational model for previously unexplained capabilities of human behavior.
{"title":"Transformer Models for Signal Processing: Scaled Dot-Product Attention Implements Constrained Filtering","authors":"Terence D. Sanger","doi":"10.1162/neco.a.29","DOIUrl":"10.1162/neco.a.29","url":null,"abstract":"The remarkable success of the transformer machine learning architecture for processing language sequences far exceeds the performance of classical signal processing methods. A unique component of transformer models is the scaled dot-product attention (SDPA) layer, which does not appear to have an analog in prior signal processing algorithms. Here, we show that SDPA operates using a novel principle that projects the current state estimate onto the space spanned by prior estimates. We show that SDPA, when used for causal recursive state estimation, implements constrained state estimation in circumstances where the constraint is unknown and may be time varying. Since constraints in high-dimensional space may represent the complex relationships that define nonlinear signals and models, this suggests that the SDPA layer and transformer models leverage constrained estimation to achieve their success. This also suggests that transformers and the SPDA layer could be a computational model for previously unexplained capabilities of human behavior.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1839-1852"},"PeriodicalIF":2.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distance-Based Logistic Matrix Factorization
Anoop Praturu; Tatyana O. Sharpee
Matrix factorization is a central paradigm in matrix completion and collaborative filtering. Low-rank factorizations have been extremely successful in reconstructing and generalizing high-dimensional data in a wide variety of machine learning problems, from drug-target discovery to music recommendation. Virtually all proposed matrix factorization techniques use the dot product between latent factor vectors to reconstruct the original matrix. We propose a reformulation of the widely used logistic matrix factorization in which we use the distance, rather than the dot product, to measure similarity between latent factors. We show that this measure of similarity, which can draw nonlinear decision boundaries and respects triangle inequalities between points, has more expressive power and modeling capacity. The distance-based model, implemented in Euclidean and hyperbolic space, outperforms previous formulations of logistic matrix factorization on three biological test problems with disparate structure and statistics. In particular, we show that a distance-based factorization (1) generalizes better to test data, (2) achieves optimal performance at lower factor-space dimension, and (3) clusters data better in the latent factor space.
{"title":"Distance-Based Logistic Matrix Factorization","authors":"Anoop Praturu;Tatyana O. Sharpee","doi":"10.1162/neco.a.25","DOIUrl":"10.1162/neco.a.25","url":null,"abstract":"Matrix factorization is a central paradigm in matrix completion and collaborative filtering. Low-rank factorizations have been extremely successful in reconstructing and generalizing high-dimensional data in a wide variety of machine learning problems from drug-target discovery to music recommendations. Virtually all proposed matrix factorization techniques use the dot product between latent factor vectors to reconstruct the original matrix. We propose a reformulation of the widely used logistic matrix factorization in which we use the distance, rather than the dot product, to measure similarity between latent factors. We show that this measure of similarity, which can draw nonlinear decision boundaries and respect triangle inequalities between points, has more expressive power and modeling capacity. The distance-based model implemented in Euclidean and hyperbolic space outperforms previous formulations of logistic matrix factorization on three different biological test problems with disparate structure and statistics. In particular, we show that a distance-based factorization (1) generalizes better to test data, (2) achieves optimal performance at lower factor space dimension, and (3) clusters data better in the latent factor space.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1863-1876"},"PeriodicalIF":2.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diversity Deconstrains Component Limitations in Sensorimotor Control
Yorie Nakahira; Quanying Liu; Xiyu Deng; Terrence J. Sejnowski; John C. Doyle
Human sensorimotor control is remarkably fast and accurate at the system level despite severe speed-accuracy trade-offs at the component level. The discrepancy between the contrasting speed-accuracy trade-offs at these two levels is a paradox. Meanwhile, speed-accuracy trade-offs, heterogeneity, and layered architectures are ubiquitous in nerves, skeletons, and muscles, but they have so far been studied only in isolation using domain-specific models. In this article, we develop a mechanistic model of how component speed-accuracy trade-offs constrain sensorimotor control that is consistent with Fitts’ law for reaching. The model suggests that diversity among components deconstrains the limitations of individual components in sensorimotor control. Such diversity-enabled sweet spots (DESSs) are ubiquitous in nature, explaining why large heterogeneities exist among the components of biological systems and how natural selection routinely evolves systems with fast and accurate responses from imperfect components.
{"title":"Diversity Deconstrains Component Limitations in Sensorimotor Control","authors":"Yorie Nakahira;Quanying Liu;Xiyu Deng;Terrence J. Sejnowski;John C. Doyle","doi":"10.1162/neco.a.24","DOIUrl":"10.1162/neco.a.24","url":null,"abstract":"Human sensorimotor control is remarkably fast and accurate at the system level despite severe speed-accuracy trade-offs at the component level. The discrepancy between the contrasting speed-accuracy trade-offs at these two levels is a paradox. Meanwhile, speed accuracy trade-offs, heterogeneity, and layered architectures are ubiquitous in nerves, skeletons, and muscles, but they have only been studied in isolation using domain-specific models. In this article, we develop a mechanistic model for how component speed-accuracy trade-offs constrain sensorimotor control that is consistent with Fitts’ law for reaching. The model suggests that diversity among components deconstrains the limitations of individual components in sensorimotor control. Such diversity-enabled sweet spots (DESSs) are ubiquitous in nature, explaining why large heterogeneities exist in the components of biological systems and how natural selection routinely evolves systems with fast and accurate responses using imperfect components.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 10","pages":"1783-1806"},"PeriodicalIF":2.1,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast Multigroup Gaussian Process Factor Models
Evren Gokcen; Anna I. Jasper; Adam Kohn; Christian K. Machens; Byron M. Yu
Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending gaussian process factor models to characterize multipopulation interactions has grown. However, the cubic runtime scaling of current methods with the length of experimental trials and the number of recorded populations (groups) precludes their application to large-scale multipopulation recordings. Here, we improve this scaling from cubic to linear in both trial length and group number. We present two approximate approaches to fitting multigroup gaussian process factor models based on inducing variables and the frequency domain. Empirically, both methods achieved orders of magnitude speed-up with minimal impact on statistical performance, in simulation and on neural recordings of hundreds of neurons across three brain areas. The frequency domain approach, in particular, consistently provided the greatest runtime benefits with the fewest trade-offs in statistical performance. We further characterize the estimation biases introduced by the frequency domain approach and demonstrate effective strategies to mitigate them. This work enables a powerful class of analysis techniques to keep pace with the growing scale of multipopulation recordings, opening new avenues for exploring brain function.
{"title":"Fast Multigroup Gaussian Process Factor Models","authors":"Evren Gokcen;Anna I. Jasper;Adam Kohn;Christian K. Machens;Byron M. Yu","doi":"10.1162/neco.a.22","DOIUrl":"10.1162/neco.a.22","url":null,"abstract":"Gaussian processes are now commonly used in dimensionality reduction approaches tailored to neuroscience, especially to describe changes in high-dimensional neural activity over time. As recording capabilities expand to include neuronal populations across multiple brain areas, cortical layers, and cell types, interest in extending gaussian process factor models to characterize multipopulation interactions has grown. However, the cubic runtime scaling of current methods with the length of experimental trials and the number of recorded populations (groups) precludes their application to large-scale multipopulation recordings. Here, we improve this scaling from cubic to linear in both trial length and group number. We present two approximate approaches to fitting multigroup gaussian process factor models based on inducing variables and the frequency domain. Empirically, both methods achieved orders of magnitude speed-up with minimal impact on statistical performance, in simulation and on neural recordings of hundreds of neurons across three brain areas. The frequency domain approach, in particular, consistently provided the greatest runtime benefits with the fewest trade-offs in statistical performance. We further characterize the estimation biases introduced by the frequency domain approach and demonstrate effective strategies to mitigate them. This work enables a powerful class of analysis techniques to keep pace with the growing scale of multipopulation recordings, opening new avenues for exploring brain function.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 9","pages":"1709-1782"},"PeriodicalIF":2.1,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward Generalized Entropic Sparsification for Convolutional Neural Networks
Tin Barisin; Illia Horenko
Convolutional neural networks (CNNs) are reported to be overparametrized. The search for an optimal (minimal) and sufficient architecture is an NP-hard problem: if the network has N neurons, then there are 2^N possibilities to connect them, and therefore 2^N possible architectures and 2^N Boolean hyperparameters to encode them. Selecting the best hyperparameter among them is NP-hard, since 2^N grows in N faster than any polynomial N^p. Here, we introduce a layer-by-layer, data-driven pruning method based on a mathematical idea aiming at a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pretrained (full) CNN using network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): on MNIST (LeNet), resulting in sparsity of 55% to 84% with a loss in accuracy of just 0.1% to 0.5%, and on CIFAR-10 (VGG-16, ResNet18), resulting in sparsity of 73% to 89% with a loss in accuracy of 0.1% to 0.5%.
{"title":"Toward Generalized Entropic Sparsification for Convolutional Neural Networks","authors":"Tin Barisin;Illia Horenko","doi":"10.1162/neco.a.21","DOIUrl":"10.1162/neco.a.21","url":null,"abstract":"Convolutional neural networks (CNNs) are reported to be overparametrized. The search for optimal (minimal) and sufficient architecture is an NP-hard problem: if the network has N neurons, then there are 2N possibilities to connect them—and therefore 2N possible architectures and 2N Boolean hyperparameters to encode them. Selecting the best possible hyperparameter out of them becomes an Np-hard problem since 2N grows in N faster then any polynomial Np. Here, we introduce a layer-by-layer data-driven pruning method based on the mathematical idea aiming at a computationally scalable entropic relaxation of the pruning problem. The sparse subnetwork is found from the pretrained (full) CNN using the network entropy minimization as a sparsity constraint. This allows deploying a numerically scalable algorithm with a sublinear scaling cost. The method is validated on several benchmarks (architectures): on MNIST (LeNet), resulting in sparsity of 55% to 84% and loss in accuracy of just 0.1% to 0.5%, and on CIFAR-10 (VGG-16, ResNet18), resulting in sparsity of 73% to 89% and loss in accuracy of 0.1% to 0.5%.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 9","pages":"1648-1676"},"PeriodicalIF":2.1,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144700387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}