Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that presents—under the assumption of a discrete segmentation of the feature space—a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
{"title":"Gauge-Optimal Approximate Learning for Small Data Classification","authors":"Edoardo Vecchi;Davide Bassetti;Fabio Graziato;Lukáš Pospíšil;Illia Horenko","doi":"10.1162/neco_a_01664","DOIUrl":"10.1162/neco_a_01664","url":null,"abstract":"Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that presents—under the assumption of a discrete segmentation of the feature space—a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 6","pages":"1198-1227"},"PeriodicalIF":2.7,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140805847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose and analyze a continuous-time firing-rate neural network, the positive firing-rate competitive network (PFCN), to tackle sparse reconstruction problems with non-negativity constraints. These problems, which involve approximating a given input stimulus from a dictionary using a set of sparse (active) neurons, play a key role in a wide range of domains, including, for example, neuroscience, signal processing, and machine learning. First, by leveraging the theory of proximal operators, we relate the equilibria of a family of continuous-time firing-rate neural networks to the optimal solutions of sparse reconstruction problems. Then we prove that the PFCN is a positive system and give rigorous conditions for the convergence to the equilibrium. Specifically, we show that the convergence depends only on a property of the dictionary and is linear-exponential in the sense that initially, the convergence rate is at worst linear and then, after a transient, becomes exponential. We also prove a number of technical results to assess the contractivity properties of the neural dynamics of interest. Our analysis leverages contraction theory to characterize the behavior of a family of firing-rate competitive networks for sparse reconstruction with and without non-negativity constraints. Finally, we validate the effectiveness of our approach via a numerical example.
{"title":"Positive Competitive Networks for Sparse Reconstruction","authors":"Veronica Centorrino;Anand Gokhale;Alexander Davydov;Giovanni Russo;Francesco Bullo","doi":"10.1162/neco_a_01657","DOIUrl":"10.1162/neco_a_01657","url":null,"abstract":"We propose and analyze a continuous-time firing-rate neural network, the positive firing-rate competitive network (PFCN), to tackle sparse reconstruction problems with non-negativity constraints. These problems, which involve approximating a given input stimulus from a dictionary using a set of sparse (active) neurons, play a key role in a wide range of domains, including, for example, neuroscience, signal processing, and machine learning. First, by leveraging the theory of proximal operators, we relate the equilibria of a family of continuous-time firing-rate neural networks to the optimal solutions of sparse reconstruction problems. Then we prove that the PFCN is a positive system and give rigorous conditions for the convergence to the equilibrium. Specifically, we show that the convergence depends only on a property of the dictionary and is linear-exponential in the sense that initially, the convergence rate is at worst linear and then, after a transient, becomes exponential. We also prove a number of technical results to assess the contractivity properties of the neural dynamics of interest. Our analysis leverages contraction theory to characterize the behavior of a family of firing-rate competitive networks for sparse reconstruction with and without non-negativity constraints. Finally, we validate the effectiveness of our approach via a numerical example.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 6","pages":"1163-1197"},"PeriodicalIF":2.7,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140805846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning (DL), a variant of the neural network algorithms originally proposed in the 1980s (Rumelhart et al., 1986), has made surprising progress in artificial intelligence (AI), ranging from language translation, protein folding (Jumper et al., 2021), autonomous cars, and, more recently, human-like language models (chatbots). All that seemed intractable until very recently. Despite the growing use of DL networks, little is understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since not much has changed since 1986. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units can't easily be visualized and their mechanisms can't be easily revealed. In this letter, we explore these challenges with a large (1.24 million weights VGG) DL in a novel high-density sample task (five unique tokens with more than 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results, we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.
{"title":"Dense Sample Deep Learning","authors":"Stephen José Hanson;Vivek Yadav;Catherine Hanson","doi":"10.1162/neco_a_01666","DOIUrl":"10.1162/neco_a_01666","url":null,"abstract":"Deep learning (DL), a variant of the neural network algorithms originally proposed in the 1980s (Rumelhart et al., 1986), has made surprising progress in artificial intelligence (AI), ranging from language translation, protein folding (Jumper et al., 2021), autonomous cars, and, more recently, human-like language models (chatbots). All that seemed intractable until very recently. Despite the growing use of DL networks, little is understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since not much has changed since 1986. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units can't easily be visualized and their mechanisms can't be easily revealed. In this letter, we explore these challenges with a large (1.24 million weights VGG) DL in a novel high-density sample task (five unique tokens with more than 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results, we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 6","pages":"1228-1244"},"PeriodicalIF":2.7,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10661260","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140805850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperdimensional computing (HDC) is an emerging computational paradigm for representing compositional information as high-dimensional vectors and has a promising potential in applications ranging from machine learning to neuromorphic computing. One of the long-standing challenges in HDC is factoring a compositional representation to its constituent factors, also known as the recovery problem. In this article, we take a novel approach to solve the recovery problem and propose the use of random linear codes. These codes are subspaces over the Boolean field and are a well-studied topic in information theory with various applications in digital communication. We begin by showing that hyperdimensional encoding using random linear codes retains favorable properties of the prevalent (ordinary) random codes; hence, HD representations using the two methods have comparable information storage capabilities. We proceed to show that random linear codes offer a rich subcode structure that can be used to form key-value stores, which encapsulate the most used cases of HDC. Most important, we show that under the framework we develop, random linear codes admit simple recovery algorithms to factor (either bundled or bound) compositional representations. The former relies on constructing certain linear equation systems over the Boolean field, the solution to which reduces the search space dramatically and strictly outperforms exhaustive search in many cases. The latter employs the subspace structure of these codes to achieve provably correct factorization. Both methods are strictly faster than the state-of-the-art resonator networks, often by an order of magnitude. We implemented our techniques in Python using a benchmark software library and demonstrated promising experimental results.
{"title":"Linear Codes for Hyperdimensional Computing","authors":"Netanel Raviv","doi":"10.1162/neco_a_01665","DOIUrl":"10.1162/neco_a_01665","url":null,"abstract":"Hyperdimensional computing (HDC) is an emerging computational paradigm for representing compositional information as high-dimensional vectors and has a promising potential in applications ranging from machine learning to neuromorphic computing. One of the long-standing challenges in HDC is factoring a compositional representation to its constituent factors, also known as the recovery problem. In this article, we take a novel approach to solve the recovery problem and propose the use of random linear codes. These codes are subspaces over the Boolean field and are a well-studied topic in information theory with various applications in digital communication. We begin by showing that hyperdimensional encoding using random linear codes retains favorable properties of the prevalent (ordinary) random codes; hence, HD representations using the two methods have comparable information storage capabilities. We proceed to show that random linear codes offer a rich subcode structure that can be used to form key-value stores, which encapsulate the most used cases of HDC. Most important, we show that under the framework we develop, random linear codes admit simple recovery algorithms to factor (either bundled or bound) compositional representations. The former relies on constructing certain linear equation systems over the Boolean field, the solution to which reduces the search space dramatically and strictly outperforms exhaustive search in many cases. The latter employs the subspace structure of these codes to achieve provably correct factorization. Both methods are strictly faster than the state-of-the-art resonator networks, often by an order of magnitude. We implemented our techniques in Python using a benchmark software library and demonstrated promising experimental results.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 6","pages":"1084-1120"},"PeriodicalIF":2.7,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140806119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distinct neural processes such as sensory and memory processes are often encoded over distinct timescales of neural activations. Animal studies have shown that this multiscale coding strategy is also implemented for individual components of a single process, such as individual features of a multifeature stimulus in sensory coding. However, the generalizability of this encoding strategy to the human brain has remained unclear. We asked if individual features of visual stimuli were encoded over distinct timescales. We applied a multiscale time-resolved decoding method to electroencephalography (EEG) collected from human subjects presented with grating visual stimuli to estimate the timescale of individual stimulus features. We observed that the orientation and color of the stimuli were encoded in shorter timescales, whereas spatial frequency and the contrast of the same stimuli were encoded in longer timescales. The stimulus features appeared in temporally overlapping windows along the trial supporting a multiplexed coding strategy. These results provide evidence for a multiplexed, multiscale coding strategy in the human visual system.
{"title":"Evidence for Multiscale Multiplexed Representation of Visual Features in EEG","authors":"Hamid Karimi-Rouzbahani","doi":"10.1162/neco_a_01649","DOIUrl":"10.1162/neco_a_01649","url":null,"abstract":"Distinct neural processes such as sensory and memory processes are often encoded over distinct timescales of neural activations. Animal studies have shown that this multiscale coding strategy is also implemented for individual components of a single process, such as individual features of a multifeature stimulus in sensory coding. However, the generalizability of this encoding strategy to the human brain has remained unclear. We asked if individual features of visual stimuli were encoded over distinct timescales. We applied a multiscale time-resolved decoding method to electroencephalography (EEG) collected from human subjects presented with grating visual stimuli to estimate the timescale of individual stimulus features. We observed that the orientation and color of the stimuli were encoded in shorter timescales, whereas spatial frequency and the contrast of the same stimuli were encoded in longer timescales. The stimulus features appeared in temporally overlapping windows along the trial supporting a multiplexed coding strategy. These results provide evidence for a multiplexed, multiscale coding strategy in the human visual system.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 3","pages":"412-436"},"PeriodicalIF":2.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139747702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Claus Metzner;Marius E. Yamakou;Dennis Voelkl;Achim Schilling;Patrick Krauss
Free-running recurrent neural networks (RNNs), especially probabilistic models, generate an ongoing information flux that can be quantified with the mutual information I[x→(t),x→(t+1)] between subsequent system states x→. Although previous studies have shown that I depends on the statistics of the network's connection weights, it is unclear how to maximize I systematically and how to quantify the flux in large systems where computing the mutual information becomes intractable. Here, we address these questions using Boltzmann machines as model systems. We find that in networks with moderately strong connections, the mutual information I is approximately a monotonic transformation of the root-mean-square averaged Pearson correlations between neuron pairs, a quantity that can be efficiently computed even in large systems. Furthermore, evolutionary maximization of I[x→(t),x→(t+1)] reveals a general design principle for the weight matrices enabling the systematic construction of systems with a high spontaneous information flux. Finally, we simultaneously maximize information flux and the mean period length of cyclic attractors in the state-space of these dynamical networks. Our results are potentially useful for the construction of RNNs that serve as short-time memories or pattern generators.
自由运行的循环神经网络(RNN),尤其是概率模型,会产生持续的信息通量,可以用后续系统状态 x→ 之间的互信息 I[x→(t),x→(t+1)]来量化。尽管之前的研究表明 I 取决于网络连接权重的统计量,但目前还不清楚如何系统地最大化 I,以及如何量化大型系统中的通量,因为在大型系统中计算互信息变得非常困难。在这里,我们使用玻尔兹曼机作为模型系统来解决这些问题。我们发现,在具有中等强度连接的网络中,互信息 I 近似于神经元对之间的均方根平均皮尔逊相关性的单调变换,即使在大型系统中也能高效计算。此外,I[x→(t),x→(t+1)]的进化最大化揭示了权重矩阵的一般设计原则,从而能够系统地构建具有高自发信息通量的系统。最后,我们在这些动力学网络的状态空间中同时最大化了信息通量和循环吸引子的平均周期长度。我们的研究成果对构建作为短时记忆或模式发生器的 RNNs 有潜在的帮助。
{"title":"Quantifying and Maximizing the Information Flux in Recurrent Neural Networks","authors":"Claus Metzner;Marius E. Yamakou;Dennis Voelkl;Achim Schilling;Patrick Krauss","doi":"10.1162/neco_a_01651","DOIUrl":"10.1162/neco_a_01651","url":null,"abstract":"Free-running recurrent neural networks (RNNs), especially probabilistic models, generate an ongoing information flux that can be quantified with the mutual information I[x→(t),x→(t+1)] between subsequent system states x→. Although previous studies have shown that I depends on the statistics of the network's connection weights, it is unclear how to maximize I systematically and how to quantify the flux in large systems where computing the mutual information becomes intractable. Here, we address these questions using Boltzmann machines as model systems. We find that in networks with moderately strong connections, the mutual information I is approximately a monotonic transformation of the root-mean-square averaged Pearson correlations between neuron pairs, a quantity that can be efficiently computed even in large systems. Furthermore, evolutionary maximization of I[x→(t),x→(t+1)] reveals a general design principle for the weight matrices enabling the systematic construction of systems with a high spontaneous information flux. Finally, we simultaneously maximize information flux and the mean period length of cyclic attractors in the state-space of these dynamical networks. Our results are potentially useful for the construction of RNNs that serve as short-time memories or pattern generators.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 3","pages":"351-384"},"PeriodicalIF":2.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139747774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active learning seeks to reduce the amount of data required to fit the parameters of a model, thus forming an important class of techniques in modern machine learning. However, past work on active learning has largely overlooked latent variable models, which play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines. Here we address this gap by proposing a novel framework for maximum-mutual-information input selection for discrete latent variable regression models. We first apply our method to a class of models known as mixtures of linear regressions (MLR). While it is well known that active learning confers no advantage for linear-gaussian regression models, we use Fisher information to show analytically that active learning can nevertheless achieve large gains for mixtures of such models, and we validate this improvement using both simulations and real-world data. We then consider a powerful class of temporally structured latent variable models given by a hidden Markov model (HMM) with generalized linear model (GLM) observations, which has recently been used to identify discrete states from animal decision-making data. We show that our method substantially reduces the amount of data needed to fit GLM-HMMs and outperforms a variety of approximate methods based on variational and amortized inference. Infomax learning for latent variable models thus offers a powerful approach for characterizing temporally structured latent states, with a wide variety of applications in neuroscience and beyond.
{"title":"Active Learning for Discrete Latent Variable Models","authors":"Aditi Jha;Zoe C. Ashwood;Jonathan W. Pillow","doi":"10.1162/neco_a_01646","DOIUrl":"10.1162/neco_a_01646","url":null,"abstract":"Active learning seeks to reduce the amount of data required to fit the parameters of a model, thus forming an important class of techniques in modern machine learning. However, past work on active learning has largely overlooked latent variable models, which play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines. Here we address this gap by proposing a novel framework for maximum-mutual-information input selection for discrete latent variable regression models. We first apply our method to a class of models known as mixtures of linear regressions (MLR). While it is well known that active learning confers no advantage for linear-gaussian regression models, we use Fisher information to show analytically that active learning can nevertheless achieve large gains for mixtures of such models, and we validate this improvement using both simulations and real-world data. We then consider a powerful class of temporally structured latent variable models given by a hidden Markov model (HMM) with generalized linear model (GLM) observations, which has recently been used to identify discrete states from animal decision-making data. We show that our method substantially reduces the amount of data needed to fit GLM-HMMs and outperforms a variety of approximate methods based on variational and amortized inference. Infomax learning for latent variable models thus offers a powerful approach for characterizing temporally structured latent states, with a wide variety of applications in neuroscience and beyond.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 3","pages":"437-474"},"PeriodicalIF":2.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139747685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many cognitive functions are represented as cell assemblies. In the case of spatial navigation, the population activity of place cells in the hippocampus and grid cells in the entorhinal cortex represents self-location in the environment. The brain cannot directly observe self-location information in the environment. Instead, it relies on sensory information and memory to estimate self-location. Therefore, estimating low-dimensional dynamics, such as the movement trajectory of an animal exploring its environment, from only the high-dimensional neural activity is important in deciphering the information represented in the brain. Most previous studies have estimated the low-dimensional dynamics (i.e., latent variables) behind neural activity by unsupervised learning with Bayesian population decoding using artificial neural networks or gaussian processes. Recently, persistent cohomology has been used to estimate latent variables from the phase information (i.e., circular coordinates) of manifolds created by neural activity. However, the advantages of persistent cohomology over Bayesian population decoding are not well understood. We compared persistent cohomology and Bayesian population decoding in estimating the animal location from simulated and actual grid cell population activity. We found that persistent cohomology can estimate the animal location with fewer neurons than Bayesian population decoding and robustly estimate the animal location from actual noisy data.
{"title":"Advantages of Persistent Cohomology in Estimating Animal Location From Grid Cell Population Activity","authors":"Daisuke Kawahara;Shigeyoshi Fujisawa","doi":"10.1162/neco_a_01645","DOIUrl":"10.1162/neco_a_01645","url":null,"abstract":"Many cognitive functions are represented as cell assemblies. In the case of spatial navigation, the population activity of place cells in the hippocampus and grid cells in the entorhinal cortex represents self-location in the environment. The brain cannot directly observe self-location information in the environment. Instead, it relies on sensory information and memory to estimate self-location. Therefore, estimating low-dimensional dynamics, such as the movement trajectory of an animal exploring its environment, from only the high-dimensional neural activity is important in deciphering the information represented in the brain. Most previous studies have estimated the low-dimensional dynamics (i.e., latent variables) behind neural activity by unsupervised learning with Bayesian population decoding using artificial neural networks or gaussian processes. Recently, persistent cohomology has been used to estimate latent variables from the phase information (i.e., circular coordinates) of manifolds created by neural activity. However, the advantages of persistent cohomology over Bayesian population decoding are not well understood. We compared persistent cohomology and Bayesian population decoding in estimating the animal location from simulated and actual grid cell population activity. We found that persistent cohomology can estimate the animal location with fewer neurons than Bayesian population decoding and robustly estimate the animal location from actual noisy data.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 3","pages":"385-411"},"PeriodicalIF":2.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139747686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Errata to “A Tutorial on the Spectral Theory of Markov Chains” by Eddie Seabrook and Laurenz Wiskott (Neural Computation, November 2023, Vol. 35, No. 11, pp. 1713–1796, https://doi.org/10.1162/neco_a_01611)","authors":"","doi":"10.1162/neco_e_01662","DOIUrl":"10.1162/neco_e_01662","url":null,"abstract":"","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 3","pages":"499-500"},"PeriodicalIF":2.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139747701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parameterized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from O(Nd) to O(Nd-1), where d is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of parameterized complex geometries and unbounded problems.
最近,深度学习代理和神经算子在求解偏微分方程(PDE)方面大有可为。然而,它们往往需要大量的训练数据,而且仅限于有界域。在这项工作中,我们提出了一种新颖的物理信息神经算子方法,用于解决无标记数据的参数化边界值问题。通过将 PDE 重新表述为边界积分方程 (BIE),我们可以只在域的边界上训练算子网络。这种方法将所需采样点的数量从 O(Nd) 减少到 O(Nd-1),其中 d 是域的维数,从而显著加快了训练过程。此外,我们的方法还能处理无边界问题,这是现有的物理信息神经网络(PINN)和神经算子无法实现的。我们的数值实验显示了参数化复杂几何图形和无界问题的有效性。
{"title":"Learning Only on Boundaries: A Physics-Informed Neural Operator for Solving Parametric Partial Differential Equations in Complex Geometries","authors":"Zhiwei Fang;Sifan Wang;Paris Perdikaris","doi":"10.1162/neco_a_01647","DOIUrl":"10.1162/neco_a_01647","url":null,"abstract":"Recently, deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parameterized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from O(Nd) to O(Nd-1), where d is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of parameterized complex geometries and unbounded problems.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"36 3","pages":"475-498"},"PeriodicalIF":2.9,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139747703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}