Neural Computation: Latest Articles, Page 9

Gauge-Optimal Approximate Learning for Small Data Classification

IF 2.7; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-10 DOI: 10.1162/neco_a_01664

Edoardo Vecchi;Davide Bassetti;Fabio Graziato;Lukáš Pospíšil;Illia Horenko

Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that presents—under the assumption of a discrete segmentation of the feature space—a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.

Vol. 36, No. 6, pp. 1198–1227

Citations: 0

Positive Competitive Networks for Sparse Reconstruction

IF 2.7; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-10 DOI: 10.1162/neco_a_01657

Veronica Centorrino;Anand Gokhale;Alexander Davydov;Giovanni Russo;Francesco Bullo

We propose and analyze a continuous-time firing-rate neural network, the positive firing-rate competitive network (PFCN), to tackle sparse reconstruction problems with non-negativity constraints. These problems, which involve approximating a given input stimulus from a dictionary using a set of sparse (active) neurons, play a key role in a wide range of domains, including, for example, neuroscience, signal processing, and machine learning. First, by leveraging the theory of proximal operators, we relate the equilibria of a family of continuous-time firing-rate neural networks to the optimal solutions of sparse reconstruction problems. Then we prove that the PFCN is a positive system and give rigorous conditions for the convergence to the equilibrium. Specifically, we show that the convergence depends only on a property of the dictionary and is linear-exponential in the sense that initially, the convergence rate is at worst linear and then, after a transient, becomes exponential. We also prove a number of technical results to assess the contractivity properties of the neural dynamics of interest. Our analysis leverages contraction theory to characterize the behavior of a family of firing-rate competitive networks for sparse reconstruction with and without non-negativity constraints. Finally, we validate the effectiveness of our approach via a numerical example.
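The link between network equilibria and sparse reconstruction can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the PFCN as defined in the article: it integrates, with forward Euler, a firing-rate network of the form dx/dt = -x + ReLU((I - Φ^T Φ) x + Φ^T u - λ), whose equilibria are fixed points of a proximal-gradient step for the non-negative lasso. The names `simulate_rate_network`, `objective`, `lam`, `dt`, and `steps` are hypothetical, and the dictionary is rescaled to unit spectral norm as an assumption that keeps the iteration well behaved.

```python
import numpy as np

def objective(Phi, u, x, lam):
    """Non-negative lasso cost 0.5*||u - Phi x||^2 + lam*sum(x), defined for x >= 0."""
    return 0.5 * np.sum((u - Phi @ x) ** 2) + lam * np.sum(x)

def simulate_rate_network(Phi, u, lam=0.01, dt=0.5, steps=2000):
    """Forward-Euler integration of dx/dt = -x + relu((I - Phi^T Phi) x + Phi^T u - lam)."""
    n = Phi.shape[1]
    W = np.eye(n) - Phi.T @ Phi       # recurrent (competitive) weights
    drive = Phi.T @ u - lam           # feedforward input minus a firing threshold
    x = np.zeros(n)                   # start from a silent network
    for _ in range(steps):
        # With 0 < dt <= 1 this is a convex mix of x and a ReLU output, so x stays >= 0.
        x = x + dt * (-x + np.maximum(W @ x + drive, 0.0))
    return x

# Hypothetical toy problem: a 3-sparse non-negative code in a random dictionary.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50))
Phi /= np.linalg.norm(Phi, 2)         # unit spectral norm (assumption)
x_true = np.zeros(50)
x_true[[3, 17, 41]] = 1.0
u = Phi @ x_true
x_hat = simulate_rate_network(Phi, u)
```

The non-negativity of the state is preserved by construction here, which mirrors the "positive system" property described in the abstract; the convergence-rate results of the article are not reproduced by this sketch.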

Vol. 36, No. 6, pp. 1163–1197

Citations: 0

Dense Sample Deep Learning

IF 2.7; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-10 DOI: 10.1162/neco_a_01666

Stephen José Hanson;Vivek Yadav;Catherine Hanson

Deep learning (DL), a variant of the neural network algorithms originally proposed in the 1980s (Rumelhart et al., 1986), has made surprising progress in artificial intelligence (AI), ranging from language translation, protein folding (Jumper et al., 2021), autonomous cars, and, more recently, human-like language models (chatbots). All that seemed intractable until very recently. Despite the growing use of DL networks, little is understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and, of course, the large scale of the data, since not much has changed since 1986. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units can't easily be visualized and their mechanisms can't be easily revealed. In this letter, we explore these challenges with a large (1.24 million weights VGG) DL in a novel high-density sample task (five unique tokens with more than 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results, we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.

Vol. 36, No. 6, pp. 1228–1244

Citations: 0

Linear Codes for Hyperdimensional Computing

IF 2.7; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-03-10 DOI: 10.1162/neco_a_01665

Netanel Raviv

Hyperdimensional computing (HDC) is an emerging computational paradigm for representing compositional information as high-dimensional vectors and has a promising potential in applications ranging from machine learning to neuromorphic computing. One of the long-standing challenges in HDC is factoring a compositional representation to its constituent factors, also known as the recovery problem. In this article, we take a novel approach to solve the recovery problem and propose the use of random linear codes. These codes are subspaces over the Boolean field and are a well-studied topic in information theory with various applications in digital communication. We begin by showing that hyperdimensional encoding using random linear codes retains favorable properties of the prevalent (ordinary) random codes; hence, HD representations using the two methods have comparable information storage capabilities. We proceed to show that random linear codes offer a rich subcode structure that can be used to form key-value stores, which encapsulate the most used cases of HDC. Most important, we show that under the framework we develop, random linear codes admit simple recovery algorithms to factor (either bundled or bound) compositional representations. The former relies on constructing certain linear equation systems over the Boolean field, the solution to which reduces the search space dramatically and strictly outperforms exhaustive search in many cases. The latter employs the subspace structure of these codes to achieve provably correct factorization. Both methods are strictly faster than the state-of-the-art resonator networks, often by an order of magnitude. We implemented our techniques in Python using a benchmark software library and demonstrated promising experimental results.
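To illustrate why linearity helps with recovery, the sketch below encodes binary messages with a random generator matrix over the Boolean field, binds two codewords with XOR, and recovers both factors by Gaussian elimination. It is a minimal illustration, not the article's algorithms: the key/value layout (key in the first 8 message bits, value in the last 8), the sizes `k` and `n`, and the helper names `encode` and `solve_gf2` are all assumptions made for this example.

```python
import numpy as np

def encode(message, G):
    """Codeword of a binary message under generator matrix G (arithmetic over GF(2))."""
    return ((message.astype(int) @ G.astype(int)) % 2).astype(np.uint8)

def solve_gf2(A, b):
    """Solve A @ x = b over GF(2) by Gaussian elimination; return one solution or None."""
    A = (A % 2).astype(np.uint8)
    b = (b % 2).astype(np.uint8)
    n_rows, n_cols = A.shape
    pivots, row = [], 0
    for col in range(n_cols):
        hits = np.nonzero(A[row:, col])[0]     # rows at or below `row` with a 1 in this column
        if hits.size == 0:
            continue                           # no pivot here: free variable
        p = row + hits[0]
        A[[row, p]] = A[[p, row]]              # swap the pivot row into place
        b[[row, p]] = b[[p, row]]
        for r in range(n_rows):                # clear this column in every other row
            if r != row and A[r, col]:
                A[r] ^= A[row]
                b[r] ^= b[row]
        pivots.append(col)
        row += 1
        if row == n_rows:
            break
    if b[row:].any():
        return None                            # inconsistent system
    x = np.zeros(n_cols, dtype=np.uint8)
    x[pivots] = b[:len(pivots)]                # free variables are set to 0
    return x

rng = np.random.default_rng(1)
k, n = 16, 256
G = rng.integers(0, 2, size=(k, n), dtype=np.uint8)    # random linear code
key = np.zeros(k, dtype=np.uint8)
key[:8] = rng.integers(0, 2, size=8)                   # key uses the first 8 message bits
val = np.zeros(k, dtype=np.uint8)
val[8:] = rng.integers(0, 2, size=8)                   # value uses the last 8 message bits
bound = encode(key, G) ^ encode(val, G)                # binding as element-wise XOR
recovered = solve_gf2(G.T, bound)                      # solve m G = bound for the message m
```

Because the code is linear, the XOR of two codewords is the codeword of the XOR of their messages, so one linear solve returns the key bits and the value bits together instead of searching over all key-value pairs.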

Vol. 36, No. 6, pp. 1084–1120

Citations: 0

Evidence for Multiscale Multiplexed Representation of Visual Features in EEG

IF 2.9; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-02-16 DOI: 10.1162/neco_a_01649

Hamid Karimi-Rouzbahani

Distinct neural processes such as sensory and memory processes are often encoded over distinct timescales of neural activations. Animal studies have shown that this multiscale coding strategy is also implemented for individual components of a single process, such as individual features of a multifeature stimulus in sensory coding. However, the generalizability of this encoding strategy to the human brain has remained unclear. We asked if individual features of visual stimuli were encoded over distinct timescales. We applied a multiscale time-resolved decoding method to electroencephalography (EEG) collected from human subjects presented with grating visual stimuli to estimate the timescale of individual stimulus features. We observed that the orientation and color of the stimuli were encoded in shorter timescales, whereas spatial frequency and the contrast of the same stimuli were encoded in longer timescales. The stimulus features appeared in temporally overlapping windows along the trial supporting a multiplexed coding strategy. These results provide evidence for a multiplexed, multiscale coding strategy in the human visual system.

Citations: 0

Quantifying and Maximizing the Information Flux in Recurrent Neural Networks

IF 2.9; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-02-16 DOI: 10.1162/neco_a_01651

Claus Metzner;Marius E. Yamakou;Dennis Voelkl;Achim Schilling;Patrick Krauss

Free-running recurrent neural networks (RNNs), especially probabilistic models, generate an ongoing information flux that can be quantified with the mutual information I[x(t), x(t+1)] between subsequent system states x (the vector of all unit states). Although previous studies have shown that I depends on the statistics of the network's connection weights, it is unclear how to maximize I systematically and how to quantify the flux in large systems where computing the mutual information becomes intractable. Here, we address these questions using Boltzmann machines as model systems. We find that in networks with moderately strong connections, the mutual information I is approximately a monotonic transformation of the root-mean-square averaged Pearson correlations between neuron pairs, a quantity that can be efficiently computed even in large systems. Furthermore, evolutionary maximization of I[x(t), x(t+1)] reveals a general design principle for the weight matrices enabling the systematic construction of systems with a high spontaneous information flux. Finally, we simultaneously maximize information flux and the mean period length of cyclic attractors in the state-space of these dynamical networks. Our results are potentially useful for the construction of RNNs that serve as short-time memories or pattern generators.
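The correlation-based proxy mentioned in the abstract can be sketched in a few lines. The code below assumes, as an interpretation not spelled out in the abstract, that the Pearson correlations are taken between neuron i at time t and neuron j at time t+1; the function name `rms_lagged_correlation` and the toy delay-line network are hypothetical.

```python
import numpy as np

def rms_lagged_correlation(states):
    """Root-mean-square of Pearson correlations between x_i(t) and x_j(t+1).

    `states` has shape (T, N): one row per time step, one column per neuron.
    Every neuron must vary over time (non-zero standard deviation).
    """
    x_now = states[:-1].astype(float)
    x_next = states[1:].astype(float)
    x_now = (x_now - x_now.mean(axis=0)) / x_now.std(axis=0)
    x_next = (x_next - x_next.mean(axis=0)) / x_next.std(axis=0)
    corr = x_now.T @ x_next / len(x_now)      # corr[i, j]: neuron i at t vs neuron j at t+1
    return np.sqrt(np.mean(corr ** 2))

rng = np.random.default_rng(2)
T, N = 5000, 4

# Memoryless network: every state is drawn independently, so there is no flux.
independent = rng.integers(0, 2, size=(T, N))

# Delay line: neuron j copies neuron j-1 from the previous step; neuron 0 is random.
r = rng.integers(0, 2, size=T + N)
delay_line = np.stack([r[N - j : N - j + T] for j in range(N)], axis=1)

rms_independent = rms_lagged_correlation(independent)
rms_delay_line = rms_lagged_correlation(delay_line)
```

The cost is one N-by-N matrix product over the recorded time steps, which stays cheap for large N, whereas estimating the mutual information directly requires the joint distribution over pairs of 2^N-sized state spaces.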

Vol. 36, No. 3, pp. 351–384

Citations: 0

Active Learning for Discrete Latent Variable Models

IF 2.9; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-02-16 DOI: 10.1162/neco_a_01646

Aditi Jha;Zoe C. Ashwood;Jonathan W. Pillow

Active learning seeks to reduce the amount of data required to fit the parameters of a model, thus forming an important class of techniques in modern machine learning. However, past work on active learning has largely overlooked latent variable models, which play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines. Here we address this gap by proposing a novel framework for maximum-mutual-information input selection for discrete latent variable regression models. We first apply our method to a class of models known as mixtures of linear regressions (MLR). While it is well known that active learning confers no advantage for linear-gaussian regression models, we use Fisher information to show analytically that active learning can nevertheless achieve large gains for mixtures of such models, and we validate this improvement using both simulations and real-world data. We then consider a powerful class of temporally structured latent variable models given by a hidden Markov model (HMM) with generalized linear model (GLM) observations, which has recently been used to identify discrete states from animal decision-making data. We show that our method substantially reduces the amount of data needed to fit GLM-HMMs and outperforms a variety of approximate methods based on variational and amortized inference. Infomax learning for latent variable models thus offers a powerful approach for characterizing temporally structured latent states, with a wide variety of applications in neuroscience and beyond.

Vol. 36, No. 3, pp. 437–474

Citations: 0

Advantages of Persistent Cohomology in Estimating Animal Location From Grid Cell Population Activity

IF 2.9; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-02-16 DOI: 10.1162/neco_a_01645

Daisuke Kawahara;Shigeyoshi Fujisawa

Many cognitive functions are represented as cell assemblies. In the case of spatial navigation, the population activity of place cells in the hippocampus and grid cells in the entorhinal cortex represents self-location in the environment. The brain cannot directly observe self-location information in the environment. Instead, it relies on sensory information and memory to estimate self-location. Therefore, estimating low-dimensional dynamics, such as the movement trajectory of an animal exploring its environment, from only the high-dimensional neural activity is important in deciphering the information represented in the brain. Most previous studies have estimated the low-dimensional dynamics (i.e., latent variables) behind neural activity by unsupervised learning with Bayesian population decoding using artificial neural networks or gaussian processes. Recently, persistent cohomology has been used to estimate latent variables from the phase information (i.e., circular coordinates) of manifolds created by neural activity. However, the advantages of persistent cohomology over Bayesian population decoding are not well understood. We compared persistent cohomology and Bayesian population decoding in estimating the animal location from simulated and actual grid cell population activity. We found that persistent cohomology can estimate the animal location with fewer neurons than Bayesian population decoding and robustly estimate the animal location from actual noisy data.

Vol. 36, No. 3, pp. 385–411

Citations: 0

Errata to “A Tutorial on the Spectral Theory of Markov Chains” by Eddie Seabrook and Laurenz Wiskott (Neural Computation, November 2023, Vol. 35, No. 11, pp. 1713–1796, https://doi.org/10.1162/neco_a_01611)

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-02-16 DOI: 10.1162/neco_e_01662

Citations: 0

Learning Only on Boundaries: A Physics-Informed Neural Operator for Solving Parametric Partial Differential Equations in Complex Geometries

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-02-16 DOI: 10.1162/neco_a_01647

Zhiwei Fang;Sifan Wang;Paris Perdikaris

Recently, deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parameterized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from O(N^d) to O(N^(d-1)), where d is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of the method on parameterized complex geometries and unbounded problems.
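The sample-count reduction quoted in this abstract can be illustrated with simple arithmetic. The sketch below is not the authors' sampling scheme. It assumes a uniform grid with n points per axis on a d-dimensional cube (the function name `grid_point_counts` is hypothetical) and counts how many grid points lie on the boundary compared with the whole domain.

```python
def grid_point_counts(n: int, d: int) -> tuple[int, int]:
    """Return (total, boundary) point counts for a uniform grid.

    Assumption for illustration only: n points per axis on a
    d-dimensional cube. A point is on the boundary if at least one
    of its coordinates is the first or last grid index.
    """
    total = n ** d                    # all grid points: O(n^d)
    interior = max(n - 2, 0) ** d     # points with no extreme coordinate
    boundary = total - interior       # remaining points: O(n^(d-1))
    return total, boundary


if __name__ == "__main__":
    for d in (2, 3):
        for n in (10, 100):
            total, boundary = grid_point_counts(n, d)
            print(f"d={d}, n={n}: total={total}, boundary={boundary}, "
                  f"ratio={boundary / total:.4f}")
```

Under this assumption the boundary share shrinks roughly like 2d/n as n grows, which is the sense in which sampling only the boundary is one order cheaper than sampling the full domain.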


Citations: 0