Neural Computation: Latest Articles (Page 10)

Cooperativity, Information Gain, and Energy Cost During Early LTP in Dendritic Spines

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-01-18 DOI: 10.1162/neco_a_01632

Jan Karbowski;Paulina Urban

We investigate a mutual relationship between information and energy during the early phase of LTP induction and maintenance in a large-scale system of mutually coupled dendritic spines, with discrete internal states and probabilistic dynamics, within the framework of nonequilibrium stochastic thermodynamics. In order to analyze this computationally intractable stochastic multidimensional system, we introduce a pair approximation, which allows us to reduce the spine dynamics into a lower-dimensional manageable system of closed equations. We found that the rates of information gain and energy attain their maximal values during an initial period of LTP (i.e., during stimulation), and after that, they recover to their baseline low values, as opposed to a memory trace that lasts much longer. This suggests that the learning phase is much more energy demanding than the memory phase. We show that positive correlations between neighboring spines increase both a duration of memory trace and energy cost during LTP, but the memory time per invested energy increases dramatically for very strong, positive synaptic cooperativity, suggesting a beneficial role of synaptic clustering on memory duration. In contrast, information gain after LTP is the largest for negative correlations, and energy efficiency of that information generally declines with increasing synaptic cooperativity. We also find that dendritic spines can use sparse representations for encoding long-term information, as both energetic and structural efficiencies of retained information and its lifetime exhibit maxima for low fractions of stimulated synapses during LTP. Moreover, we find that such efficiencies drop significantly with increasing the number of spines. In general, our stochastic thermodynamics approach provides a unifying framework for studying, from first principles, information encoding, and its energy cost during learning and memory in stochastic systems of interacting synapses.
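
As a rough illustration of what an "energy rate" means for a system with discrete states and probabilistic dynamics, the sketch below computes the standard entropy production rate of a continuous-time Markov chain (in units of k_B per unit time). It is a generic textbook quantity, not the spine model or the pair approximation of the paper; the three-state ring, the rates `a` and `b`, and the function name are assumptions made only for this example.

```python
import math


def entropy_production_rate(rates, probs):
    """Entropy production rate of a Markov chain with discrete states.

    rates[i][j] is the transition rate from state i to state j,
    probs[i] is the current probability of state i.
    Each pair of states contributes (net flux) * log(forward / backward).
    """
    total = 0.0
    n = len(probs)
    for i in range(n):
        for j in range(i + 1, n):
            forward = rates[i][j] * probs[i]
            backward = rates[j][i] * probs[j]
            if forward > 0 and backward > 0:
                total += (forward - backward) * math.log(forward / backward)
    return total


# Hypothetical three-state ring: rate a clockwise, rate b counterclockwise.
a, b = 2.0, 1.0
ring = [[0.0, a, b],
        [b, 0.0, a],
        [a, b, 0.0]]
uniform = [1 / 3, 1 / 3, 1 / 3]  # stationary distribution of this ring

driven = entropy_production_rate(ring, uniform)
```

For this ring the rate reduces to (a - b) ln(a / b), and it vanishes when a = b, that is, when detailed balance holds and no energy is dissipated.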

Neural Computation 36 (2): 271-311

Citations: 0

Emergence of Universal Computations Through Neural Manifold Dynamics

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-01-18 DOI: 10.1162/neco_a_01631

Joan Gort

There is growing evidence that many forms of neural computation may be implemented by low-dimensional dynamics unfolding at the population scale. However, neither the connectivity structure nor the general capabilities of these embedded dynamical processes are currently understood. In this work, the two most common formalisms of firing-rate models are evaluated using tools from analysis, topology, and nonlinear dynamics in order to provide plausible explanations for these problems. It is shown that low-rank structured connectivities predict the formation of invariant and globally attracting manifolds in all these models. Regarding the dynamics arising in these manifolds, it is proved they are topologically equivalent across the considered formalisms. This letter also shows that under the low-rank hypothesis, the flows emerging in neural manifolds, including input-driven systems, are universal, which broadens previous findings. It explores how low-dimensional orbits can bear the production of continuous sets of muscular trajectories, the implementation of central pattern generators, and the storage of memory states. These dynamics can robustly simulate any Turing machine over arbitrary bounded memory strings, virtually endowing rate models with the power of universal computation. In addition, the letter shows how the low-rank hypothesis predicts the parsimonious correlation structure observed in cortical activity. Finally, it discusses how this theory could provide a useful tool from which to study neuropsychological phenomena using mathematical methods.
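
The claim that low-rank connectivity produces an invariant, attracting manifold can be checked numerically in the simplest case. The sketch below assumes one common firing-rate formalism, dx/dt = -x + J tanh(x), with a rank-one matrix J = m nᵀ / N; the vectors `m` and `n`, the network size, and the Euler step are illustrative choices, not taken from the letter. Because every recurrent input points along `m`, the part of the state orthogonal to `m` can only decay.

```python
import math
import random


def simulate_rank_one(n_units=50, dt=0.05, steps=400, seed=0):
    """Euler-integrate dx/dt = -x + (m n^T / N) tanh(x).

    Returns the distance of the state from the line spanned by m
    at the start and at the end of the simulation.
    """
    rng = random.Random(seed)
    m = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    n = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]

    def off_manifold_norm(state):
        # Remove the projection onto m and measure what is left.
        coef = sum(s * mi for s, mi in zip(state, m)) / sum(mi * mi for mi in m)
        return math.sqrt(sum((s - coef * mi) ** 2 for s, mi in zip(state, m)))

    start = off_manifold_norm(x)
    for _ in range(steps):
        overlap = sum(ni * math.tanh(xi) for ni, xi in zip(n, x)) / n_units
        x = [xi + dt * (-xi + mi * overlap) for xi, mi in zip(x, m)]
    return start, off_manifold_norm(x)


start_distance, end_distance = simulate_rank_one()
```

With these settings the off-manifold component shrinks by a factor of (1 - dt) per step, so the state ends up on the one-dimensional subspace spanned by `m` to within rounding error, whatever the initial condition.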

Neural Computation 36 (2): 227-270

Citations: 0

Q&A Label Learning

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-01-18 DOI: 10.1162/neco_a_01633

Kota Kawamoto;Masato Uchida

Assigning labels to instances is crucial for supervised machine learning. In this letter, we propose a novel annotation method, Q&A labeling, which involves a question generator that asks questions about the labels of the instances to be assigned and an annotator that answers the questions and assigns the corresponding labels to the instances. We derived a generative model of labels assigned according to two Q&A labeling procedures that differ in the way questions are asked and answered. We showed that in both procedures, the derived model is partially consistent with that assumed in previous studies. The main distinction of this study from previous ones lies in the fact that the label generative model was not assumed but, rather, derived based on the definition of a specific annotation method, Q&A labeling. We also derived a loss function to evaluate the classification risk of ordinary supervised machine learning using instances assigned Q&A labels and evaluated the upper bound of the classification error. The results indicate statistical consistency in learning with Q&A labels.

Citations: 0

Efficient Decoding of Large-Scale Neural Population Responses With Gaussian-Process Multiclass Regression

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2024-01-18 DOI: 10.1162/neco_a_01630

C. Daniel Greenidge;Benjamin Scholl;Jacob L. Yates;Jonathan W. Pillow

Neural decoding methods provide a powerful tool for quantifying the information content of neural population codes and the limits imposed by correlations in neural activity. However, standard decoding methods are prone to overfitting and scale poorly to high-dimensional settings. Here, we introduce a novel decoding method to overcome these limitations. Our approach, the gaussian process multiclass decoder (GPMD), is well suited to decoding a continuous low-dimensional variable from high-dimensional population activity and provides a platform for assessing the importance of correlations in neural population codes. The GPMD is a multinomial logistic regression model with a gaussian process prior over the decoding weights. The prior includes hyperparameters that govern the smoothness of each neuron's decoding weights, allowing automatic pruning of uninformative neurons during inference. We provide a variational inference method for fitting the GPMD to data, which scales to hundreds or thousands of neurons and performs well even in data sets with more neurons than trials. We apply the GPMD to recordings from primary visual cortex in three species: monkey, ferret, and mouse. Our decoder achieves state-of-the-art accuracy on all three data sets and substantially outperforms independent Bayesian decoding, showing that knowledge of the correlation structure is essential for optimal decoding in all three species.

Neural Computation 36 (2): 175-226

Citations: 0

Modeling the Role of Contour Integration in Visual Inference

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2023-12-12 DOI: 10.1162/neco_a_01625

Salman Khan;Alexander Wong;Bryan Tripp

Under difficult viewing conditions, the brain's visual system uses a variety of recurrent modulatory mechanisms to augment feedforward processing. One resulting phenomenon is contour integration, which occurs in the primary visual (V1) cortex and strengthens neural responses to edges if they belong to a larger smooth contour. Computational models have contributed to an understanding of the circuit mechanisms of contour integration, but less is known about its role in visual perception. To address this gap, we embedded a biologically grounded model of contour integration in a task-driven artificial neural network and trained it using a gradient-descent variant. We used this model to explore how brain-like contour integration may be optimized for high-level visual objectives as well as its potential roles in perception. When the model was trained to detect contours in a background of random edges, a task commonly used to examine contour integration in the brain, it closely mirrored the brain in terms of behavior, neural responses, and lateral connection patterns. When trained on natural images, the model enhanced weaker contours and distinguished whether two points lay on the same versus different contours. The model learned robust features that generalized well to out-of-training-distribution stimuli. Surprisingly, and in contrast with the synthetic task, a parameter-matched control network without recurrence performed the same as or better than the model on the natural-image tasks. Thus, a contour integration mechanism is not essential to perform these more naturalistic contour-related tasks. Finally, the best performance in all tasks was achieved by a modified contour integration model that did not distinguish between excitatory and inhibitory neurons.

Neural Computation 36 (1): 33-74

Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10534913

Citations: 0

The Limiting Dynamics of SGD: Modified Loss, Phase-Space Oscillations, and Anomalous Diffusion

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2023-12-12 DOI: 10.1162/neco_a_01626

Daniel Kunin;Javier Sagastuy-Brena;Lauren Gillespie;Eshed Margalit;Hidenori Tanaka;Surya Ganguli;Daniel L. K. Yamins

In this work, we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance traveled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate interaction among the hyperparameters of optimization, the structure in the gradient noise, and the Hessian matrix at the end of training that explains this anomalous diffusion. To build this understanding, we first derive a continuous-time model for SGD with finite learning rates and batch sizes as an underdamped Langevin equation. We study this equation in the setting of linear regression, where we can derive exact, analytic expressions for the phase-space dynamics of the parameters and their instantaneous velocities from initialization to stationarity. Using the Fokker-Planck equation, we show that the key ingredient driving these dynamics is not the original training loss but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents that cause oscillations in phase space. We identify qualitative and quantitative predictions of this theory in the dynamics of a ResNet-18 model trained on ImageNet. Through the lens of statistical physics, we uncover a mechanistic origin for the anomalous limiting dynamics of deep neural networks trained with SGD. Understanding the limiting dynamics of SGD, and its dependence on various important hyperparameters like batch size, learning rate, and momentum, can serve as a basis for future work that can turn these insights into algorithmic gains.
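
The measurement behind the "distance traveled grows as a power law" statement can be sketched on a toy problem. The code below runs minibatch SGD with momentum on a one-parameter linear regression, then estimates a diffusion exponent as the log-log slope of mean squared displacement against lag. The synthetic data, the hyperparameters, and all function names are assumptions for illustration; this toy setting is not expected to reproduce the exponents reported for deep networks, and the code makes no claim about what value the slope takes.

```python
import math
import random


def sgd_momentum_path(steps=3000, lr=0.02, beta=0.9, batch=4, seed=0):
    """Minibatch SGD with momentum on y = w * x; returns w after every update."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(64)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]  # true slope is 2
    w, velocity = 0.0, 0.0
    path = []
    for _ in range(steps):
        idx = rng.sample(range(len(xs)), batch)
        grad = sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / batch
        velocity = beta * velocity - lr * grad
        w += velocity
        path.append(w)
    return path


def mean_squared_displacement(path, lag):
    """Average of (w[t + lag] - w[t])^2 over all valid start times t."""
    diffs = [(path[t + lag] - path[t]) ** 2 for t in range(len(path) - lag)]
    return sum(diffs) / len(diffs)


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


path = sgd_momentum_path()
late = path[1000:]  # discard the initial transient
lags = [1, 2, 4, 8, 16, 32]
msd = [mean_squared_displacement(late, lag) for lag in lags]
exponent = loglog_slope(lags, msd)
```

A slope of 1 would correspond to ordinary diffusion; any other value is what the abstract calls a nontrivial exponent. Keeping the velocity as an explicit state variable mirrors the phase-space (position and velocity) view used in the paper.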

Neural Computation 36 (1): 151-174

Citations: 0

Performance Evaluation of Matrix Factorization for fMRI Data

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2023-12-12 DOI: 10.1162/neco_a_01628

Yusuke Endo;Koujin Takeda

A hypothesis in the study of the brain is that sparse coding is realized in information representation of external stimuli, which has been experimentally confirmed for visual stimulus recently. However, unlike the specific functional region in the brain, sparse coding in information processing in the whole brain has not been clarified sufficiently. In this study, we investigate the validity of sparse coding in the whole human brain by applying various matrix factorization methods to functional magnetic resonance imaging data of neural activities in the brain. The result suggests the sparse coding hypothesis in information representation in the whole human brain, because extracted features from the sparse matrix factorization (MF) method, sparse principal component analysis (SparsePCA), or method of optimal directions (MOD) under a high sparsity setting or an approximate sparse MF method, fast independent component analysis (FastICA), can classify external visual stimuli more accurately than the nonsparse MF method or sparse MF method under a low sparsity setting.

Citations: 0

Cocaine Use Prediction With Tensor-Based Machine Learning on Multimodal MRI Connectome Data

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2023-12-12 DOI: 10.1162/neco_a_01623

Anru R. Zhang;Ryan P. Bell;Chen An;Runshi Tang;Shana A. Hall;Cliburn Chan;Kareem Al-Khalil;Christina S. Meade

This letter considers the use of machine learning algorithms for predicting cocaine use based on magnetic resonance imaging (MRI) connectomic data. The study used functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275 individuals, which was then parcellated into 246 regions of interest (ROIs) using the Brainnetome atlas. After data preprocessing, the data sets were transformed into tensor form. We developed a tensor-based unsupervised machine learning algorithm to reduce the size of the data tensor from 275 (individuals) × 2 (fMRI and dMRI) × 246 (ROIs) × 246 (ROIs) to 275 (individuals) × 2 (fMRI and dMRI) × 6 (clusters) × 6 (clusters). This was achieved by applying the high-order Lloyd algorithm to group the ROI data into six clusters. Features were extracted from the reduced tensor and combined with demographic features (age, gender, race, and HIV status). The resulting data set was used to train a Catboost model using subsampling and nested cross-validation techniques, which achieved a prediction accuracy of 0.857 for identifying cocaine users. The model was also compared with other models, and the feature importance of the model was presented. Overall, this study highlights the potential for using tensor-based machine learning algorithms to predict cocaine use based on MRI connectomic data and presents a promising approach for identifying individuals at risk of substance abuse.
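
The size reduction from 246 × 246 to 6 × 6 per individual and modality can be pictured as block averaging once each ROI has a cluster label. The sketch below shows only that averaging step on a tiny example; it assumes the labels are already available and does not implement the high-order Lloyd algorithm that produces them. Whether the study summarizes blocks by their mean is also an assumption, as are the function names.

```python
def block_means(matrix, labels, n_clusters):
    """Average an ROI x ROI matrix within each (cluster, cluster) block."""
    sums = [[0.0] * n_clusters for _ in range(n_clusters)]
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            a, b = labels[i], labels[j]
            sums[a][b] += value
            counts[a][b] += 1
    return [
        [sums[a][b] / counts[a][b] if counts[a][b] else 0.0
         for b in range(n_clusters)]
        for a in range(n_clusters)
    ]


def reduce_tensor(tensor, labels, n_clusters):
    """tensor[subject][modality] is an ROI x ROI matrix; reduce each one."""
    return [
        [block_means(matrix, labels, n_clusters) for matrix in subject]
        for subject in tensor
    ]


# Toy data: 1 subject, 2 modalities, 4 ROIs grouped into 2 clusters.
labels = [0, 0, 1, 1]
connectivity = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
tensor = [[connectivity, connectivity]]
reduced = reduce_tensor(tensor, labels, n_clusters=2)
```

Here `reduced[0][0]` is the 2 × 2 matrix `[[1.0, 0.0], [0.0, 2.0]]`. In the study's setting the same shape logic would take a 275 × 2 × 246 × 246 tensor to 275 × 2 × 6 × 6.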

Neural Computation, Volume 36, Issue 1, Pages 107-127.

Citations: 0

Synchronization and Clustering in Complex Quadratic Networks

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2023-12-12 DOI: 10.1162/neco_a_01624

Anca Rǎdulescu;Danae Evans;Amani-Dasia Augustin;Anthony Cooper;Johan Nakuci;Sarah Muldoon

Synchronization and clustering are well studied in the context of networks of oscillators, such as neuronal networks. However, this relationship is notoriously difficult to approach mathematically in natural, complex networks. Here, we aim to understand it in a canonical framework, using complex quadratic node dynamics, coupled in networks that we call complex quadratic networks (CQNs). We review previously defined extensions of the Mandelbrot and Julia sets for networks, focusing on the behavior of the node-wise projections of these sets and on describing the phenomena of node clustering and synchronization. One aspect of our work consists of exploring ties between a network's connectivity and its ensemble dynamics by identifying mechanisms that lead to clusters of nodes exhibiting identical or different Mandelbrot sets. Based on our preliminary analytical results (obtained primarily in two-dimensional networks), we propose that clustering is strongly determined by the network connectivity patterns, with the geometry of these clusters further controlled by the connection weights. Here, we first explore this relationship further, using examples of synthetic networks, increasing in size (from 3, to 5, to 20 nodes). We then illustrate the potential practical implications of synchronization in an existing set of whole brain, tractography-based networks obtained from 197 human subjects using diffusion tensor imaging. Understanding the similarities to how these concepts apply to CQNs contributes to our understanding of universal principles in dynamic networks and may help extend theoretical results to natural, complex systems.
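As a rough illustration of how connectivity can force nodes into synchronized clusters, the sketch below iterates a small coupled quadratic network and groups nodes whose orbits coincide. The abstract does not state the node update rule, so the rule used here, z_j ← (Σ_k w_jk z_k)² + c with every node starting at 0, is an assumption made for illustration. The function names, the weight matrix, and the parameter c = -0.5 are likewise hypothetical, and c was picked so that all orbits stay bounded.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def cqn_orbits(weights: Matrix, c: complex, n_steps: int = 30) -> List[List[complex]]:
    """Return each node's orbit under the assumed rule z_j <- (sum_k w_jk * z_k)**2 + c."""
    n = len(weights)
    z = [0j] * n  # every node starts at the critical point 0
    orbits: List[List[complex]] = [[] for _ in range(n)]
    for _ in range(n_steps):
        # Weighted input each node receives from the current network state.
        drive = [sum(weights[j][k] * z[k] for k in range(n)) for j in range(n)]
        z = [d * d + c for d in drive]
        for j in range(n):
            orbits[j].append(z[j])
    return orbits


def synchronized_clusters(orbits: List[List[complex]], tol: float = 1e-12) -> List[List[int]]:
    """Group node indices whose orbits agree at every step, within tol."""
    clusters: List[List[int]] = []
    for j, orbit in enumerate(orbits):
        for cluster in clusters:
            reference = orbits[cluster[0]]
            if all(abs(a - b) <= tol for a, b in zip(orbit, reference)):
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters


# Nodes 0 and 1 receive identical input (both driven by node 2);
# node 2 receives a weaker, averaged input from nodes 0 and 1.
weights = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.25, 0.25, 0.0],
]
c = complex(-0.5, 0.0)
orbits = cqn_orbits(weights, c)
print(synchronized_clusters(orbits))  # [[0, 1], [2]]
```

Nodes 0 and 1 have identical rows in the weight matrix, so their orbits coincide at every step, while node 2 follows a different orbit from the second step onward. This is a toy instance of clusters being set by the connectivity pattern, not a reproduction of the paper's analysis.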

Neural Computation, Volume 36, Issue 1, Pages 75-106.

Citations: 0

Active Predictive Coding: A Unifying Neural Model for Active Perception, Compositional Learning, and Hierarchical Planning

IF 2.9 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation

Pub Date : 2023-12-12 DOI: 10.1162/neco_a_01627

Rajesh P. N. Rao;Dimitrios C. Gklezakos;Vishwas Sathish

There is growing interest in predictive coding as a model of how the brain learns through predictions and prediction errors. Predictive coding models have traditionally focused on sensory coding and perception. Here we introduce active predictive coding (APC) as a unifying model for perception, action, and cognition. The APC model addresses important open problems in cognitive science and AI, including (1) how we learn compositional representations (e.g., part-whole hierarchies for equivariant vision) and (2) how we solve large-scale planning problems, which are hard for traditional reinforcement learning, by composing complex state dynamics and abstract actions from simpler dynamics and primitive actions. By using hypernetworks, self-supervised learning, and reinforcement learning, APC learns hierarchical world models by combining task-invariant state transition networks and task-dependent policy networks at multiple abstraction levels. We illustrate the applicability of the APC model to active visual perception and hierarchical planning. Our results represent, to our knowledge, the first proof-of-concept demonstration of a unified approach to addressing the part-whole learning problem in vision, the nested reference frames learning problem in cognition, and the integrated state-action hierarchy learning problem in reinforcement learning.

Neural Computation, Volume 36, Issue 1, Pages 1-32.

Citations: 0