
Neural Computation: Latest Articles

Toward a Free-Response Paradigm of Decision Making in Spiking Neural Networks
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-14 | DOI: 10.1162/neco_a_01733
Zhichao Zhu;Yang Qi;Wenlian Lu;Zhigang Wang;Lu Cao;Jianfeng Feng
Spiking neural networks (SNNs) have attracted significant interest in the development of brain-inspired computing systems due to their energy efficiency and similarities to biological information processing. In contrast to continuous-valued artificial neural networks, which produce results in a single step, SNNs require multiple steps during inference to achieve a desired accuracy level, resulting in a burden in real-time response and energy efficiency. Inspired by the tradeoff between speed and accuracy in human and animal decision-making processes, which exhibit correlations among reaction times, task complexity, and decision confidence, an inquiry emerges regarding how an SNN model can benefit by implementing these attributes. Here, we introduce a theory of decision making in SNNs by untangling the interplay between signal and noise. Under this theory, we introduce a new learning objective that trains an SNN not only to make the correct decisions but also to shape its confidence. Numerical experiments demonstrate that SNNs trained in this way exhibit improved confidence expression, reduced trial-to-trial variability, and shorter latency to reach the desired accuracy. We then introduce a stopping policy that can stop inference in a way that further enhances the time efficiency of SNNs. The stopping time can serve as an indicator of whether a decision is correct, akin to the reaction time in animal behavior experiments. By integrating stochasticity into decision making, this study opens up new possibilities to explore the capabilities of SNNs and advance SNNs and their applications in complex decision-making scenarios where model performance is limited.
Neural Computation, vol. 37, no. 3, pp. 481-521. Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908351
Citations: 0
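The free-response idea in the entry above (stop inference once the network is confident enough, and read the stopping time as a reaction time) can be illustrated with a toy sketch. This is not the authors' stopping policy: the margin-threshold rule, the Bernoulli spike model, and the names `free_response_decision`, `rates`, and `threshold` are all assumptions made for illustration only.

```python
import random

def free_response_decision(rates, threshold=5, max_steps=200, seed=0):
    """Toy free-response readout (hypothetical, not the paper's policy).

    rates: per-step firing probability of each output neuron (at least two).
    Spikes are accumulated per class; inference stops as soon as the leading
    class is ahead of the runner-up by `threshold` spikes.
    Returns (decision, stopping_time).
    """
    rng = random.Random(seed)
    counts = [0] * len(rates)
    for t in range(1, max_steps + 1):
        # Each output neuron emits a Bernoulli spike at this time step.
        for k, p in enumerate(rates):
            if rng.random() < p:
                counts[k] += 1
        ranked = sorted(counts, reverse=True)
        # Stop once the spike-count margin is large enough.
        if ranked[0] - ranked[1] >= threshold:
            return counts.index(ranked[0]), t
    # Forced choice if the margin was never reached.
    return counts.index(max(counts)), max_steps

if __name__ == "__main__":
    # An unambiguous input stops as early as the threshold allows.
    print(free_response_decision([1.0, 0.0], threshold=5))
    # An ambiguous input tends to take longer, or hits the step limit.
    print(free_response_decision([0.5, 0.45], threshold=5))
```

In this sketch a larger `threshold` trades speed for accuracy, which is the qualitative tradeoff the abstract refers to; how the paper actually defines confidence and the stopping rule is not specified in the abstract.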
Improving Recall in Sparse Associative Memories That Use Neurogenesis
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-02-14 | DOI: 10.1162/neco_a_01732
Katy Warr;Jonathon Hare;David Thomas
The creation of future low-power neuromorphic solutions requires specialist spiking neural network (SNN) algorithms that are optimized for neuromorphic settings. One such algorithmic challenge is the ability to recall learned patterns from their noisy variants. Solutions to this problem may be required to memorize vast numbers of patterns based on limited training data and subsequently recall the patterns in the presence of noise. To solve this problem, previous work has explored sparse associative memory (SAM): associative memory neural models that exploit the principle of sparse neural coding observed in the brain. Research into a subcategory of SAM has been inspired by the biological process of adult neurogenesis, whereby new neurons are generated to facilitate adaptive and effective lifelong learning. Although these neurogenesis models have been demonstrated in previous research, they have limitations in terms of recall memory capacity and robustness to noise. In this article, we provide a unifying framework for characterizing a type of SAM network that has been pretrained using a learning strategy that incorporated a simple neurogenesis model. Using this characterization, we formally define network topology and threshold optimization methods to empirically demonstrate a greater than 10^4 times improvement in memory capacity compared to previous work. We show that these optimizations can facilitate the development of networks that have reduced interneuron connectivity while maintaining high recall efficacy. This paves the way for ongoing research into fast, effective, low-power realizations of associative memory on neuromorphic platforms.
Neural Computation, vol. 37, no. 3, pp. 437-480. Journal Article.
Citations: 0
A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of the Multi-Armed Bandit
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-21 | DOI: 10.1162/neco_a_01728
Shintaro Nakamura;Masashi Sugiyama
We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world data sets.
Neural Computation, vol. 37, no. 2, pp. 294-310. Journal Article.
Citations: 0
On the Compressive Power of Autoencoders With Linear and ReLU Activation Functions
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-21 | DOI: 10.1162/neco_a_01729
Liangjie Sun;Chenyao Wu;Wai-Ki Ching;Tatsuya Akutsu
In this article, we mainly study the depth and width of autoencoders consisting of rectified linear unit (ReLU) activation functions. An autoencoder is a layered neural network consisting of an encoder, which compresses an input vector to a lower-dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector exactly (or approximately). In a previous study, Melkman et al. (2023) studied the depth and width of autoencoders using linear threshold activation functions with binary input and output vectors. We show that similar theoretical results hold if autoencoders using ReLU activation functions with real input and output vectors are used. Furthermore, we show that it is possible to compress input vectors to one-dimensional vectors using ReLU activation functions, although the size of compressed vectors is trivially Ω(log n) for autoencoders with linear threshold activation functions, where n is the number of input vectors. We also study the cases of linear activation functions. The results suggest that the compressive power of autoencoders using linear activation functions is considerably limited compared with those using ReLU activation functions.
Neural Computation, vol. 37, no. 2, pp. 235-259. Journal Article.
Citations: 0
Generalization Analysis of Transformers in Distribution Regression
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-21 | DOI: 10.1162/neco_a_01726
Peilin Liu;Ding-Xuan Zhou
In recent years, models based on the transformer architecture have seen widespread applications and have become one of the core tools in the field of deep learning. Numerous successful and efficient techniques, such as parameter-efficient fine-tuning and efficient scaling, have been proposed surrounding their applications to further enhance performance. However, the success of these strategies has always lacked the support of rigorous mathematical theory. To study the underlying mechanisms behind transformers and related techniques, we first propose a transformer learning framework motivated by distribution regression, with distributions being inputs, connect a two-stage sampling process with natural language processing, and present a mathematical formulation of the attention mechanism called attention operator. We demonstrate that by the attention operator, transformers can compress distributions into function representations without loss of information. Moreover, with the advantages of our novel attention operator, transformers exhibit a stronger capability to learn functionals with more complex structures than convolutional neural networks and fully connected networks. Finally, we obtain a generalization bound within the distribution regression framework. Throughout theoretical results, we further discuss some successful techniques emerging with large language models (LLMs), such as prompt tuning, parameter-efficient fine-tuning, and efficient scaling. We also provide theoretical insights behind these techniques within our novel analysis framework.
Neural Computation, vol. 37, no. 2, pp. 260-293. Journal Article.
Citations: 0
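The entry above treats attention as an operator acting on samples drawn from a distribution. As a rough illustration, standard scaled dot-product attention for a single query is a softmax-weighted average over a set of samples. The sketch below is the textbook mechanism, not the paper's "attention operator", whose exact formulation the abstract does not give; the function name `attention` and its argument layout are assumptions.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a set of samples.

    query:  list of d floats
    keys:   list of n key vectors, each of length d
    values: list of n value vectors, all of the same length
    Returns the softmax-weighted average of the value vectors.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax with the maximum subtracted for numerical stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Weighted average of the values, one output coordinate at a time.
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

if __name__ == "__main__":
    # With identical keys the weights are uniform, so the output is the
    # plain empirical mean of the values.
    print(attention([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [[2.0], [4.0]]))
```

When all keys are equal, the output reduces to the empirical mean of the sampled values, which hints at how an attention layer can summarize a sampled distribution.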
Learning in Associative Networks Through Pavlovian Dynamics
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-21 | DOI: 10.1162/neco_a_01730
Daniele Lotito;Miriam Aquaro;Chiara Marullo
Hebbian learning theory is rooted in Pavlov’s classical conditioning. While mathematical models of the former have been proposed and studied in the past decades, especially in spin glass theory, only recently has it been numerically shown that it is possible to write neural and synaptic dynamics that mirror Pavlov conditioning mechanisms and also give rise to synaptic weights that correspond to the Hebbian learning rule. In this article we show that the same dynamics can be derived with equilibrium statistical mechanics tools and basic and motivated modeling assumptions. Then we show how to study the resulting system of coupled stochastic differential equations assuming the reasonable separation of neural and synaptic timescale. In particular, we analytically demonstrate that this synaptic evolution converges to the Hebbian learning rule in various settings and compute the variance of the stochastic process. Finally, drawing from evidence on pure memory reinforcement during sleep stages, we show how the proposed model can simulate neural networks that undergo sleep-associated memory consolidation processes, thereby proving the compatibility of Pavlovian learning with dreaming mechanisms.
Neural Computation, vol. 37, no. 2, pp. 311-343. Journal Article.
Citations: 0
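For readers unfamiliar with the Hebbian learning rule that the entry above names as the limit of the synaptic dynamics, a minimal sketch of the classical rule in a Hopfield-style associative network follows. It shows only the textbook rule J_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ and a sign-update recall. It does not implement the paper's Pavlovian stochastic dynamics, and the function names `hebbian_weights` and `recall` are assumptions.

```python
def hebbian_weights(patterns):
    """Classical Hebbian couplings for +/-1 patterns, with zero diagonal.

    J[i][j] = (1/N) * sum over patterns of xi[i] * xi[j], for i != j.
    """
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += xi[i] * xi[j] / n
    return J

def recall(J, state, sweeps=5):
    """Asynchronous sign updates until the state stops changing."""
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            # Local field on neuron i from all other neurons.
            h = sum(J[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

if __name__ == "__main__":
    pattern = [1, -1, 1, -1, 1, -1, 1, -1]
    J = hebbian_weights([pattern])
    noisy = list(pattern)
    noisy[0] = -noisy[0]  # corrupt one bit
    print(recall(J, noisy) == pattern)
```

With a single stored pattern and one corrupted bit, the local fields all point back toward the stored pattern, so one sweep restores it.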
Generalization Guarantees of Gradient Descent for Shallow Neural Networks
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-21 | DOI: 10.1162/neco_a_01725
Puyu Wang;Yunwen Lei;Di Wang;Yiming Ying;Ding-Xuan Zhou
Significant progress has been made recently in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling. Here, network scaling corresponds to the normalization of the layers. In this article, we greatly extend the previous work (Lei et al., 2022; Richards & Kuzborskij, 2021) by conducting a comprehensive stability and generalization analysis of GD for two-layer and three-layer NNs. For two-layer NNs, our results are established under general network scaling, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating its nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of overparameterization. As a direct application of our general findings, we derive the excess risk rate of O(1/√n) for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for underparameterized and overparameterized NNs trained by GD to attain the desired risk rate of O(1/√n). Moreover, we demonstrate that as the scaling factor increases or the network complexity decreases, less overparameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of O(1/n) for GD in both two-layer and three-layer NNs.
Neural Computation, vol. 37, no. 2, pp. 344-402. Journal Article.
Citations: 0
Bounded Rational Decision Networks With Belief Propagation
IF 2.7 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-12-12 | DOI: 10.1162/neco_a_01719
Gerrit Schmid;Sebastian Gottwald;Daniel A. Braun
Complex information processing systems that are capable of a wide variety of tasks, such as the human brain, are composed of specialized units that collaborate and communicate with each other. An important property of such information processing networks is locality: there is no single global unit controlling the modules, but information is exchanged locally. Here, we consider a decision-theoretic approach to study networks of bounded rational decision makers that are allowed to specialize and communicate with each other. In contrast to previous work that has focused on feedforward communication between decision-making agents, we consider cyclical information processing paths allowing for back-and-forth communication. We adapt message-passing algorithms to suit this purpose, essentially allowing for local information flow between units and thus enabling circular dependency structures. We provide examples that show how repeated communication can increase performance given that each unit’s information processing capability is limited and that decision-making systems with too few or too many connections and feedback loops achieve suboptimal utility.
能够执行各种任务的复杂信息处理系统(如人脑)是由相互协作和通信的专门单元组成的。这类信息处理网络的一个重要特性是局部性:没有一个控制模块的全局单元,而是在局部交换信息。在这里,我们考虑用决策理论的方法来研究由有界理性决策者组成的网络,这些决策者可以进行专业化分工并相互交流。以往的研究主要关注决策制定者之间的前馈通信,与此不同的是,我们考虑的是允许前后通信的循环信息处理路径。我们调整了信息传递算法以适应这一目的,从根本上允许单元之间的局部信息流,从而实现循环依赖结构。我们举例说明了在每个单元的信息处理能力有限的情况下,重复通信如何提高性能,以及连接和反馈回路过少或过多的决策系统如何实现次优效用。
Published in Neural Computation 37(1), pp. 76-127 (Journal Article). PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10810330
Citations: 0
Computation With Sequences of Assemblies in a Model of the Brain
IF 2.7; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-12-12; DOI: 10.1162/neco_a_01720
Max Dabagia;Christos H. Papadimitriou;Santosh S. Vempala
Even as machine learning exceeds human-level performance on many applications, the generality, robustness, and rapidity of the brain’s learning capabilities remain unmatched. How cognition arises from neural activity is the central open question in neuroscience, inextricable from the study of intelligence itself. A simple formal model of neural activity was proposed in Papadimitriou et al. (2020) and has been subsequently shown, through both mathematical proofs and simulations, to be capable of implementing certain simple cognitive operations via the creation and manipulation of assemblies of neurons. However, many intelligent behaviors rely on the ability to recognize, store, and manipulate temporal sequences of stimuli (planning, language, navigation, to list a few). Here we show that in the same model, sequential precedence can be captured naturally through synaptic weights and plasticity, and, as a result, a range of computations on sequences of assemblies can be carried out. In particular, repeated presentation of a sequence of stimuli leads to the memorization of the sequence through corresponding neural assemblies: upon future presentation of any stimulus in the sequence, the corresponding assembly and its subsequent ones will be activated, one after the other, until the end of the sequence. If the stimulus sequence is presented to two brain areas simultaneously, a scaffolded representation is created, resulting in more efficient memorization and recall, in agreement with cognitive experiments. Finally, we show that any finite state machine can be learned in a similar way, through the presentation of appropriate patterns of sequences. Through an extension of this mechanism, the model can be shown to be capable of universal computation. Taken together, these results provide a concrete hypothesis for the basis of the brain’s remarkable abilities to compute and learn, with sequences playing a vital role.
Published in Neural Computation 37(1), pp. 193-233 (Journal Article).
Citations: 0
Computing With Residue Numbers in High-Dimensional Representation
IF 2.7; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-12-12; DOI: 10.1162/neco_a_01723
Christopher J. Kymn;Denis Kleyko;E. Paxon Frady;Connor Bybee;Pentti Kanerva;Friedrich T. Sommer;Bruno A. Olshausen
We introduce residue hyperdimensional computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using resources that scale only logarithmically with the range, a vast improvement over previous methods. It also exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.
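To make the idea of component-wise arithmetic on residue codes concrete, here is a minimal sketch. It assumes a phasor encoding: for each modulus m, a random vector of m-th roots of unity is raised to the power x, and the per-modulus vectors are multiplied component-wise. The abstract does not spell out the encoding, so this choice, all function names, the moduli, and the dimension are assumptions. Decoding uses a brute-force search as a stand-in for the efficient factorization method the abstract mentions.

```python
import cmath
import random

def random_base(modulus, dim, rng):
    """Vector of `dim` random `modulus`-th roots of unity (hypothetical helper)."""
    return [cmath.exp(2j * cmath.pi * rng.randrange(modulus) / modulus)
            for _ in range(dim)]

def encode(x, bases):
    """Raise each base vector to the power x, then bind across moduli."""
    dim = len(bases[0])
    vec = [1 + 0j] * dim
    for base in bases:
        vec = [v * b ** x for v, b in zip(vec, base)]
    return vec

def bind(a, b):
    """Component-wise product; under this encoding it adds the encoded integers."""
    return [p * q for p, q in zip(a, b)]

def similarity(a, b):
    """Real part of the normalised inner product (about 1.0 for matching codes)."""
    return sum(p * q.conjugate() for p, q in zip(a, b)).real / len(a)

def decode(vec, bases, value_range):
    """Brute-force nearest-codeword search (a stand-in, not the paper's method)."""
    return max(range(value_range),
               key=lambda x: similarity(vec, encode(x, bases)))

rng = random.Random(0)
moduli = [3, 5, 7]      # pairwise coprime, so the representable range is 3 * 5 * 7 = 105
dim = 512               # illustrative vector dimension
bases = [random_base(m, dim, rng) for m in moduli]

total = bind(encode(17, bases), encode(30, bases))
print(decode(total, bases, 105))     # expected to recover (17 + 30) % 105

wrapped = bind(encode(100, bases), encode(10, bases))
print(decode(wrapped, bases, 105))   # expected to recover (100 + 10) % 105
```

Because each base entry is an m-th root of unity, raising it to the power x depends only on x mod m. That is why the product of two codes behaves like modular addition, and why the range grows with the product of the moduli while the vector dimension stays fixed.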
Published in Neural Computation 37(1), pp. 1-37 (Journal Article).
Citations: 0