Randomized search heuristics (RSHs) are generally believed to be robust to noise. However, almost all mathematical analyses of how RSHs cope with noisy access to the objective function assume that each solution is re-evaluated whenever it is compared to others. This is unfortunate, both because it wastes computational resources and because it requires the user to foresee that noise is present (in a noise-free setting, one would never re-evaluate solutions). In this work, we show that the need for re-evaluations may have been overestimated and that re-evaluating can in fact be detrimental. For the classic question of how the $(1+1)$ evolutionary algorithm optimizes the LeadingOnes benchmark, we show that without re-evaluations noise rates up to a constant can be tolerated, much more than the $O(n^{-2} \log n)$ noise rates that can be tolerated when re-evaluating solutions. This first runtime analysis of an evolutionary algorithm solving a single-objective noisy problem without re-evaluations indicates that such algorithms may cope with noise much better than previously thought, and without the need to foresee the presence of noise.
{"title":"Evolutionary Algorithms Are Significantly More Robust to Noise When They Ignore It","authors":"Denis Antipov, Benjamin Doerr","doi":"arxiv-2409.00306","DOIUrl":"https://doi.org/arxiv-2409.00306","url":null,"abstract":"Randomized search heuristics (RHSs) are generally believed to be robust to\u0000noise. However, almost all mathematical analyses on how RSHs cope with a noisy\u0000access to the objective function assume that each solution is re-evaluated\u0000whenever it is compared to others. This is unfortunate, both because it wastes\u0000computational resources and because it requires the user to foresee that noise\u0000is present (as in a noise-free setting, one would never re-evaluate solutions). In this work, we show the need for re-evaluations could be overestimated, and\u0000in fact, detrimental. For the classic benchmark problem of how the $(1+1)$\u0000evolutionary algorithm optimizes the LeadingOnes benchmark, we show that\u0000without re-evaluations up to constant noise rates can be tolerated, much more\u0000than the $O(n^{-2} log n)$ noise rates that can be tolerated when\u0000re-evaluating solutions. This first runtime analysis of an evolutionary algorithm solving a\u0000single-objective noisy problem without re-evaluations could indicate that such\u0000algorithms cope with noise much better than previously thought, and without the\u0000need to foresee the presence of noise.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A natural strategy for continual learning is to weigh a Bayesian ensemble of fixed functions. This suggests that if a (single) neural network could be interpreted as an ensemble, one could design effective algorithms that learn without forgetting. To realize this possibility, we observe that a neural network classifier with N parameters can be interpreted as a weighted ensemble of N classifiers, and that in the lazy regime limit these classifiers are fixed throughout learning. We term these classifiers the neural tangent experts and show they output valid probability distributions over the labels. We then derive the likelihood and posterior probability of each expert given past data. Surprisingly, we learn that the posterior updates for these experts are equivalent to a scaled and projected form of stochastic gradient descent (SGD) over the network weights. Away from the lazy regime, networks can be seen as ensembles of adaptive experts which improve over time. These results offer a new interpretation of neural networks as Bayesian ensembles of experts, providing a principled framework for understanding and mitigating catastrophic forgetting in continual learning settings.
{"title":"Continual learning with the neural tangent ensemble","authors":"Ari S. Benjamin, Christian Pehle, Kyle Daruwalla","doi":"arxiv-2408.17394","DOIUrl":"https://doi.org/arxiv-2408.17394","url":null,"abstract":"A natural strategy for continual learning is to weigh a Bayesian ensemble of\u0000fixed functions. This suggests that if a (single) neural network could be\u0000interpreted as an ensemble, one could design effective algorithms that learn\u0000without forgetting. To realize this possibility, we observe that a neural\u0000network classifier with N parameters can be interpreted as a weighted ensemble\u0000of N classifiers, and that in the lazy regime limit these classifiers are fixed\u0000throughout learning. We term these classifiers the neural tangent experts and\u0000show they output valid probability distributions over the labels. We then\u0000derive the likelihood and posterior probability of each expert given past data.\u0000Surprisingly, we learn that the posterior updates for these experts are\u0000equivalent to a scaled and projected form of stochastic gradient descent (SGD)\u0000over the network weights. Away from the lazy regime, networks can be seen as\u0000ensembles of adaptive experts which improve over time. These results offer a\u0000new interpretation of neural networks as Bayesian ensembles of experts,\u0000providing a principled framework for understanding and mitigating catastrophic\u0000forgetting in continual learning settings.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"2010 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking Neural Networks (SNNs) seek to mimic the spiking behavior of biological neurons and are expected to play a key role in the advancement of neural computing and artificial intelligence. The efficiency of SNNs is often determined by the neural coding schemes. Existing coding schemes either cause huge delays and energy consumption or necessitate intricate neuron models and training techniques. To address these issues, we propose a novel Stepwise Weighted Spike (SWS) coding scheme to enhance the encoding of information in spikes. This approach compresses the spikes by weighting the significance of the spike in each step of neural computation, achieving high performance and low energy consumption. A Ternary Self-Amplifying (TSA) neuron model with a silent period is proposed for supporting SWS-based computing, aimed at minimizing the residual error resulting from stepwise weighting in neural computation. Our experimental results show that the SWS coding scheme outperforms the existing neural coding schemes in very deep SNNs, and significantly reduces operations and latency.
{"title":"Stepwise Weighted Spike Coding for Deep Spiking Neural Networks","authors":"Yiwen Gu, Junchuan Gu, Haibin Shen, Kejie Huang","doi":"arxiv-2408.17245","DOIUrl":"https://doi.org/arxiv-2408.17245","url":null,"abstract":"Spiking Neural Networks (SNNs) seek to mimic the spiking behavior of\u0000biological neurons and are expected to play a key role in the advancement of\u0000neural computing and artificial intelligence. The efficiency of SNNs is often\u0000determined by the neural coding schemes. Existing coding schemes either cause\u0000huge delays and energy consumption or necessitate intricate neuron models and\u0000training techniques. To address these issues, we propose a novel Stepwise\u0000Weighted Spike (SWS) coding scheme to enhance the encoding of information in\u0000spikes. This approach compresses the spikes by weighting the significance of\u0000the spike in each step of neural computation, achieving high performance and\u0000low energy consumption. A Ternary Self-Amplifying (TSA) neuron model with a\u0000silent period is proposed for supporting SWS-based computing, aimed at\u0000minimizing the residual error resulting from stepwise weighting in neural\u0000computation. Our experimental results show that the SWS coding scheme\u0000outperforms the existing neural coding schemes in very deep SNNs, and\u0000significantly reduces operations and latency.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Independent component analysis (ICA) is a widely used method in various applications of signal processing and feature extraction. It extends principal component analysis (PCA) and can extract important and complicated components with small variances. One of the major problems of ICA is that the uniqueness of the solution is not guaranteed, unlike PCA. That is because there are many local optima in optimizing the objective function of ICA. It has been shown previously that the unique global optimum of ICA can be estimated from many random initializations by handcrafted thread computation. In this paper, the unique estimation of ICA is highly accelerated by reformulating the algorithm in matrix representation and reducing redundant calculations. Experimental results on artificial datasets and EEG data verified the efficiency of the proposed method.
{"title":"Efficient Estimation of Unique Components in Independent Component Analysis by Matrix Representation","authors":"Yoshitatsu Matsuda, Kazunori Yamaguch","doi":"arxiv-2408.17118","DOIUrl":"https://doi.org/arxiv-2408.17118","url":null,"abstract":"Independent component analysis (ICA) is a widely used method in various\u0000applications of signal processing and feature extraction. It extends principal\u0000component analysis (PCA) and can extract important and complicated components\u0000with small variances. One of the major problems of ICA is that the uniqueness\u0000of the solution is not guaranteed, unlike PCA. That is because there are many\u0000local optima in optimizing the objective function of ICA. It has been shown\u0000previously that the unique global optimum of ICA can be estimated from many\u0000random initializations by handcrafted thread computation. In this paper, the\u0000unique estimation of ICA is highly accelerated by reformulating the algorithm\u0000in matrix representation and reducing redundant calculations. Experimental\u0000results on artificial datasets and EEG data verified the efficiency of the\u0000proposed method.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"393 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Current interest in deep learning captures the attention of many programmers and researchers. Unfortunately, the lack of a unified schema for developing deep learning models results in methodological inconsistencies, unclear documentation, and problems with reproducibility. Some guidelines have been proposed, yet currently, they lack practical implementations. Furthermore, neural network training often takes on the form of trial and error, lacking a structured and thoughtful process. To alleviate these issues, in this paper, we introduce Art, a Python library designed to help automatically impose rules and standards while developing deep learning pipelines. Art divides model development into a series of smaller steps of increasing complexity, each concluded with a validation check improving the interpretability and robustness of the process. The current version of Art comes equipped with nine predefined steps inspired by Andrej Karpathy's Recipe for Training Neural Networks, a visualization dashboard, and integration with loggers such as Neptune. The code related to this paper is available at: https://github.com/SebChw/Actually-Robust-Training.
{"title":"ART: Actually Robust Training","authors":"Sebastian Chwilczyński, Kacper Trębacz, Karol Cyganik, Mateusz Małecki, Dariusz Brzezinski","doi":"arxiv-2408.16285","DOIUrl":"https://doi.org/arxiv-2408.16285","url":null,"abstract":"Current interest in deep learning captures the attention of many programmers\u0000and researchers. Unfortunately, the lack of a unified schema for developing\u0000deep learning models results in methodological inconsistencies, unclear\u0000documentation, and problems with reproducibility. Some guidelines have been\u0000proposed, yet currently, they lack practical implementations. Furthermore,\u0000neural network training often takes on the form of trial and error, lacking a\u0000structured and thoughtful process. To alleviate these issues, in this paper, we\u0000introduce Art, a Python library designed to help automatically impose rules and\u0000standards while developing deep learning pipelines. Art divides model\u0000development into a series of smaller steps of increasing complexity, each\u0000concluded with a validation check improving the interpretability and robustness\u0000of the process. The current version of Art comes equipped with nine predefined\u0000steps inspired by Andrej Karpathy's Recipe for Training Neural Networks, a\u0000visualization dashboard, and integration with loggers such as Neptune. The code\u0000related to this paper is available at:\u0000https://github.com/SebChw/Actually-Robust-Training.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artificial neural network research has struggled to devise a way to incorporate working memory into neural networks. While "long term" memory can be seen as the learned weights, working memory likely consists more of dynamical activity, which is missing from feed-forward models. Current state-of-the-art models such as transformers tend to "solve" this by ignoring working memory entirely and simply processing the sequence as one entire piece of data; however, this means the network cannot process the sequence in an online fashion, and it leads to an immense explosion in memory requirements. Here, inspired by a combination of control theory, reservoir computing, deep learning, and recurrent neural networks, we offer an alternative paradigm that combines the strength of recurrent networks with the pattern-matching capability of feed-forward neural networks, which we call the Maelstrom Networks paradigm. This paradigm leaves the recurrent component - the Maelstrom - unlearned, and offloads the learning to a powerful feed-forward network. This allows the network to leverage the strength of feed-forward training without unrolling the network, and allows the memory to be implemented in new neuromorphic hardware. It endows a neural network with a sequential memory that takes advantage of the inductive bias that data is organized causally in the temporal domain, and imbues the network with a state that represents the agent's "self", moving through the environment. This could also lead the way to continual learning, with the network modularized and "protected" from the overwrites that come with new data. In addition to aiding in solving the performance problems that plague current non-temporal deep networks, this could finally lead towards endowing artificial networks with a sense of "self".
{"title":"Maelstrom Networks","authors":"Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos","doi":"arxiv-2408.16632","DOIUrl":"https://doi.org/arxiv-2408.16632","url":null,"abstract":"Artificial Neural Networks has struggled to devise a way to incorporate\u0000working memory into neural networks. While the ``long term'' memory can be seen\u0000as the learned weights, the working memory consists likely more of dynamical\u0000activity, that is missing from feed-forward models. Current state of the art\u0000models such as transformers tend to ``solve'' this by ignoring working memory\u0000entirely and simply process the sequence as an entire piece of data; however\u0000this means the network cannot process the sequence in an online fashion, and\u0000leads to an immense explosion in memory requirements. Here, inspired by a\u0000combination of controls, reservoir computing, deep learning, and recurrent\u0000neural networks, we offer an alternative paradigm that combines the strength of\u0000recurrent networks, with the pattern matching capability of feed-forward neural\u0000networks, which we call the textit{Maelstrom Networks} paradigm. This paradigm\u0000leaves the recurrent component - the textit{Maelstrom} - unlearned, and\u0000offloads the learning to a powerful feed-forward network. This allows the\u0000network to leverage the strength of feed-forward training without unrolling the\u0000network, and allows for the memory to be implemented in new neuromorphic\u0000hardware. It endows a neural network with a sequential memory that takes\u0000advantage of the inductive bias that data is organized causally in the temporal\u0000domain, and imbues the network with a state that represents the agent's\u0000``self'', moving through the environment. This could also lead the way to\u0000continual learning, with the network modularized and ``'protected'' from\u0000overwrites that come with new data. In addition to aiding in solving these\u0000performance problems that plague current non-temporal deep networks, this also\u0000could finally lead towards endowing artificial networks with a sense of\u0000``self''.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spiking neural networks (SNNs) are generally regarded as more energy-efficient because they do not use multiplications. However, most SNN studies only count additions when evaluating energy consumption, neglecting other overheads such as memory accesses and data-movement operations. This oversight can lead to a misleading perception of efficiency, especially when state-of-the-art SNN accelerators operate with very small time window sizes. In this paper, we present a detailed comparison of the energy consumption of artificial neural networks (ANNs) and SNNs from a hardware perspective. We provide accurate formulas for energy consumption based on classical multi-level memory hierarchy architectures, commonly used neuromorphic dataflow architectures, and our proposed improved spatial-dataflow architecture. Our research demonstrates that to achieve comparable accuracy and greater energy efficiency than ANNs, SNNs require strict limitations on both the time window size T and the sparsity s. For instance, with the VGG16 model and a fixed T of 6, the neuron sparsity rate must exceed 93% to ensure energy efficiency across most architectures. Inspired by these findings, we explore strategies to enhance energy efficiency by increasing sparsity. We introduce two regularization terms during training that constrain weights and activations, effectively boosting the sparsity rate. Our experiments on the CIFAR-10 dataset, using a T of 6, show that our SNNs consume 69% of the energy used by optimized ANNs on spatial-dataflow architectures, while maintaining an SNN accuracy of 94.18%. This framework, developed using PyTorch, is publicly available for use and further research.
{"title":"Reconsidering the energy efficiency of spiking neural networks","authors":"Zhanglu Yan, Zhenyu Bai, Weng-Fai Wong","doi":"arxiv-2409.08290","DOIUrl":"https://doi.org/arxiv-2409.08290","url":null,"abstract":"Spiking neural networks (SNNs) are generally regarded as more\u0000energy-efficient because they do not use multiplications. However, most SNN\u0000works only consider the counting of additions to evaluate energy consumption,\u0000neglecting other overheads such as memory accesses and data movement\u0000operations. This oversight can lead to a misleading perception of efficiency,\u0000especially when state-of-the-art SNN accelerators operate with very small time\u0000window sizes. In this paper, we present a detailed comparison of the energy\u0000consumption of artificial neural networks (ANNs) and SNNs from a hardware\u0000perspective. We provide accurate formulas for energy consumption based on\u0000classical multi-level memory hierarchy architectures, commonly used\u0000neuromorphic dataflow architectures, and our proposed improved spatial-dataflow\u0000architecture. Our research demonstrates that to achieve comparable accuracy and\u0000greater energy efficiency than ANNs, SNNs require strict limitations on both\u0000time window size T and sparsity s. For instance, with the VGG16 model and a\u0000fixed T of 6, the neuron sparsity rate must exceed 93% to ensure energy\u0000efficiency across most architectures. Inspired by our findings, we explore\u0000strategies to enhance energy efficiency by increasing sparsity. We introduce\u0000two regularization terms during training that constrain weights and\u0000activations, effectively boosting the sparsity rate. Our experiments on the\u0000CIFAR-10 dataset, using T of 6, show that our SNNs consume 69% of the energy\u0000used by optimized ANNs on spatial-dataflow architectures, while maintaining an\u0000SNN accuracy of 94.18%. This framework, developed using PyTorch, is publicly\u0000available for use and further research.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142248998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent years have witnessed Spiking Neural Networks (SNNs) gaining attention for their ultra-low energy consumption and high biological plausibility compared with traditional Artificial Neural Networks (ANNs). Despite their distinguished properties, the application of SNNs in the computationally intensive field of image generation is still under exploration. In this paper, we propose the Spiking Diffusion Models (SDMs), an innovative family of SNN-based generative models that excel in producing high-quality samples with significantly reduced energy consumption. In particular, we propose a Temporal-wise Spiking Mechanism (TSM) that allows SNNs to capture more temporal features from a bio-plasticity perspective. In addition, we propose a threshold-guided strategy that can further improve the performances by up to 16.7% without any additional training. We also make the first attempt to use the ANN-SNN approach for SNN-based generation tasks. Extensive experimental results reveal that our approach not only exhibits comparable performance to its ANN counterpart with few spiking time steps, but also outperforms previous SNN-based generative models by a large margin. Moreover, we also demonstrate the high-quality generation ability of SDM on large-scale datasets, e.g., LSUN bedroom. This development marks a pivotal advancement in the capabilities of SNN-based generation, paving the way for future research avenues to realize low-energy and low-latency generative applications. Our code is available at https://github.com/AndyCao1125/SDM.
{"title":"Spiking Diffusion Models","authors":"Jiahang Cao, Hanzhong Guo, Ziqing Wang, Deming Zhou, Hao Cheng, Qiang Zhang, Renjing Xu","doi":"arxiv-2408.16467","DOIUrl":"https://doi.org/arxiv-2408.16467","url":null,"abstract":"Recent years have witnessed Spiking Neural Networks (SNNs) gaining attention\u0000for their ultra-low energy consumption and high biological plausibility\u0000compared with traditional Artificial Neural Networks (ANNs). Despite their\u0000distinguished properties, the application of SNNs in the computationally\u0000intensive field of image generation is still under exploration. In this paper,\u0000we propose the Spiking Diffusion Models (SDMs), an innovative family of\u0000SNN-based generative models that excel in producing high-quality samples with\u0000significantly reduced energy consumption. In particular, we propose a\u0000Temporal-wise Spiking Mechanism (TSM) that allows SNNs to capture more temporal\u0000features from a bio-plasticity perspective. In addition, we propose a\u0000threshold-guided strategy that can further improve the performances by up to\u000016.7% without any additional training. We also make the first attempt to use\u0000the ANN-SNN approach for SNN-based generation tasks. Extensive experimental\u0000results reveal that our approach not only exhibits comparable performance to\u0000its ANN counterpart with few spiking time steps, but also outperforms previous\u0000SNN-based generative models by a large margin. Moreover, we also demonstrate\u0000the high-quality generation ability of SDM on large-scale datasets, e.g., LSUN\u0000bedroom. This development marks a pivotal advancement in the capabilities of\u0000SNN-based generation, paving the way for future research avenues to realize\u0000low-energy and low-latency generative applications. Our code is available at\u0000https://github.com/AndyCao1125/SDM.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"160 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This note addresses the Kolmogorov-Arnold Representation Theorem (KART) and the Universal Approximation Theorem (UAT), focusing on their common misinterpretations in some papers related to neural network approximation. Our remarks aim to support a more accurate understanding of KART and UAT among neural network specialists.
{"title":"Addressing Common Misinterpretations of KART and UAT in Neural Network Literature","authors":"Vugar Ismailov","doi":"arxiv-2408.16389","DOIUrl":"https://doi.org/arxiv-2408.16389","url":null,"abstract":"This note addresses the Kolmogorov-Arnold Representation Theorem (KART) and\u0000the Universal Approximation Theorem (UAT), focusing on their common\u0000misinterpretations in some papers related to neural network approximation. Our\u0000remarks aim to support a more accurate understanding of KART and UAT among\u0000neural network specialists.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wind flow can be highly unpredictable and can suffer substantial fluctuations in speed and direction due to the shape and height of hills, mountains, and valleys, making accurate wind speed (WS) forecasting essential in complex terrain. This paper presents a novel and adaptive model for short-term forecasting of WS. The paper's key contributions are as follows: (a) the Partial Auto Correlation Function (PACF) is utilised to minimise the dimension of the set of Intrinsic Mode Functions (IMFs), hence reducing training time; (b) sample entropy (SampEn) is used to calculate the complexity of the reduced set of IMFs, and the proposed technique is adaptive since a specific Deep Learning (DL) model-feature combination is chosen based on this complexity; (c) a novel bidirectional feature-LSTM framework for complicated IMFs is suggested, resulting in improved forecasting accuracy; (d) the proposed model shows superior forecasting performance compared to the persistence, hybrid, Ensemble Empirical Mode Decomposition (EEMD), and Variational Mode Decomposition (VMD)-based deep learning models. It achieves the lowest variance in forecasting accuracy between simple and complex terrain conditions (0.70%). Dimension reduction of the IMFs and complexity-based model-feature selection help reduce the training time by 68.77% and improve forecasting quality by 58.58% on average.
{"title":"A Novel Denoising Technique and Deep Learning Based Hybrid Wind Speed Forecasting Model for Variable Terrain Conditions","authors":"Sourav Malakar, Saptarsi Goswami, Amlan Chakrabarti, Bhaswati Ganguli","doi":"arxiv-2408.15554","DOIUrl":"https://doi.org/arxiv-2408.15554","url":null,"abstract":"Wind flow can be highly unpredictable and can suffer substantial fluctuations\u0000in speed and direction due to the shape and height of hills, mountains, and\u0000valleys, making accurate wind speed (WS) forecasting essential in complex\u0000terrain. This paper presents a novel and adaptive model for short-term\u0000forecasting of WS. The paper's key contributions are as follows: (a) The\u0000Partial Auto Correlation Function (PACF) is utilised to minimise the dimension\u0000of the set of Intrinsic Mode Functions (IMF), hence reducing training time; (b)\u0000The sample entropy (SampEn) was used to calculate the complexity of the reduced\u0000set of IMFs. The proposed technique is adaptive since a specific Deep Learning\u0000(DL) model-feature combination was chosen based on complexity; (c) A novel\u0000bidirectional feature-LSTM framework for complicated IMFs has been suggested,\u0000resulting in improved forecasting accuracy; (d) The proposed model shows\u0000superior forecasting performance compared to the persistence, hybrid, Ensemble\u0000empirical mode decomposition (EEMD), and Variational Mode Decomposition\u0000(VMD)-based deep learning models. It has achieved the lowest variance in terms\u0000of forecasting accuracy between simple and complex terrain conditions 0.70%.\u0000Dimension reduction of IMF's and complexity-based model-feature selection helps\u0000reduce the training time by 68.77% and improve forecasting quality by 58.58% on\u0000average.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}