SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models

arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-27 DOI:arxiv-2408.14909

Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng

{"title":"SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models","authors":"Shuaijie Shen, Chao Wang, Renzhuo Huang, Yan Zhong, Qinghai Guo, Zhichao Lu, Jianguo Zhang, Luziwei Leng","doi":"arxiv-2408.14909","DOIUrl":null,"url":null,"abstract":"Known as low energy consumption networks, spiking neural networks (SNNs) have\ngained a lot of attention within the past decades. While SNNs are increasing\ncompetitive with artificial neural networks (ANNs) for vision tasks, they are\nrarely used for long sequence tasks, despite their intrinsic temporal dynamics.\nIn this work, we develop spiking state space models (SpikingSSMs) for long\nsequence learning by leveraging on the sequence learning abilities of state\nspace models (SSMs). Inspired by dendritic neuron structure, we hierarchically\nintegrate neuronal dynamics with the original SSM block, meanwhile realizing\nsparse synaptic computation. Furthermore, to solve the conflict of event-driven\nneuronal dynamics with parallel computing, we propose a light-weight surrogate\ndynamic network which accurately predicts the after-reset membrane potential\nand compatible to learnable thresholds, enabling orders of acceleration in\ntraining speed compared with conventional iterative methods. On the long range\narena benchmark task, SpikingSSM achieves competitive performance to\nstate-of-the-art SSMs meanwhile realizing on average 90\\% of network sparsity.\nOn language modeling, our network significantly surpasses existing spiking\nlarge language models (spikingLLMs) on the WikiText-103 dataset with only a\nthird of the model size, demonstrating its potential as backbone architecture\nfor low computation cost LLMs.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"60 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14909","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Known as low energy consumption networks, spiking neural networks (SNNs) have gained a lot of attention within the past decades. While SNNs are increasing competitive with artificial neural networks (ANNs) for vision tasks, they are rarely used for long sequence tasks, despite their intrinsic temporal dynamics. In this work, we develop spiking state space models (SpikingSSMs) for long sequence learning by leveraging on the sequence learning abilities of state space models (SSMs). Inspired by dendritic neuron structure, we hierarchically integrate neuronal dynamics with the original SSM block, meanwhile realizing sparse synaptic computation. Furthermore, to solve the conflict of event-driven neuronal dynamics with parallel computing, we propose a light-weight surrogate dynamic network which accurately predicts the after-reset membrane potential and compatible to learnable thresholds, enabling orders of acceleration in training speed compared with conventional iterative methods. On the long range arena benchmark task, SpikingSSM achieves competitive performance to state-of-the-art SSMs meanwhile realizing on average 90\% of network sparsity. On language modeling, our network significantly surpasses existing spiking large language models (spikingLLMs) on the WikiText-103 dataset with only a third of the model size, demonstrating its potential as backbone architecture for low computation cost LLMs.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SpikingSSMs：利用稀疏并行尖峰状态空间模型学习长序列

尖峰神经网络（SNN）被称为低能耗网络，在过去的几十年里受到了广泛关注。在这项工作中，我们利用状态空间模型（SSM）的序列学习能力，开发了用于长序列学习的尖峰状态空间模型（SpikingSSM）。受树突状神经元结构的启发，我们将神经元动力学与原始的 SSM 模块进行了分层整合，同时实现了解析突触计算。此外，为了解决事件驱动神经元动力学与并行计算的矛盾，我们提出了一种轻量级的代理动力学网络，它能准确预测复位后的膜电位，并兼容可学习的阈值，与传统的迭代方法相比，训练速度加快了几个数量级。在长距离区域基准任务上，SpikingSSM的性能与最先进的SSM相比具有竞争力，同时平均实现了90%的网络稀疏性。在语言建模方面，我们的网络在WikiText-103数据集上显著超越了现有的spiking大型语言模型（spikingLLMs），而模型大小仅为现有模型的三分之一，这证明了它作为低计算成本LLMs骨干架构的潜力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

arXiv - CS - Neural and Evolutionary Computing

自引率

0.00%

发文量