{"title":"SpikingViT: A Multiscale Spiking Vision Transformer Model for Event-Based Object Detection","authors":"Lixing Yu;Hanqi Chen;Ziming Wang;Shaojie Zhan;Jiankun Shao;Qingjie Liu;Shu Xu","doi":"10.1109/TCDS.2024.3422873","DOIUrl":null,"url":null,"abstract":"Event cameras have unique advantages in object detection, capturing asynchronous events without continuous frames. They excel in dynamic range, low latency, and high-speed motion scenarios, with lower power consumption. However, aggregating event data into image frames leads to information loss and reduced detection performance. Applying traditional neural networks to event camera outputs is challenging due to event data's distinct characteristics. In this study, we present a novel spiking neural networks (SNNs)-based object detection model, the spiking vision transformer (SpikingViT) to address these issues. First, we design a dedicated event data converting module that effectively captures the unique characteristics of event data, mitigating the risk of information loss while preserving its spatiotemporal features. Second, we introduce SpikingViT, a novel object detection model that leverages SNNs capable of extracting spatiotemporal information among events data. SpikingViT combines the advantages of SNNs and transformer models, incorporating mechanisms such as attention and residual voltage memory to further enhance detection performance. Extensive experiments have substantiated the remarkable proficiency of SpikingViT in event-based object detection, positioning it as a formidable contender. Our proposed approach adeptly retains spatiotemporal information inherent in event data, leading to a substantial enhancement in detection performance.","PeriodicalId":54300,"journal":{"name":"IEEE Transactions on Cognitive and Developmental Systems","volume":"17 1","pages":"130-146"},"PeriodicalIF":5.0000,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive and Developmental Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10586833/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Event cameras have unique advantages in object detection: they capture asynchronous events rather than continuous frames, and they excel in dynamic range, latency, and high-speed motion scenarios while consuming less power. However, aggregating event data into image frames leads to information loss and reduced detection performance, and applying traditional neural networks to event camera outputs is challenging because of the distinct characteristics of event data. In this study, we present a novel spiking neural network (SNN)-based object detection model, the spiking vision transformer (SpikingViT), to address these issues. First, we design a dedicated event data converting module that effectively captures the unique characteristics of event data, mitigating the risk of information loss while preserving its spatiotemporal features. Second, we introduce SpikingViT, a novel object detection model built on SNNs capable of extracting spatiotemporal information from event data. SpikingViT combines the advantages of SNNs and transformer models, incorporating mechanisms such as attention and residual voltage memory to further enhance detection performance. Extensive experiments substantiate the strong performance of SpikingViT in event-based object detection, positioning it as a formidable contender. Our proposed approach retains the spatiotemporal information inherent in event data, leading to a substantial improvement in detection performance.
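For context, an event camera emits a stream of (timestamp, x, y, polarity) tuples rather than frames; a common way to feed such a stream into a tensor-based detector while retaining temporal structure is to bin it into a voxel grid over several time slices. The sketch below is a generic illustration of that idea only, not the paper's event data converting module; the function name, the bilinear temporal binning scheme, and the random example events are assumptions.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate (t, x, y, polarity) events into a (num_bins, height, width)
    voxel grid, spreading each event's polarity over the two nearest temporal
    bins (bilinear interpolation in time)."""
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel

    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = np.where(events[:, 3] > 0, 1.0, -1.0)  # map polarity to +/-1

    # Normalize timestamps to the range [0, num_bins - 1]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    left = np.floor(t_norm).astype(int)
    right = np.clip(left + 1, 0, num_bins - 1)
    w_right = t_norm - left
    w_left = 1.0 - w_right

    # Scatter-add each event's contribution into its two neighboring bins
    np.add.at(voxel, (left, y, x), p * w_left)
    np.add.at(voxel, (right, y, x), p * w_right)
    return voxel

# Example: 1000 random events on a 64x48 sensor, binned into 5 time slices
rng = np.random.default_rng(0)
events = np.stack([
    np.sort(rng.uniform(0.0, 1.0, 1000)),      # timestamps (s)
    rng.integers(0, 64, 1000).astype(float),   # x coordinates
    rng.integers(0, 48, 1000).astype(float),   # y coordinates
    rng.choice([-1.0, 1.0], 1000),             # polarity
], axis=1)
grid = events_to_voxel_grid(events, num_bins=5, height=48, width=64)
print(grid.shape)  # (5, 48, 64)
```

Representations of this kind keep coarse temporal ordering that a single accumulated frame would discard, which is the information-loss problem the abstract refers to.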
About the Journal:
The IEEE Transactions on Cognitive and Developmental Systems (TCDS) focuses on advances in the study of development and cognition in natural (humans, animals) and artificial (robots, agents) systems. It welcomes contributions from multiple related disciplines including cognitive systems, cognitive robotics, developmental and epigenetic robotics, autonomous and evolutionary robotics, social structures, multi-agent and artificial life systems, computational neuroscience, and developmental psychology. Articles on theoretical, computational, application-oriented, and experimental studies as well as reviews in these areas are considered.