Authors: Zhaoxuan Guo; Jiandong Gao; Guangyuan Ma; Jiangtao Xu
DOI: 10.1109/JSEN.2024.3392973
Journal: IEEE Sensors Journal (Q1, Engineering, Electrical & Electronic; Impact Factor 4.3)
Published: 2024-04-30
IEEE Xplore: https://ieeexplore.ieee.org/document/10516298/
Spatiotemporal Aggregation Transformer for Object Detection With Neuromorphic Vision Sensors
To enhance the accuracy of object detection with event-based neuromorphic vision sensors, a novel event-based detector named the spatiotemporal aggregation transformer (STAT) is proposed. First, to collect sufficient event information for the detection problem considered, STAT uses a density-based adaptive sampling (DAS) module to adaptively sample the continuous event stream into multiple groups. This module determines the sampling termination condition by quantifying the velocity and size of objects. Second, STAT integrates a sparse event tensor (SET) to establish compatibility between the event stream and traditional vision algorithms. SET maps events to a dense representation by fitting the optimal mapping function end to end, mitigating the loss of spatiotemporal information within the event stream. Finally, to enhance the features of slowly moving objects, a lightweight and efficient triaxial vision transformer (TVT) is designed to model global features and integrate historical motion information. Experimental evaluations on two benchmark datasets show that STAT achieves a mean average precision (mAP) of 68.2% on the Neuromorphic-Caltech101 (N-Caltech101) dataset and 49.9% on the Gen1 dataset. These results demonstrate that the detection accuracy of STAT outperforms state-of-the-art methods by 2.0% on the Gen1 dataset. The code for this project is available at https://github.com/TJU-guozhaoxuan/STAT.
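The abstract describes two preprocessing stages: grouping the continuous event stream into samples, and mapping each sparse group of (t, x, y, polarity) events onto a dense tensor that conventional vision backbones can consume. The sketch below illustrates only the general shape of that pipeline; it is not the paper's method. The DAS module adapts its termination condition to object velocity and size, and SET learns its mapping end to end, whereas this sketch substitutes the simplest stand-ins (a fixed group size and a fixed bilinear-in-time voxel grid). All function and parameter names here are hypothetical.

```python
import numpy as np

def sample_by_count(events, group_size=5000):
    """Split an event stream into fixed-count groups.

    Stand-in for adaptive sampling: STAT's DAS module instead chooses
    the termination condition from object velocity and size.
    """
    return [events[i:i + group_size] for i in range(0, len(events), group_size)]

def events_to_dense(events, height, width, bins=5):
    """Map (t, x, y, p) events onto a dense (2 * bins, H, W) tensor.

    Stand-in for SET: a fixed bilinear-in-time voxel grid, shown only to
    illustrate the sparse-to-dense shape contract. STAT's SET fits the
    mapping function end to end instead of fixing it by hand.
    """
    grid = np.zeros((2 * bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    t = events[:, 0].astype(np.float64)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = (events[:, 3] > 0).astype(int)  # polarity mapped to channel half 0 or 1
    # Normalize timestamps into [0, bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (bins - 1)
    for b in range(bins):
        # Bilinear temporal weight of each event for bin b.
        w = np.clip(1.0 - np.abs(t_norm - b), 0.0, 1.0)
        np.add.at(grid, (p * bins + b, y, x), w)
    return grid
```

Because the bilinear weights of each event across adjacent bins sum to one, the dense tensor conserves the event count, which makes the representation easy to sanity-check before feeding it to a detector backbone.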
Journal Introduction:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. The IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion, processing of wave e.g., electromagnetic and acoustic; and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data, detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice