Authors: Zhaoxuan Guo; Jiandong Gao; Guangyuan Ma; Jiangtao Xu
DOI: 10.1109/JSEN.2024.3392973
Journal: IEEE Sensors Journal (Q1, Engineering, Electrical & Electronic; Impact Factor 4.3)
Published: 2024-04-30
IEEE Xplore: https://ieeexplore.ieee.org/document/10516298/
Spatiotemporal Aggregation Transformer for Object Detection With Neuromorphic Vision Sensors
To enhance the accuracy of object detection with event-based neuromorphic vision sensors, a novel event-based detector named the spatiotemporal aggregation transformer (STAT) is proposed. First, to collect sufficient event information for the detection problem considered, STAT uses a density-based adaptive sampling (DAS) module to adaptively sample the continuous event stream into multiple groups. This module determines the sampling termination condition by quantifying the velocity and size of objects. Second, STAT integrates a sparse event tensor (SET) to establish compatibility between the event stream and traditional vision algorithms. SET maps events to a dense representation by fitting the optimal mapping function end to end, mitigating the loss of spatiotemporal information within the event stream. Finally, to enhance the features of slowly moving objects, a lightweight and efficient triaxial vision transformer (TVT) is designed to model global features and integrate historical motion information. Experimental evaluations on two benchmark datasets show that STAT achieves a mean average precision (mAP) of 68.2% on the Neuromorphic-Caltech101 (N-Caltech101) dataset and 49.9% on the Gen1 dataset. These results demonstrate that the detection accuracy of STAT outperforms state-of-the-art methods by 2.0% on the Gen1 dataset. The code for this project is available at https://github.com/TJU-guozhaoxuan/STAT.
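The abstract describes two preprocessing stages: grouping the continuous event stream into samples, and mapping each sparse group of (t, x, y, polarity) events onto a dense tensor that conventional vision backbones can consume. The sketch below illustrates only the general shape of that pipeline; it is not the paper's method. The DAS module adapts its termination condition to object velocity and size, and SET learns its mapping end to end, whereas this sketch substitutes the simplest stand-ins (a fixed group size and a fixed bilinear-in-time voxel grid). All function and parameter names here are hypothetical.

```python
import numpy as np

def sample_by_count(events, group_size=5000):
    """Split an event stream into fixed-count groups.

    Stand-in for adaptive sampling: STAT's DAS module instead chooses
    the termination condition from object velocity and size.
    """
    return [events[i:i + group_size] for i in range(0, len(events), group_size)]

def events_to_dense(events, height, width, bins=5):
    """Map (t, x, y, p) events onto a dense (2 * bins, H, W) tensor.

    Stand-in for SET: a fixed bilinear-in-time voxel grid, shown only to
    illustrate the sparse-to-dense shape contract. STAT's SET fits the
    mapping function end to end instead of fixing it by hand.
    """
    grid = np.zeros((2 * bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    t = events[:, 0].astype(np.float64)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = (events[:, 3] > 0).astype(int)  # polarity mapped to channel half 0 or 1
    # Normalize timestamps into [0, bins - 1].
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (bins - 1)
    for b in range(bins):
        # Bilinear temporal weight of each event for bin b.
        w = np.clip(1.0 - np.abs(t_norm - b), 0.0, 1.0)
        np.add.at(grid, (p * bins + b, y, x), w)
    return grid
```

Because the bilinear weights of each event across adjacent bins sum to one, the dense tensor conserves the event count, which makes the representation easy to sanity-check before feeding it to a detector backbone.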
Journal Introduction:
The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensor-actuators. The IEEE Sensors Journal deals with the following:
-Sensor Phenomenology, Modelling, and Evaluation
-Sensor Materials, Processing, and Fabrication
-Chemical and Gas Sensors
-Microfluidics and Biosensors
-Optical Sensors
-Physical Sensors: Temperature, Mechanical, Magnetic, and others
-Acoustic and Ultrasonic Sensors
-Sensor Packaging
-Sensor Networks
-Sensor Applications
-Sensor Systems: Signals, Processing, and Interfaces
-Actuators and Sensor Power Systems
-Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
-Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion, processing of wave e.g., electromagnetic and acoustic; and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data, detection, estimation and classification based on sensor data)
-Sensors in Industrial Practice