{"title":"Space-Time Event Clouds for Gesture Recognition: From RGB Cameras to Event Cameras","authors":"Qinyi Wang, Yexin Zhang, Junsong Yuan, Yilong Lu","doi":"10.1109/WACV.2019.00199","DOIUrl":null,"url":null,"abstract":"The recently developed event cameras can directly sense the motion in the scene by generating an asynchronous sequence of events, i.e., event streams, where each individual event (x, y, t) corresponds to the space-time location when a pixel sensor captures an intensity change. Compared with RGB cameras, event cameras are frameless but can capture much faster motion, therefore have great potential for recognizing gestures of fast motions. To deal with the unique output of event cameras, previous methods often treat event streams as time sequences, thus do not fully explore the space-time sparsity of the event stream data. In this work, we treat the event stream as a set of 3D points in space-time, i.e., space-time event clouds. To analyze event clouds and recognize gestures, we propose to leverage PointNet, a neural network architecture originally designed for matching and recognizing 3D point clouds. We further adapt PointNet to cater to event clouds for real-time gesture recognition. On the benchmark dataset of event camera based gesture recognition, i.e., IBM DVS128 Gesture dataset, our proposed method achieves a high accuracy of 97.08% and performs the best among existing methods.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"80","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV.2019.00199","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 80
Abstract
The recently developed event cameras can directly sense the motion in the scene by generating an asynchronous sequence of events, i.e., event streams, where each individual event (x, y, t) corresponds to the space-time location when a pixel sensor captures an intensity change. Compared with RGB cameras, event cameras are frameless but can capture much faster motion, therefore have great potential for recognizing gestures of fast motions. To deal with the unique output of event cameras, previous methods often treat event streams as time sequences, thus do not fully explore the space-time sparsity of the event stream data. In this work, we treat the event stream as a set of 3D points in space-time, i.e., space-time event clouds. To analyze event clouds and recognize gestures, we propose to leverage PointNet, a neural network architecture originally designed for matching and recognizing 3D point clouds. We further adapt PointNet to cater to event clouds for real-time gesture recognition. On the benchmark dataset of event camera based gesture recognition, i.e., IBM DVS128 Gesture dataset, our proposed method achieves a high accuracy of 97.08% and performs the best among existing methods.
最近开发的事件相机可以通过产生异步事件序列(即事件流)直接感知场景中的运动,其中每个单独的事件(x, y, t)对应于像素传感器捕获强度变化时的时空位置。与RGB相机相比,事件相机是无帧的,但可以捕捉更快的运动,因此在识别快速运动的手势方面具有很大的潜力。为了处理事件相机的独特输出,以往的方法往往将事件流视为时间序列,因此没有充分挖掘事件流数据的时空稀疏性。在这项工作中,我们将事件流视为时空中三维点的集合,即时空事件云。为了分析事件云和识别手势,我们建议利用PointNet,这是一种最初设计用于匹配和识别3D点云的神经网络架构。我们进一步调整PointNet,以满足实时手势识别的事件云。在基于事件相机的手势识别基准数据集即IBM DVS128手势数据集上,本文方法的准确率达到97.08%,是现有方法中准确率最高的。