A Two-Stream Network For Driving Hand Gesture Recognition
Yefan Zhou, Zhao Lv, Chaoqun Wang, Shengli Zhang
2020 International Conference on Data Mining Workshops (ICDMW), November 2020
DOI: 10.1109/ICDMW51313.2020.00079
Abstract
Traffic-accident fatalities rise every year, and improper driving behaviors account for a large share of these accidents. To warn drivers of such behaviors, we design a light and fast neural network (LFNN). On this basis, we construct a convolutional two-stream interactive network framework: one stream captures the spatial information of hand appearance, while the other captures the temporal information of hand movement. The features produced by the two streams are fused and classified through a short interactive connection network. Evaluated on the CVRR-HANDS 3D dataset, our network reaches an accuracy of 96.5%, a clear improvement over the state of the art.
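To make the two-stream idea concrete, here is a minimal NumPy sketch of the general pattern the abstract describes: one stream for spatial (hand-appearance) features, one for temporal (hand-motion) features, fused by concatenation and passed to a classifier. This is not the paper's LFNN; all shapes, weights, and the class count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_features(x, w):
    """Toy 'stream': a linear projection followed by ReLU.
    A real network would use stacked convolutions instead."""
    return np.maximum(x @ w, 0.0)

# Assumed inputs: one clip summarized by a spatial vector (hand appearance)
# and a temporal vector (hand motion). The 128-dim size is arbitrary.
spatial_in = rng.standard_normal(128)
temporal_in = rng.standard_normal(128)

# Independent (randomly initialized) weights for each stream.
w_spatial = rng.standard_normal((128, 64))
w_temporal = rng.standard_normal((128, 64))

f_spatial = stream_features(spatial_in, w_spatial)
f_temporal = stream_features(temporal_in, w_temporal)

# Fusion by concatenation, then a linear classifier over gesture classes.
# The class count (19) is a hypothetical example, not taken from the paper.
fused = np.concatenate([f_spatial, f_temporal])   # shape (128,)
w_cls = rng.standard_normal((128, 19))
logits = fused @ w_cls
pred = int(np.argmax(logits))
print(fused.shape, pred)
```

In an actual two-stream design, the interaction between streams would happen through learned connections rather than a single concatenation, and the weights would be trained end to end.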