Yamin Han, Peng Zhang, Tao Zhuo, Wei Huang, Yanning Zhang
2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1226–1235. Published 2017-07-21. DOI: 10.1109/CVPRW.2017.162
Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation
Strategies based on deep convolutional networks have shown remarkable performance on many recognition tasks. Unfortunately, in realistic scenarios, accurate and robust recognition remains difficult, especially for videos. Challenges such as cluttered backgrounds and viewpoint changes can produce large intrinsic and extrinsic class variations. In addition, data deficiency can degrade the designed model during learning and updating. Incorporating frame-wise motion into the learning model on the fly has therefore become increasingly attractive in contemporary video analysis. To overcome these limitations, we propose a deeper-convolutional-network approach with pair-wise motion concatenation, named deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective data entry for the learning of the convolutional networks. To handle possible data deficiency, the beneficial practices of transferring ResNet-101 weights and of data-variation augmentation are also employed for robust recognition. Experiments on the challenging UCF101 and ODAR datasets verify favorable performance compared with other state-of-the-art works.
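The core input construction the abstract describes, concatenating each frame with the motion between it and its neighbor before feeding the network, can be sketched as follows. This is a minimal illustration using simple frame differencing as the motion signal; the function name is our own, and the paper's actual temporal motion accumulation mechanism may compute motion differently (e.g., from optical flow).

```python
import numpy as np

def pairwise_motion_concat(frames):
    """Concatenate each RGB frame with the motion (temporal difference)
    to its successor along the channel axis.

    A simplified sketch of pair-wise frame motion concatenation:
    frames of shape (T, H, W, 3) become inputs of shape (T-1, H, W, 6),
    where the last three channels encode frame-to-frame motion.
    """
    frames = np.asarray(frames, dtype=np.float32)
    motion = frames[1:] - frames[:-1]          # difference between adjacent frames
    return np.concatenate([frames[:-1], motion], axis=-1)

# Example: an 8-frame clip of 4x4 RGB video
clip = np.random.rand(8, 4, 4, 3).astype(np.float32)
x = pairwise_motion_concat(clip)
print(x.shape)  # (7, 4, 4, 6)
```

The resulting six-channel tensors can then be stacked as training samples for a 2D convolutional backbone (such as the transferred ResNet-101 mentioned above, with its first convolution widened to accept the extra channels).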