{"title":"3D Convolutional Network Based Foreground Feature Fusion","authors":"Hanjian Song, Lihua Tian, Chen Li","doi":"10.1109/ISM.2018.00036","DOIUrl":null,"url":null,"abstract":"with explosion of videos, action recognition has become an important research subject. This paper makes a special effort to investigate and study 3D Convolutional Network. Focused on the problem of ConvNet dependence on multiple large scale dataset, we propose a 3D ConvNet structure which incorporate the original 3D-ConvNet features and foreground 3D-ConvNet features fused by static object and motion detection. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, experimental results demonstrate that with merely 50% pixels utilization, foreground ConvNet achieves satisfying performance as same as origin. With feature fusion, we achieve 83.7% accuracy on UCF-101 exceeding original ConvNet.","PeriodicalId":308698,"journal":{"name":"2018 IEEE International Symposium on Multimedia (ISM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Symposium on Multimedia (ISM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2018.00036","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
With the explosion of video data, action recognition has become an important research subject. This paper investigates and studies 3D Convolutional Networks. Focusing on the problem of ConvNet dependence on multiple large-scale datasets, we propose a 3D ConvNet structure that fuses the original 3D-ConvNet features with foreground 3D-ConvNet features obtained through static object and motion detection. Our architecture is trained and evaluated on the standard video action benchmarks UCF-101 and HMDB-51. Experimental results demonstrate that with merely 50% pixel utilization, the foreground ConvNet achieves performance on par with the original network. With feature fusion, we achieve 83.7% accuracy on UCF-101, exceeding the original ConvNet.
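To make the fusion idea in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes a two-branch design in which one small 3D ConvNet processes the full clip and a second processes a foreground-masked copy of the clip, with the two feature vectors concatenated before classification. All layer sizes, the class names (`Small3DBranch`, `FusedActionNet`), and the binary-mask input are illustrative assumptions; the paper's actual foreground extraction (static object and motion detection) is replaced here by a placeholder mask.

```python
# Illustrative sketch of original + foreground 3D-ConvNet feature fusion.
# Not the paper's architecture; layer sizes and fusion-by-concatenation are assumptions.
import torch
import torch.nn as nn


class Small3DBranch(nn.Module):
    """Tiny 3D ConvNet mapping a clip (B, 3, T, H, W) to a feature vector."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)   # (B, 64)
        return self.fc(x)                    # (B, feat_dim)


class FusedActionNet(nn.Module):
    """Two branches: full frames and foreground-masked frames, fused by concatenation."""

    def __init__(self, num_classes: int = 101, feat_dim: int = 128):
        super().__init__()
        self.full_branch = Small3DBranch(feat_dim)
        self.fg_branch = Small3DBranch(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip: torch.Tensor, fg_mask: torch.Tensor) -> torch.Tensor:
        # fg_mask: (B, 1, T, H, W) binary mask; in the paper it would come from
        # static object and motion detection, here it is just an input tensor.
        fg_clip = clip * fg_mask             # keep only foreground pixels
        f_full = self.full_branch(clip)
        f_fg = self.fg_branch(fg_clip)
        fused = torch.cat([f_full, f_fg], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)                   # two 16-frame RGB clips
    mask = (torch.rand(2, 1, 16, 112, 112) > 0.5).float()    # placeholder foreground mask
    logits = FusedActionNet()(clip, mask)
    print(logits.shape)                                      # torch.Size([2, 101])
```

Concatenation is only one plausible fusion strategy; averaging or weighting the two branches' predictions would fit the same two-branch structure with minor changes.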