{"title":"3D pooling on local space-time features for human action recognition","authors":"Najme Hadibarhaghtalab, Z. Azimifar","doi":"10.1109/IRANIANMVIP.2013.6779992","DOIUrl":null,"url":null,"abstract":"Successful approaches use local space-time features for human action recognition task including hand designed features or learned features. However these methods need a wise technique to encode local features to make a global representation for video. For this, some methods use K-means vector quantization to histogram each video as a bag of word. Pooling is a way used for global representation of an image. This method pools the local image feature over some image neighborhood. In this paper we extend pooling method called 3D pooling for global representation of video. 3D pooling represents each video by concatenating pooled feature vectors achieved from 8 equal regions of video. We also applied stacked convolutional ISA as local feature extractor. We evaluated our method on KTH data set and got our best result using max pooling. It improves the performance of highly demanded earlier methods.","PeriodicalId":297204,"journal":{"name":"2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRANIANMVIP.2013.6779992","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
Successful approaches to human action recognition rely on local space-time features, either hand-designed or learned. These methods, however, need a sound technique for encoding the local features into a global representation of the video. For this purpose, some methods apply K-means vector quantization and represent each video as a bag-of-words histogram. Pooling is another technique, used for the global representation of an image: it pools local image features over an image neighborhood. In this paper we extend pooling to video with a method called 3D pooling. 3D pooling represents each video by concatenating the pooled feature vectors obtained from 8 equal regions of the video. We use stacked convolutional independent subspace analysis (ISA) as the local feature extractor. We evaluated our method on the KTH data set and obtained our best result with max pooling. The method improves on the performance of widely used earlier methods.
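The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the 8 equal regions are taken to be a 2×2×2 split of the video along width, height, and time; each local feature is assumed to come with an (x, y, t) position; and empty regions are filled with zeros. The function name `pool_3d` and its parameters are hypothetical.

```python
import numpy as np


def pool_3d(features, positions, video_shape, mode="max"):
    """Pool local descriptors over 8 space-time regions and concatenate.

    features:    (N, D) array of local descriptors (e.g. ISA outputs).
    positions:   (N, 3) array of (x, y, t) locations, one per descriptor.
    video_shape: (width, height, frames) of the video.
    Assumption: the 8 regions are the octants of a 2x2x2 split.
    """
    features = np.asarray(features, dtype=float)
    positions = np.asarray(positions, dtype=float)
    half = np.asarray(video_shape, dtype=float) / 2.0

    # Per axis: 0 if the descriptor lies in the first half, 1 otherwise.
    octant = (positions >= half).astype(int)
    # Combine the three binary indices into a region id in 0..7.
    region = octant[:, 0] * 4 + octant[:, 1] * 2 + octant[:, 2]

    reduce = np.max if mode == "max" else np.mean
    pooled = np.zeros((8, features.shape[1]))
    for r in range(8):
        members = features[region == r]
        if len(members):  # assumption: empty regions stay all-zero
            pooled[r] = reduce(members, axis=0)

    # Global video representation: 8 pooled vectors, concatenated.
    return pooled.reshape(-1)
```

For N descriptors of dimension D, the result is a single vector of length 8·D per video, which could then be passed to any standard classifier. Passing `mode="mean"` swaps max pooling for average pooling, which makes it easy to compare the two.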