Hongbo Zhang, Songzhi Su, Shaozi Li, Duansheng Chen, Bineng Zhong, R. Ji
{"title":"Seeing actions through scene context","authors":"Hongbo Zhang, Songzhi Su, Shaozi Li, Duansheng Chen, Bineng Zhong, R. Ji","doi":"10.1109/VCIP.2013.6706382","DOIUrl":null,"url":null,"abstract":"Recognizing human actions is not alone, as hinted by the scene herein. In this paper, we investigate the possibility to boost the action recognition performance by exploiting their scene context associated. To this end, we model the scene as a mid-level “hidden layer” to bridge action descriptors and action categories. This is achieved via a scene topic model, in which hybrid visual descriptors including spatiotemporal action features and scene descriptors are first extracted from the video sequence. Then, we learn a joint probability distribution between scene and action by a Naive Bayesian N-earest Neighbor algorithm, which is adopted to jointly infer the action categories online by combining off-the-shelf action recognition algorithms. We demonstrate our merits by comparing to state-of-the-arts in several action recognition benchmarks.","PeriodicalId":407080,"journal":{"name":"2013 Visual Communications and Image Processing (VCIP)","volume":"283 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Visual Communications and Image Processing (VCIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VCIP.2013.6706382","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Recognizing human actions is not alone, as hinted by the scene herein. In this paper, we investigate the possibility to boost the action recognition performance by exploiting their scene context associated. To this end, we model the scene as a mid-level “hidden layer” to bridge action descriptors and action categories. This is achieved via a scene topic model, in which hybrid visual descriptors including spatiotemporal action features and scene descriptors are first extracted from the video sequence. Then, we learn a joint probability distribution between scene and action by a Naive Bayesian N-earest Neighbor algorithm, which is adopted to jointly infer the action categories online by combining off-the-shelf action recognition algorithms. We demonstrate our merits by comparing to state-of-the-arts in several action recognition benchmarks.