Seeing actions through scene context
Hongbo Zhang, Songzhi Su, Shaozi Li, Duansheng Chen, Bineng Zhong, R. Ji
2013 Visual Communications and Image Processing (VCIP), November 2013
DOI: 10.1109/VCIP.2013.6706382
Human actions do not occur in isolation; the surrounding scene offers strong cues. In this paper, we investigate whether action recognition performance can be boosted by exploiting the associated scene context. To this end, we model the scene as a mid-level "hidden layer" that bridges action descriptors and action categories. This is achieved via a scene topic model, in which hybrid visual descriptors, including spatiotemporal action features and scene descriptors, are first extracted from the video sequence. We then learn a joint probability distribution between scene and action with a Naive Bayes Nearest Neighbor algorithm, which infers action categories online in combination with off-the-shelf action recognition algorithms. We demonstrate the merits of our approach by comparing against state-of-the-art methods on several action recognition benchmarks.
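The abstract's core ingredients can be illustrated with a minimal sketch: a standard Naive Bayes Nearest Neighbor (NBNN) image-to-class distance over local descriptors, optionally biased by a scene-conditioned log prior. This is an assumption-laden toy, not the paper's exact formulation; the descriptor shapes, the `scene_log_prior` argument, and the way the prior is combined with the distance are all illustrative choices.

```python
# Hedged sketch: NBNN-style action classification with an optional scene prior.
# Assumes precomputed local descriptors (rows = descriptors, cols = dims),
# e.g. spatiotemporal action features, per training class and per query video.
import numpy as np

def nbnn_distance(query_desc, class_desc):
    """NBNN image-to-class distance: for each query descriptor, the squared
    Euclidean distance to its nearest descriptor in the class, summed."""
    # Pairwise squared distances via broadcasting -> shape (n_query, n_class)
    d2 = ((query_desc[:, None, :] - class_desc[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).sum()

def classify(query_desc, class_descriptors, scene_log_prior=None):
    """Pick the action class with the smallest NBNN distance; if a
    scene-conditioned log prior is given (a hypothetical extension here),
    subtract it so scene-likely classes are favored."""
    classes = sorted(class_descriptors)
    scores = np.array([nbnn_distance(query_desc, class_descriptors[c])
                       for c in classes])
    if scene_log_prior is not None:
        # Lower score = better match, so a larger log prior lowers the score.
        scores = scores - np.array([scene_log_prior[c] for c in classes])
    return classes[int(np.argmin(scores))]
```

For example, with synthetic descriptors clustered around different means per class, a query drawn near one cluster is assigned to that class; passing a `scene_log_prior` shifts the decision toward classes the scene makes more plausible.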