Human Action Recognition and Localization in Video Using Structured Learning of Local Space-Time Features

2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance. Pub Date: 2010-08-29. DOI: 10.1109/AVSS.2010.76

Tuan Hue Thi, Jian Zhang, Li Cheng, Li Wang, S. Satoh


Citations: 32

Abstract

This paper presents a unified framework for human action classification and localization in video using structured learning of local space-time features. Each human action class is represented by its own compact set of local patches. In our approach, we first use a discriminative hierarchical Bayesian classifier to select those space-time interest points that are constructive for each particular action. Those concise local features are then passed to a Support Vector Machine with Principal Component Analysis projection for the classification task. Meanwhile, action localization is done using Dynamic Conditional Random Fields developed to incorporate the spatial and temporal structure constraints of superpixels extracted around those features. Each superpixel in the video is defined by the shape and motion information of its corresponding feature region. Compelling results obtained from experiments on the KTH [22], Weizmann [1], HOHA [13] and TRECVid [23] datasets demonstrate the efficiency and robustness of our framework for the task of human action recognition and localization in video.
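The classification stage mentioned in the abstract (selected local features, a Principal Component Analysis projection, then a Support Vector Machine) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional descriptors, the two class labels, the one-dimensional projection, the linear hinge-loss trainer, and all function names and hyperparameters below are assumptions chosen only to keep the example self-contained.

```python
# Minimal sketch: PCA projection followed by a linear SVM, standard library only.
# All data, names, and hyperparameters are illustrative assumptions,
# not values or methods taken from the paper.

def fit_pca_1d(rows, iters=100):
    """Return (mean, direction): leading principal direction via power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - mean[j] for j in range(dim)] for r in rows]
    # Covariance matrix of the centered descriptors.
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Project one descriptor onto the principal direction."""
    return sum((row[j] - mean[j]) * direction[j] for j in range(len(row)))


def train_linear_svm(xs, ys, lam=0.01, eta=0.1, epochs=50):
    """Train a 1-D linear SVM with subgradient steps on the regularized hinge loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:      # margin violated: hinge term is active
                w += eta * (y * x - lam * w)
                b += eta * y
            else:                        # only the regularizer contributes
                w -= eta * lam * w
    return w, b


def fit(rows, ys):
    """Fit the PCA projection, then the SVM on the projected values."""
    mean, direction = fit_pca_1d(rows)
    zs = [project(r, mean, direction) for r in rows]
    w, b = train_linear_svm(zs, ys)
    return mean, direction, w, b


def predict(row, model):
    """Return +1 or -1 for one descriptor."""
    mean, direction, w, b = model
    return 1 if w * project(row, mean, direction) + b >= 0 else -1


# Hypothetical descriptors for two action classes (+1 and -1).
descriptors = [
    [2.0, 1.9, 0.1], [2.2, 2.1, -0.1], [1.8, 2.0, 0.0],
    [-2.0, -2.1, 0.1], [-1.9, -1.8, 0.0], [-2.1, -2.0, -0.1],
]
labels = [1, 1, 1, -1, -1, -1]

model = fit(descriptors, labels)
print([predict(d, model) for d in descriptors])
```

In practice the projection would keep several principal components and the SVM would typically be multi-class with a kernel; the one-dimensional linear case is used here only because it fits in a few lines.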

