Supervised dictionary learning for action localization
B. V. Kumar, I. Patras
2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG)
Published: 2013-04-22
DOI: 10.1109/FG.2013.6553745 (https://doi.org/10.1109/FG.2013.6553745)
Citations: 15
Abstract
Most existing methods that adopt the Implicit Shape Model (ISM) for action localization learn the dictionary (codebook) in an unsupervised manner. In contrast, we present a supervised approach to learning a dictionary for action localization. We follow a Hough voting approach to action detection, in which the spatio-temporal descriptors extracted from the videos vote for the spatio-temporal location and temporal extent of the action. We propose a framework that incorporates the localization information into the dictionary learning. More specifically, we use the spatial center and temporal extent of the training sequences to learn a discriminative dictionary that maximizes the votes at the spatio-temporal center and extent of the action and minimizes the votes in the background. This formulation results in a non-convex objective function, which we minimize using an alternating optimization algorithm. We demonstrate the performance of the algorithm on two publicly available action datasets, where we show that the proposed method performs better than the state-of-the-art methods.
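The Hough voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hough_vote`, the nearest-codeword assignment, the per-codeword `offsets` and `weights`, and the toy data are all assumptions made for illustration. The sketch votes only for the spatio-temporal center (not the temporal extent), and it does not show the supervised dictionary learning itself.

```python
import numpy as np

def hough_vote(positions, descriptors, codebook, offsets, weights, shape):
    """Accumulate weighted votes for the action center on an (x, y, t) grid.

    positions:   (N, 3) integer (x, y, t) locations of the descriptors
    descriptors: (N, D) feature vectors
    codebook:    (K, D) codewords (hypothetical, e.g. a learned dictionary)
    offsets:     (K, 3) integer displacement from a feature to the center
    weights:     (K,)   vote weight of each codeword
    shape:       size of the accumulator grid
    """
    acc = np.zeros(shape, dtype=float)
    # Assign each descriptor to its nearest codeword (squared Euclidean distance).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # Each descriptor votes for the center predicted by its codeword.
    centers = positions + offsets[nearest]
    inside = np.all((centers >= 0) & (centers < np.array(shape)), axis=1)
    # np.add.at accumulates correctly when several votes hit the same cell.
    np.add.at(acc, tuple(centers[inside].T), weights[nearest[inside]])
    return acc

# Toy data (assumed, not from the paper): two codewords, three descriptors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
offsets = np.array([[1, 0, 0], [-1, 0, 0]])
weights = np.array([1.0, 0.5])
positions = np.array([[1, 2, 3], [3, 2, 3], [0, 0, 0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])

acc = hough_vote(positions, descriptors, codebook, offsets, weights, (5, 5, 5))
# The accumulator peak is the detected action center; the third vote falls
# outside the grid and is discarded.
peak = np.unravel_index(acc.argmax(), acc.shape)
```

In this toy setup the first two descriptors agree on the same cell, so the accumulator has a single peak there. A supervised dictionary, as described in the abstract, would aim to make such peaks sharper at the true center and weaker in the background.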