Benchmarking large-scale Fine-Grained Categorization
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836056 | Pages: 532-539
A. Angelova, Philip M. Long
This paper presents a systematic evaluation of recent methods in the fine-grained categorization domain, which have shown significant promise. More specifically, we investigate an automatic segmentation algorithm, a region pooling algorithm akin to pose-normalized pooling [31], [28], and a multi-class optimization method. We considered the largest and most popular datasets for fine-grained categorization available in the field: the Caltech-UCSD 200 Birds dataset [27], the Oxford 102 Flowers dataset [19], the Stanford 120 Dogs dataset [16], and the Oxford 37 Cats and Dogs dataset [21]. We view this work from a practitioner's perspective and answer the question: which methods can be used to build the best possible fine-grained recognition system for practical applications? Our experiments provide insights into the relative merits of these methods. More importantly, by combining the methods, we achieve the top results in the field, outperforming the state-of-the-art methods by 4.8% and 10.3% on the birds and dogs datasets, respectively. Additionally, our method achieves a mAP of 37.92 on the 2012 ImageNet Fine-Grained Categorization Challenge [1], outperforming the winner of that challenge by 5.7 points.
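To make the region pooling idea concrete: the sketch below pools dense descriptors only inside a foreground segmentation mask, per spatial cell, which is the basic mechanism behind segmentation-guided pooling. This is an illustration only; the function name, grid layout, and averaging scheme are our assumptions, not the authors' implementation.

```python
import numpy as np

def masked_region_pooling(feature_map, mask, grid=(2, 2)):
    """Average-pool dense features inside a foreground mask, per spatial cell.

    feature_map: (H, W, D) array of dense local descriptors.
    mask:        (H, W) boolean foreground segmentation.
    Returns a grid[0] * grid[1] * D vector (zeros for cells with no foreground).
    """
    H, W, D = feature_map.shape
    rows = np.array_split(np.arange(H), grid[0])
    cols = np.array_split(np.arange(W), grid[1])
    pooled = []
    for r in rows:
        for c in cols:
            cell_feat = feature_map[np.ix_(r, c)].reshape(-1, D)
            cell_mask = mask[np.ix_(r, c)].reshape(-1)
            pooled.append(cell_feat[cell_mask].mean(axis=0) if cell_mask.any()
                          else np.zeros(D))
    return np.concatenate(pooled)

# Toy usage: a 64x64 map of 8-D descriptors with a circular foreground mask.
feats = np.random.rand(64, 64, 8)
yy, xx = np.mgrid[:64, :64]
fg = (yy - 32) ** 2 + (xx - 32) ** 2 < 24 ** 2
vec = masked_region_pooling(feats, fg)   # shape (32,)
```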
{"title":"Benchmarking large-scale Fine-Grained Categorization","authors":"A. Angelova, Philip M. Long","doi":"10.1109/WACV.2014.6836056","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836056","url":null,"abstract":"This paper presents a systematic evaluation of recent methods in the fine-grained categorization domain, which have shown significant promise. More specifically, we investigate an automatic segmentation algorithm, a region pooling algorithm which is akin to pose-normalized pooling [31] [28], and a multi-class optimization method. We considered the largest and most popular datasets for fine-grained categorization available in the field: the Caltech-UCSD 200 Birds dataset [27], the Oxford 102 Flowers dataset [19], the Stanford 120 Dogs dataset [16], and the Oxford 37 Cats and Dogs dataset [21]. We view this work from a practitioner's perspective, answering the question: what are the methods that can create the best possible fine-grained recognition system which can be applied in practice? Our experiments provide insights of the relative merit of these methods. More importantly, after combining the methods, we achieve the top results in the field, outperforming the state-of-the-art methods by 4.8% and 10.3% for birds and dogs datasets, respectively. Additionally, our method achieves a mAP of 37.92 on the of 2012 Imagenet Fine-Grained Categorization Challenge [1], which outperforms the winner of this challenge by 5.7 points.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"83 1","pages":"532-539"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89952993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast dense 3D reconstruction using an adaptive multiscale discrete-continuous variational method
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836118 | Pages: 53-60
Z. Kang, G. Medioni
We present a system for fast dense 3D reconstruction with a hand-held camera. Walking around a target object, we shoot sequential images using continuous shooting mode. High-quality camera poses are obtained offline using a structure-from-motion (SfM) algorithm with bundle adjustment. Multi-view stereo is solved using a new, efficient adaptive multiscale discrete-continuous variational method to generate depth maps with sub-pixel accuracy. Depth maps are then fused into a 3D model using volumetric integration with a truncated signed distance function (TSDF). Our system is accurate, efficient and flexible: accurate depth maps are estimated with sub-pixel accuracy in stereo matching; dense models can be obtained within minutes, as the major algorithms are parallelized on multi-core processors and the GPU; and various tasks can be handled (e.g., reconstruction of objects in both indoor and outdoor environments at different scales) without scene-specific parameter hand-tuning. We evaluate our system quantitatively and qualitatively on the Middlebury benchmark and on another dataset collected with a smartphone camera.
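As an illustration of the volumetric integration step, here is a minimal NumPy sketch of TSDF fusion of a single depth map in the style of Curless and Levoy; the function name, flat voxel layout, and unit per-frame weighting are our assumptions rather than the paper's implementation. A reconstruction loop would call it once per fused depth map.

```python
import numpy as np

def fuse_depth_tsdf(tsdf, weights, voxel_xyz, depth, K, cam_T_world, trunc=0.05):
    """Integrate one depth map into a running TSDF volume.

    tsdf, weights : flat (N,) arrays over N voxels (updated in place).
    voxel_xyz     : (N, 3) voxel centers in world coordinates.
    depth         : (H, W) depth map in meters; K: 3x3 camera intrinsics.
    cam_T_world   : 4x4 world-to-camera transform; trunc: truncation band (m).
    """
    H, W = depth.shape
    # World -> camera coordinates, then pinhole projection into the image.
    pts = (cam_T_world[:3, :3] @ voxel_xyz.T + cam_T_world[:3, 3:4]).T
    z = pts[:, 2]
    z_safe = np.where(z > 1e-6, z, 1e-6)
    uv = (K @ pts.T).T
    u = np.round(uv[:, 0] / z_safe).astype(int)
    v = np.round(uv[:, 1] / z_safe).astype(int)
    valid = (z > 1e-6) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = depth[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)]
    valid &= d > 0
    # Truncated signed distance along the ray, then running weighted average.
    sdf = np.clip((d - z) / trunc, -1.0, 1.0)
    upd = valid & (sdf > -0.999)   # skip voxels far behind the observed surface
    tsdf[upd] = (tsdf[upd] * weights[upd] + sdf[upd]) / (weights[upd] + 1.0)
    weights[upd] += 1.0
```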
{"title":"Fast dense 3D reconstruction using an adaptive multiscale discrete-continuous variational method","authors":"Z. Kang, G. Medioni","doi":"10.1109/WACV.2014.6836118","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836118","url":null,"abstract":"We present a system for fast dense 3D reconstruction with a hand-held camera. Walking around a target object, we shoot sequential images using continuous shooting mode. High-quality camera poses are obtained offline using structure-from-motion (SfM) algorithm with Bundle Adjustment. Multi-view stereo is solved using a new, efficient adaptive multiscale discrete-continuous variational method to generate depth maps with sub-pixel accuracy. Depth maps are then fused into a 3D model using volumetric integration with truncated signed distance function (TSDF). Our system is accurate, efficient and flexible: accurate depth maps are estimated with sub-pixel accuracy in stereo matching; dense models can be achieved within minutes as major algorithms parallelized on multi-core processor and GPU; various tasks can be handled (e.g. reconstruction of objects in both indoor and outdoor environment with different scales) without specific hand-tuning parameters. We evaluate our system quantitatively and qualitatively on Middlebury benchmark and another dataset collected with a smartphone camera.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"5 9 1","pages":"53-60"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80468986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmentation and tracking of partial planar templates
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6835731 | Pages: 1128-1133
Abdelsalam Masoud, W. Hoff
We present an algorithm that can segment and track partial planar templates from a sequence of images taken from a moving camera. By “partial planar template”, we mean that the template is the projection of a surface patch that is only partially planar; some of the points may correspond to other surfaces. The algorithm segments each image template to identify the pixels that belong to the dominant plane, and determines the three-dimensional structure of that plane. We show that our algorithm can track such patches over a larger visual angle than algorithms that assume each patch arises from a single planar surface. The new tracking algorithm is expected to improve the accuracy of visual simultaneous localization and mapping, especially in outdoor natural scenes where planar features are rare.
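For illustration, the dominant-plane segmentation can be approximated by robustly fitting a single homography to tracked template points and labeling the RANSAC inliers as the dominant plane. The paper additionally recovers the plane's 3D structure, which this hedged OpenCV sketch omits; the function name and synthetic data are our own.

```python
import numpy as np
import cv2

def dominant_plane_inliers(pts_prev, pts_curr, thresh=2.0):
    """Split tracked template points into dominant-plane inliers and outliers.

    pts_prev, pts_curr: (N, 2) float32 corresponding points in two frames.
    Points consistent with one homography are labeled as the dominant plane;
    the rest are treated as belonging to other surfaces.
    """
    H, inlier_mask = cv2.findHomography(pts_prev, pts_curr, cv2.RANSAC, thresh)
    return H, inlier_mask.ravel().astype(bool)

# Toy usage with synthetic correspondences (80% on one plane, 20% off-plane).
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, (50, 2)).astype(np.float32)
H_true = np.array([[1.0, 0.02, 3.0], [0.01, 1.0, -2.0], [0.0, 0.0, 1.0]])
ones = np.hstack([pts, np.ones((50, 1), np.float32)])
proj = ones @ H_true.T
moved = (proj[:, :2] / proj[:, 2:3]).astype(np.float32)
moved[40:] += rng.uniform(5, 15, (10, 2)).astype(np.float32)  # off-plane points
H_est, on_plane = dominant_plane_inliers(pts, moved)
```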
{"title":"Segmentation and tracking of partial planar templates","authors":"Abdelsalam Masoud, W. Hoff","doi":"10.1109/WACV.2014.6835731","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835731","url":null,"abstract":"We present an algorithm that can segment and track partial planar templates, from a sequence of images taken from a moving camera. By “partial planar template”, we mean that the template is the projection of a surface patch that is only partially planar; some of the points may correspond to other surfaces. The algorithm segments each image template to identify the pixels that belong to the dominant plane, and determines the three dimensional structure of that plane. We show that our algorithm can track such patches over a larger visual angle, compared to algorithms that assume that patches arise from a single planar surface. The new tracking algorithm is expected to improve the accuracy of visual simultaneous localization and mapping, especially in outdoor natural scenes where planar features are rare.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"432 1","pages":"1128-1133"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77509204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gradient based efficient feature selection
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836102 | Pages: 191-197
S. Z. Gilani, F. Shafait, A. Mian
Selecting a reduced set of relevant and non-redundant features for supervised classification problems is a challenging task. We propose a gradient-based feature selection method which can search the feature space efficiently and select a reduced set of representative features. We test our proposed algorithm on five small and medium-sized pattern classification datasets as well as two large 3D face datasets for computer vision applications. Comparison with state-of-the-art wrapper and filter methods shows that our technique yields better classification results with fewer evaluations of the target classifier. The feature subset selected by our algorithm is representative of the classes in the data and has the least variation in classification accuracy.
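The abstract does not spell out the algorithm, but the flavor of gradient-guided selection can be illustrated as follows: train a linear classifier, then score each feature by the magnitude of the loss gradient with respect to a per-feature gate. The function name and dataset below are illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def gradient_feature_scores(X, y):
    """Score features by the loss gradient w.r.t. a per-feature gate.

    Train a linear classifier, then measure d(logloss)/d(gate_j) at gate = 1,
    where gate_j scales feature j. Large magnitude = influential feature.
    """
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p = clf.predict_proba(X)[:, 1]
    residual = p - y  # d(logloss)/d(score) per sample
    # score_i = sum_j w_j * gate_j * x_ij
    #   =>  d(loss)/d(gate_j) = w_j * sum_i residual_i * x_ij
    grad = clf.coef_.ravel() * (residual @ X) / len(y)
    return np.abs(grad)

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
scores = gradient_feature_scores(X, y)
top10 = np.argsort(scores)[::-1][:10]  # indices of the 10 most influential features
```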
{"title":"Gradient based efficient feature selection","authors":"S. Z. Gilani, F. Shafait, A. Mian","doi":"10.1109/WACV.2014.6836102","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836102","url":null,"abstract":"Selecting a reduced set of relevant and non-redundant features for supervised classification problems is a challenging task. We propose a gradient based feature selection method which can search the feature space efficiently and select a reduced set of representative features. We test our proposed algorithm on five small and medium sized pattern classification datasets as well as two large 3D face datasets for computer vision applications. Comparison with the state of the art wrapper and filter methods shows that our proposed technique yields better classification results in lesser number of evaluations of the target classifier. The feature subset selected by our algorithm is representative of the classes in the data and has the least variation in classification accuracy.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"24 1","pages":"191-197"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81688893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Viewpoint-independent book spine segmentation
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836066 | Pages: 453-460
L. Talker, Y. Moses
We propose a method to precisely segment books on bookshelves in images taken from general viewpoints. The proposed segmentation algorithm overcomes difficulties due to text and texture on book spines, various book orientations under perspective projection, and book proximity. A shape-dependent active contour is used as a first step to establish a set of book spine candidates. A subset of these candidates is then selected using spatial constraints on the assembly of spine candidates, by formulating the selection problem as the maximal weighted independent set (MWIS) of a graph. The segmented book spines may be used by recognition systems (e.g., library automation) or rendered in computer graphics applications. We also propose a novel application that uses the segmented book spines to assist users in reorganizing a bookshelf, or to modify the image to give the bookshelf a tidier look. Our method was successfully tested on challenging sets of images.
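The MWIS selection step can be sketched directly: build a conflict graph over spine candidates, then use the fact that a maximum-weight independent set of that graph is a maximum-weight clique of its complement, which networkx can solve exactly for small instances (node weights must be integers). The candidate ids, weights, and conflict pairs below are toy assumptions.

```python
import networkx as nx

def select_spines_mwis(candidates, weights, conflicts):
    """Pick a consistent subset of spine candidates via max-weight independent set.

    candidates: list of candidate ids; weights: integer score per candidate;
    conflicts: pairs (i, j) of candidates that overlap or violate the spatial
    constraints.
    """
    G = nx.Graph()
    G.add_nodes_from(candidates)
    G.add_edges_from(conflicts)
    comp = nx.complement(G)
    # nx.complement() drops node attributes, so set the weights on comp directly.
    nx.set_node_attributes(comp, dict(zip(candidates, weights)), "w")
    chosen, total = nx.max_weight_clique(comp, weight="w")
    return chosen, total

# Toy usage: 5 candidates; candidate 2 overlaps candidates 1 and 3.
ids = [0, 1, 2, 3, 4]
w = [5, 4, 7, 4, 3]
sel, score = select_spines_mwis(ids, w, [(1, 2), (2, 3)])  # -> {0, 1, 3, 4}, 16
```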
{"title":"Viewpoint-independent book spine segmentation","authors":"L. Talker, Y. Moses","doi":"10.1109/WACV.2014.6836066","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836066","url":null,"abstract":"We propose a method to precisely segment books on bookshelves in images taken from general viewpoints. The proposed segmentation algorithm overcomes difficulties due to text and texture on book spines, various book orientations under perspective projection, and book proximity. A shape dependent active contour is used as a first step to establish a set of book spine candidates. A subset of these candidates are selected using spatial constraints on the assembly of spine candidates by formulating the selection problem as the maximal weighted independent set (MWIS) of a graph. The segmented book spines may be used by recognition systems (e.g., library automation), or rendered in computer graphics applications. We also propose a novel application that uses the segmented book spines to assist users in bookshelf reorganization or to modify the image to create a bookshelf with a tidier look. Our method was successfully tested on challenging sets of images.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"10 1","pages":"453-460"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89840170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting 3D geometric boundaries of indoor scenes under varying lighting
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836125 | Pages: 1-8
Jie Ni, Tim K. Marks, Oncel Tuzel, F. Porikli
The goal of this research is to identify 3D geometric boundaries in a set of 2D photographs of a static indoor scene under unknown, changing lighting conditions. A 3D geometric boundary is a contour located at a 3D depth discontinuity or a discontinuity in the surface normal. These boundaries can be used effectively for reasoning about the 3D layout of a scene. To distinguish 3D geometric boundaries from 2D texture edges, we analyze the illumination subspace of local appearance at each image location. In indoor time-lapse photography and surveillance video, we frequently see images that are lit by unknown combinations of uncalibrated light sources. We introduce an algorithm for semi-binary nonnegative matrix factorization (SBNMF) to decompose such images into a set of lighting basis images, each of which shows the scene lit by a single light source. These basis images provide a natural, succinct representation of the scene, enabling tasks such as scene editing (e.g., relighting) and shadow edge identification.
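As a hedged sketch of semi-binary NMF, the code below alternates between exactly enumerating the 2^k binary columns of H (feasible only for a handful of lights) and a clipped least-squares update of the nonnegative basis W. The paper's actual SBNMF updates are likely more principled; everything here is our own naive construction.

```python
import itertools
import numpy as np

def semi_binary_nmf(V, k, iters=50, seed=0):
    """Factor V (pixels x images) ~= W (pixels x k) @ H (k x images),
    with W >= 0 (lighting basis images) and H binary (which lights are on).
    """
    rng = np.random.default_rng(seed)
    n_pix, n_img = V.shape
    W = rng.uniform(0, 1, (n_pix, k))
    codes = np.array(list(itertools.product([0, 1], repeat=k))).T  # k x 2^k
    for _ in range(iters):
        # Update H: pick the binary code minimizing the residual, per image.
        recon = W @ codes                                    # n_pix x 2^k
        errs = ((V[:, :, None] - recon[:, None, :]) ** 2).sum(axis=0)
        H = codes[:, errs.argmin(axis=1)]                    # k x n_img
        # Update W: unconstrained least squares, clipped to be nonnegative.
        W = np.clip(V @ np.linalg.pinv(H.astype(float)), 0, None)
    return W, H

# Toy usage: 3 lights, 12 images of 500 pixels each.
W_true = np.random.rand(500, 3)
H_true = (np.random.rand(3, 12) > 0.5).astype(int)
W_est, H_est = semi_binary_nmf(W_true @ H_true, k=3)
```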
{"title":"Detecting 3D geometric boundaries of indoor scenes under varying lighting","authors":"Jie Ni, Tim K. Marks, Oncel Tuzel, F. Porikli","doi":"10.1109/WACV.2014.6836125","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836125","url":null,"abstract":"The goal of this research is to identify 3D geometric boundaries in a set of 2D photographs of a static indoor scene under unknown, changing lighting conditions. A 3D geometric boundary is a contour located at a 3D depth discontinuity or a discontinuity in the surface normal. These boundaries can be used effectively for reasoning about the 3D layout of a scene. To distinguish 3D geometric boundaries from 2D texture edges, we analyze the illumination subspace of local appearance at each image location. In indoor time-lapse photography and surveillance video, we frequently see images that are lit by unknown combinations of uncalibrated light sources. We introduce an algorithm for semi-binary nonnegative matrix factorization (SBNMF) to decompose such images into a set of lighting basis images, each of which shows the scene lit by a single light source. These basis images provide a natural, succinct representation of the scene, enabling tasks such as scene editing (e.g., relighting) and shadow edge identification.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"34 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79475200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D Metric Rectification using Angle Regularity
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836121 | Pages: 31-36
Aamer Zaheer, Sohaib Khan
This paper proposes automatic metric rectification of projectively distorted 3D structures of man-made scenes using angle regularity. Man-made scenes, such as buildings, are characterized by a profusion of mutually orthogonal planes and lines. Assuming the availability of a planar segmentation, we search for the rectifying 3D homography that maximizes the number of orthogonal plane pairs in the structure. We formulate the orthogonality constraints in terms of the Absolute Dual Quadric (ADQ). Using RANSAC, we first estimate the ADQ that maximizes the number of planes meeting at right angles. A rectifying homography recovered from the ADQ is then used as an initial guess for nonlinear refinement. Quantitative experiments show that the method is highly robust to the amount of projective distortion, the number of outliers (i.e., non-orthogonal planes), and noise in structure recovery. Unlike previous work, this method does not rely on any knowledge of the cameras or images, and no global model, such as Manhattan World, is imposed.
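The ADQ orthogonality constraint is linear in the 10 unique entries of the symmetric 4x4 quadric: two planes p1, p2 are orthogonal iff p1^T Q p2 = 0. Below is a least-squares sketch under assumed-orthogonal plane pairs; the paper wraps this in RANSAC and follows with nonlinear refinement and homography recovery, all omitted here, and the function names are ours.

```python
import numpy as np

def adq_constraint_row(p1, p2):
    """Linear constraint p1^T Q p2 = 0 on the 10 unique entries of the
    symmetric 4x4 absolute dual quadric Q (row-major upper triangle)."""
    row = []
    for i in range(4):
        for j in range(i, 4):
            c = p1[i] * p2[j] + (p1[j] * p2[i] if i != j else 0.0)
            row.append(c)
    return np.array(row)

def estimate_adq(planes, pairs):
    """Least-squares ADQ from plane pairs assumed orthogonal.

    planes: (N, 4) plane coefficient vectors; pairs: index pairs (i, j).
    (A RANSAC loop over candidate pairs, as in the paper, would wrap this.)
    """
    A = np.array([adq_constraint_row(planes[i], planes[j]) for i, j in pairs])
    _, _, Vt = np.linalg.svd(A)
    q = Vt[-1]                      # null vector = stacked upper triangle of Q
    Q = np.zeros((4, 4))
    idx = 0
    for i in range(4):
        for j in range(i, 4):
            Q[i, j] = Q[j, i] = q[idx]
            idx += 1
    return Q
```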
{"title":"3D Metric Rectification using Angle Regularity","authors":"Aamer Zaheer, Sohaib Khan","doi":"10.1109/WACV.2014.6836121","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836121","url":null,"abstract":"This paper proposes Automatic Metric Rectification of projectively distorted 3D structures for man-made scenes using Angle Regularity. Man-made scenes, such as buildings, are characterized by a profusion of mutually orthogonal planes and lines. Assuming the availability of planar segmentation, we search for the rectifying 3D homography which maximizes the number of orthogonal plane-pairs in the structure. We formulate the orthogonality constraints in terms of the Absolute Dual Quadric (ADQ). Using RANSAC, we first estimate the ADQ which maximizes the number of planes meeting at right angles. A rectifying homography recovered from the ADQ is then used as an initial guess for nonlinear refinement. Quantitative experiments show that the method is highly robust to the amount of projective distortion, the number of outliers (i.e. non-orthogonal planes) and noise in structure recovery. Unlike previous literature, this method does not rely on any knowledge of the cameras or images, and no global model, such as Manhattan World, is imposed.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"9 1","pages":"31-36"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80787109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A combination of generative and discriminative models for fast unsupervised activity recognition from traffic scene videos
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836042 | Pages: 640-645
M. V. Krishna, Joachim Denzler
Recent approaches in traffic and crowd scene analysis make extensive use of non-parametric hierarchical Bayesian models for intelligent clustering of features into activities. Although this has yielded impressive results, it requires time-consuming Bayesian inference during both training and classification. We therefore seek to limit Bayesian inference to the training stage, where unsupervised clustering is performed to extract semantically meaningful activities from the scene. In the testing stage, we use discriminative classifiers, taking advantage of their relative simplicity and fast inference. Experiments on publicly available datasets show that our approach is comparable in classification accuracy to state-of-the-art methods and provides a significant speed-up in the testing phase.
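The division of labor between training and testing can be sketched with off-the-shelf components: a generative topic model clusters clips into activities offline, and only a fast discriminative classifier runs at test time. Note the paper uses non-parametric hierarchical Bayesian models, for which the parametric LDA below is only a simpler stand-in, and the count data is synthetic.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

# Toy stand-in data: 200 clips, each a bag of quantized motion features
# over a vocabulary of 50 codewords (per-clip count vectors).
rng = np.random.default_rng(0)
train_counts = rng.poisson(2.0, size=(200, 50))
test_counts = rng.poisson(2.0, size=(50, 50))

# Training stage: generative clustering assigns each clip an activity label.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
activity_labels = lda.fit_transform(train_counts).argmax(axis=1)

# Testing stage: only the fast discriminative classifier is evaluated.
svm = LinearSVC().fit(train_counts, activity_labels)
pred = svm.predict(test_counts)
```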
{"title":"A combination of generative and discriminative models for fast unsupervised activity recognition from traffic scene videos","authors":"M. V. Krishna, Joachim Denzler","doi":"10.1109/WACV.2014.6836042","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836042","url":null,"abstract":"Recent approaches in traffic and crowd scene analysis make extensive use of non-parametric hierarchical Bayesian models for intelligent clustering of features into activities. Although this has yielded impressive results, it requires the use of time consuming Bayesian inference during both training and classification. Therefore, we seek to limit Bayesian inference to the training stage, where unsupervised clustering is performed to extract semantically meaningful activities from the scene. In the testing stage, we use discriminative classifiers, taking advantage of their relative simplicity and fast inference. Experiments on publicly available data-sets show that our approach is comparable in classification accuracy to state-of-the-art methods and provides a significant speed-up in the testing phase.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"67 1","pages":"640-645"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80218301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A CRF approach to fitting a generalized hand skeleton model
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836070 | Pages: 409-416
R. Mihail, G. Blomquist, Nathan Jacobs
We present a new point distribution model capable of modeling joint subluxation (shifting) in rheumatoid arthritis (RA) patients and an approach to fitting this model to posteroanterior view hand radiographs. We formulate this shape fitting problem as inference in a conditional random field. This model combines potential functions that focus on specific anatomical structures and a learned shape prior. We evaluate our approach on two datasets: one containing relatively healthy hands and one containing hands of rheumatoid arthritis patients. We provide an empirical analysis of the relative value of different potential functions. We also show how to use the fitted hand skeleton to initialize a process for automatically estimating bone contours, which is a challenging, but important, problem in RA disease progression assessment.
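The combination of image-evidence potentials and a learned shape prior has the generic energy structure sketched below: a plain PCA point distribution model fit by minimizing a data term plus a prior on the mode coefficients. This does not reproduce the paper's CRF inference or learned potentials; all names and the toy usage are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_pdm(mean_shape, modes, stddevs, unary_cost, lam=1.0):
    """Fit a PCA point distribution model by minimizing
    sum_i unary_cost(point_i) + lam * ||b / stddevs||^2,
    where shape = mean_shape + modes @ b.

    mean_shape: (2N,) mean landmark vector; modes: (2N, k) PCA modes;
    stddevs: (k,) mode standard deviations; unary_cost: callable mapping an
    (N, 2) landmark array to per-landmark image costs.
    """
    def energy(b):
        pts = (mean_shape + modes @ b).reshape(-1, 2)
        return unary_cost(pts).sum() + lam * np.sum((b / stddevs) ** 2)

    res = minimize(energy, np.zeros(modes.shape[1]), method="Powell")
    return (mean_shape + modes @ res.x).reshape(-1, 2), res.x

# Toy usage: pull a 5-landmark model toward observed points.
obs = np.random.rand(5, 2)
mean = np.zeros(10)
modes_ = np.eye(10)[:, :3]
sd = np.ones(3)
cost = lambda pts: ((pts - obs) ** 2).sum(axis=1)  # distance-to-evidence term
fitted, b = fit_pdm(mean, modes_, sd, cost)
```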
{"title":"A CRF approach to fitting a generalized hand skeleton model","authors":"R. Mihail, G. Blomquist, Nathan Jacobs","doi":"10.1109/WACV.2014.6836070","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836070","url":null,"abstract":"We present a new point distribution model capable of modeling joint subluxation (shifting) in rheumatoid arthritis (RA) patients and an approach to fitting this model to posteroanterior view hand radiographs. We formulate this shape fitting problem as inference in a conditional random field. This model combines potential functions that focus on specific anatomical structures and a learned shape prior. We evaluate our approach on two datasets: one containing relatively healthy hands and one containing hands of rheumatoid arthritis patients. We provide an empirical analysis of the relative value of different potential functions. We also show how to use the fitted hand skeleton to initialize a process for automatically estimating bone contours, which is a challenging, but important, problem in RA disease progression assessment.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"7 1","pages":"409-416"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73842249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pedestrian detection in low resolution videos
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836038 | Pages: 668-673
Hisham Sager, W. Hoff
Pedestrian detection in low resolution videos can be challenging. In outdoor surveillance scenarios, the size of pedestrians in the images is often very small (around 20 pixels tall). The most common and successful approaches for single-frame pedestrian detection use gradient-based features and a support vector machine classifier. We propose an extension of these ideas and develop a new algorithm that extracts gradient features from a spatiotemporal volume consisting of a short sequence of images (about one second in duration). The additional information provided by the motion of the person compensates for the loss of resolution. On standard datasets (PETS2001, VIRAT) we show a significant improvement in performance over single-frame detection.
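One simple way to realize gradient features over a spatiotemporal volume, sketched below with illustrative window sizes, is to concatenate per-frame HOG descriptors across the clip and feed them to a linear SVM. The paper's actual descriptor may couple gradients across time differently; the function, sizes, and random data here are assumptions for demonstration.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def spatiotemporal_hog(volume):
    """Concatenate per-frame HOG descriptors over a short clip.

    volume: (T, H, W) grayscale frames spanning roughly one second. Stacking
    the per-frame gradient features lets motion information compensate for
    the tiny (~20 px tall) pedestrian size.
    """
    return np.concatenate([
        hog(frame, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(2, 2))
        for frame in volume
    ])

# Toy usage: classify 24x12-pixel, 8-frame volumes as pedestrian / background.
rng = np.random.default_rng(0)
clips = rng.random((40, 8, 24, 12))
labels = rng.integers(0, 2, 40)
X = np.array([spatiotemporal_hog(c) for c in clips])
clf = LinearSVC().fit(X, labels)
```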
{"title":"Pedestrian detection in low resolution videos","authors":"Hisham Sager, W. Hoff","doi":"10.1109/WACV.2014.6836038","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836038","url":null,"abstract":"Pedestrian detection in low resolution videos can be challenging. In outdoor surveillance scenarios, the size of pedestrians in the images is often very small (around 20 pixels tall). The most common and successful approaches for single frame pedestrian detection use gradient-based features and a support vector machine classifier. We propose an extension of these ideas, and develop a new algorithm that extracts gradient features from a spatiotemporal volume, consisting of a short sequence of images (about one second in duration). The additional information provided by the motion of the person compensates for the loss of resolution. On standard datasets (PETS2001, VIRAT) we show a significant improvement in performance over single-frame detection.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"140 1","pages":"668-673"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73910228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}