Play type recognition in real-world football video
Pub Date: 2014-06-23 | DOI: 10.1109/WACV.2014.6836040 | Pages: 652-659
Sheng Chen, Zhongyuan Feng, Qingkai Lu, Behrooz Mahasseni, Trevor Fiez, Alan Fern, S. Todorovic
This paper presents a vision system for recognizing the sequence of plays in amateur videos of American football games (e.g., offense, defense, kickoff, punt). The system is aimed at reducing user effort in annotating football videos, which are posted to a web service used by over 13,000 high school, college, and professional football teams. Recognizing football plays is particularly challenging in the context of such a web service due to the huge variation across videos in camera viewpoint, motion, distance from the field, camerawork quality, and lighting conditions, among other factors. Given a sequence of videos, each showing a particular play of a football game, we first run noisy play-level detectors on every video. Then, we integrate the responses of the play-level detectors with global game-level reasoning that accounts for statistical knowledge about football games. Our empirical results on more than 1450 videos from 10 diverse football games show that our approach is quite effective and close to being usable in a real-world setting.
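The game-level integration invites a dynamic-programming view: if each video gets a noisy score per play type, statistical knowledge of how plays follow one another can be folded in with Viterbi decoding. The sketch below is a minimal illustration of that idea; the play set, detector scores, and transition statistics are hypothetical placeholders, not the paper's actual model.

```python
# Illustrative sketch: combine noisy per-play detector scores with
# game-level transition statistics via Viterbi decoding.
import numpy as np

PLAYS = ["offense", "defense", "kickoff", "punt"]  # hypothetical label set

def decode_game(log_likelihoods, log_prior, log_transition):
    """Most probable play sequence for a game.

    log_likelihoods: (T, K) log detector responses, one row per video.
    log_prior:       (K,) log prior over play types for the first video.
    log_transition:  (K, K) log probability that play j follows play i.
    """
    T, K = log_likelihoods.shape
    delta = log_prior + log_likelihoods[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transition   # (K, K): predecessor x successor
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_likelihoods[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                  # backtrack
        path.append(int(back[t][path[-1]]))
    return [PLAYS[k] for k in reversed(path)]

# toy usage with uniform priors and a flat (uninformative) transition matrix
rng = np.random.default_rng(0)
ll = np.log(rng.dirichlet(np.ones(len(PLAYS)), size=6))
flat = np.log(np.full((len(PLAYS), len(PLAYS)), 1.0 / len(PLAYS)))
print(decode_game(ll, np.log(np.full(len(PLAYS), 1.0 / len(PLAYS))), flat))
```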
{"title":"Play type recognition in real-world football video","authors":"Sheng Chen, Zhongyuan Feng, Qingkai Lu, Behrooz Mahasseni, Trevor Fiez, Alan Fern, S. Todorovic","doi":"10.1109/WACV.2014.6836040","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836040","url":null,"abstract":"This paper presents a vision system for recognizing the sequence of plays in amateur videos of American football games (e.g. offense, defense, kickoff, punt, etc). The system is aimed at reducing user effort in annotating football videos, which are posted on a web service used by over 13,000 high school, college, and professional football teams. Recognizing football plays is particularly challenging in the context of such a web service, due to the huge variations across videos, in terms of camera viewpoint, motion, distance from the field, as well as amateur camerawork quality, and lighting conditions, among other factors. Given a sequence of videos, where each shows a particular play of a football game, we first run noisy play-level detectors on every video. Then, we integrate responses of the play-level detectors with global game-level reasoning which accounts for statistical knowledge about football games. Our empirical results on more than 1450 videos from 10 diverse football games show that our approach is quite effective, and close to being usable in a real-world setting.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"1 1","pages":"652-659"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83172964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model-based anthropometry: Predicting measurements from 3D human scans in multiple poses
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836115 | Pages: 83-90
Aggeliki Tsoli, M. Loper, Michael J. Black
Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g., holes), and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.
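The final regression stage is straightforward to prototype. A minimal sketch with ridge regression, assuming generic feature and measurement arrays; the paper's actual features come from the registered deformable model, and the synthetic data here is only a stand-in:

```python
# Sketch: map body-shape features to a tailoring measurement with
# regularized (ridge) linear regression. All arrays are synthetic.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))            # e.g. limb lengths, circumferences, shape coefficients
w = rng.normal(size=40)
y = X @ w + 0.1 * rng.normal(size=500)    # synthetic "measurement" (e.g. inseam, cm)

model = Ridge(alpha=1.0)                  # L2 penalty guards against collinear features
print(cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=5).mean())
```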
{"title":"Model-based anthropometry: Predicting measurements from 3D human scans in multiple poses","authors":"Aggeliki Tsoli, M. Loper, Michael J. Black","doi":"10.1109/WACV.2014.6836115","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836115","url":null,"abstract":"Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes) and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include, limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2 1","pages":"83-90"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/WACV.2014.6836115","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72522246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Repeated constrained sparse coding with partial dictionaries for hyperspectral unmixing
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836001 | Pages: 953-960
Naveed Akhtar, F. Shafait, A. Mian
Hyperspectral images obtained from remote sensing platforms have limited spatial resolution. Thus, the spectrum measured at each pixel is usually a mixture of many pure spectral signatures (endmembers) corresponding to different materials on the ground. Hyperspectral unmixing aims at separating these mixed spectra into their constituent endmembers. We formulate hyperspectral unmixing as a constrained sparse coding (CSC) problem in which unmixing is performed with the help of a library of pure spectral signatures under positivity and summation constraints. We propose two different methods that perform CSC repeatedly over the hyperspectral data. The first method, Repeated-CSC (RCSC), systematically neglects a few spectral bands of the data each time it performs the sparse coding, whereas the second method, Repeated Spectral Derivative (RSD), takes the spectral derivative of the data before the sparse coding stage. The derivative is taken such that it does not operate on a few selected bands. Experiments on simulated and real hyperspectral data, and comparison with the existing state of the art, show that the proposed methods achieve significantly higher accuracy. Our results demonstrate the overall robustness of RCSC to noise and the better performance of RSD at high signal-to-noise ratios.
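The core CSC step is a least-squares fit under the positivity and sum-to-one (abundance) constraints the abstract names, which amounts to projecting onto the probability simplex inside a gradient loop. A minimal sketch, with the library D and pixel spectrum y as placeholders; RCSC's repeated band-dropping would wrap calls to this routine over band subsets:

```python
# Sketch: fully constrained least-squares unmixing by projected gradient.
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def constrained_sparse_code(y, D, iters=500):
    """min_x ||y - D x||^2  s.t.  x >= 0, sum(x) = 1."""
    K = D.shape[1]
    x = np.full(K, 1.0 / K)
    step = 0.5 / np.linalg.norm(D, 2) ** 2     # 1/L for this quadratic objective
    for _ in range(iters):
        grad = 2.0 * D.T @ (D @ x - y)
        x = project_simplex(x - step * grad)   # enforce abundance constraints
    return x
```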
{"title":"Repeated constrained sparse coding with partial dictionaries for hyperspectral unmixing","authors":"Naveed Akhtar, F. Shafait, A. Mian","doi":"10.1109/WACV.2014.6836001","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836001","url":null,"abstract":"Hyperspectral images obtained from remote sensing platforms have limited spatial resolution. Thus, each spectra measured at a pixel is usually a mixture of many pure spectral signatures (endmembers) corresponding to different materials on the ground. Hyperspectral unmixing aims at separating these mixed spectra into its constituent end-members. We formulate hyperspectral unmixing as a constrained sparse coding (CSC) problem where unmixing is performed with the help of a library of pure spectral signatures under positivity and summation constraints. We propose two different methods that perform CSC repeatedly over the hyperspectral data. However, the first method, Repeated-CSC (RCSC), systematically neglects a few spectral bands of the data each time it performs the sparse coding. Whereas the second method, Repeated Spectral Derivative (RSD), takes the spectral derivative of the data before the sparse coding stage. The spectral derivative is taken such that it is not operated on a few selected bands. Experiments on simulated and real hyperspectral data and comparison with existing state of the art show that the proposed methods achieve significantly higher accuracy. Our results demonstrate the overall robustness of RCSC to noise and better performance of RSD at high signal to noise ratio.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"44 1","pages":"953-960"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79335744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmentation and matching: Towards a robust object detection system
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836082 | Pages: 325-332
Jing Huang, Suya You
This paper focuses on detecting parts in laser-scanned data of a cluttered industrial scene. To achieve this goal, we propose a robust object detection system based on segmentation and matching, together with an adaptive segmentation algorithm and an efficient pose extraction algorithm based on correspondence filtering. We also propose an overlap-based criterion that exploits more information from the original point cloud than the number-of-matches criterion, which considers only keypoints. Experiments show how each component works, and the results demonstrate the performance of our system relative to the state of the art.
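The overlap criterion can be sketched directly: instead of counting keypoint matches, score a candidate pose by how much of the transformed model lies near the scene cloud. A minimal version under assumed names, with the distance threshold tau as an illustrative parameter:

```python
# Sketch: overlap-based verification of a rigid pose hypothesis (R, t).
import numpy as np
from scipy.spatial import cKDTree

def overlap_score(model_pts, scene_pts, R, t, tau=0.01):
    """Fraction of model points within tau (metres) of some scene point
    after applying the hypothesized transform. Uses every model point,
    not just the matched keypoints."""
    transformed = model_pts @ R.T + t
    dists, _ = cKDTree(scene_pts).query(transformed, k=1)
    return float(np.mean(dists < tau))
```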
{"title":"Segmentation and matching: Towards a robust object detection system","authors":"Jing Huang, Suya You","doi":"10.1109/WACV.2014.6836082","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836082","url":null,"abstract":"This paper focuses on detecting parts in laser-scanned data of a cluttered industrial scene. To achieve the goal, we propose a robust object detection system based on segmentation and matching, as well as an adaptive segmentation algorithm and an efficient pose extraction algorithm based on correspondence filtering. We also propose an overlapping-based criterion that exploits more information of the original point cloud than the number-of-matching criterion that only considers key-points. Experiments show how each component works and the results demonstrate the performance of our system compared to the state of the art.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"30 1","pages":"325-332"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80935878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introspective semantic segmentation
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836032 | Pages: 714-720
Gautam Singh, J. Kosecka
Traditional approaches to semantic segmentation work in a supervised setting, assuming a fixed number of semantic categories, and require sufficiently large training sets. The performance of various approaches is often reported in terms of average per-pixel class accuracy and global accuracy of the final labeling. When applying the learned models in practical settings, on large amounts of unlabeled data possibly containing previously unseen categories, it is important to properly quantify their performance by measuring a classifier's introspective capability. We quantify the confidence of the region classifiers in the context of a non-parametric k-nearest neighbor (k-NN) framework for semantic segmentation by using the so-called strangeness measure. The proposed measure is evaluated by introducing confidence-based image ranking and showing its feasibility on a dataset containing a large number of previously unseen categories.
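The strangeness measure from the conformal-prediction literature has a compact k-NN form: distances to same-label neighbors over distances to other-label neighbors. A minimal sketch, assuming region descriptors are plain feature vectors; image-level ranking could then use, say, the mean strangeness of an image's regions:

```python
# Sketch: k-NN strangeness of a labeled example. Larger values mean
# the assigned label fits the example less well (lower confidence).
import numpy as np

def strangeness(feats, labels, query, query_label, k=5):
    """feats: (N, D) reference descriptors with labels (N,);
    query: (D,) descriptor; query_label: its candidate label."""
    d = np.linalg.norm(feats - query, axis=1)
    same = np.sort(d[labels == query_label])[:k]   # k nearest same-label
    diff = np.sort(d[labels != query_label])[:k]   # k nearest other-label
    return same.sum() / (diff.sum() + 1e-12)
```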
{"title":"Introspective semantic segmentation","authors":"Gautam Singh, J. Kosecka","doi":"10.1109/WACV.2014.6836032","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836032","url":null,"abstract":"Traditional approaches for semantic segmentation work in a supervised setting assuming a fixed number of semantic categories and require sufficiently large training sets. The performance of various approaches is often reported in terms of average per pixel class accuracy and global accuracy of the final labeling. When applying the learned models in the practical settings on large amounts of unlabeled data, possibly containing previously unseen categories, it is important to properly quantify their performance by measuring a classifier's introspective capability. We quantify the confidence of the region classifiers in the context of a non-parametric k-nearest neighbor (k-NN) framework for semantic segmentation by using the so called strangeness measure. The proposed measure is evaluated by introducing confidence based image ranking and showing its feasibility on a dataset containing a large number of previously unseen categories.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"12 1","pages":"714-720"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82218125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU-accelerated and efficient multi-view triangulation for scene reconstruction
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836117 | Pages: 61-68
J. Mak, Mauricio Hess-Flores, S. Recker, John Douglas Owens, K. Joy
This paper presents a framework for GPU-accelerated N-view triangulation in multi-view reconstruction that improves processing time and final reprojection error with respect to methods in the literature. The framework uses an algorithm based on optimizing an angular error-based L1 cost function, and we show how adaptive gradient descent can be applied for convergence. The triangulation algorithm is mapped onto the GPU, and two approaches to parallelization are compared: one thread per track and one thread block per track. The better-performing approach depends on the number of tracks and the lengths of the tracks in the dataset. Furthermore, the algorithm uses statistical sampling based on confidence levels to reduce the number of feature track positions needed to triangulate an entire track. Sampling aids load balancing for the GPU's SIMD architecture and exploitation of the GPU's memory hierarchy. Compared to a serial implementation, a typical performance increase of 3-4× can be achieved on a 4-core CPU. On a GPU, large track numbers are favorable, and an increase of up to 40× can be achieved. Results on real and synthetic data show that reprojection errors are similar to those of the best-performing current triangulation methods at only a fraction of the computation time, allowing for efficient and accurate triangulation of large scenes.
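A CPU-side sketch of the optimization idea: use 1 − cos(angle) between each camera ray and the ray to the candidate point as a per-view angular residual, sum residuals as an L1-style cost, and descend with an Adagrad-style adaptive step. This is illustrative only; the paper's exact cost, initialization, and GPU mapping are not reproduced here.

```python
# Sketch: N-view triangulation by adaptive gradient descent on an
# angular cost. centers: (N, 3) camera centers; rays: (N, 3) unit rays.
import numpy as np

def triangulate(centers, rays, iters=200, lr=0.1, eps=1e-8):
    X = centers.mean(axis=0) + rays.mean(axis=0)   # crude initialization
    g2 = np.zeros(3)                               # Adagrad accumulator

    def cost(P):
        v = P - centers
        v = v / np.linalg.norm(v, axis=1, keepdims=True)
        return np.sum(1.0 - np.sum(v * rays, axis=1))  # sum of angular residuals

    for _ in range(iters):
        grad = np.zeros(3)
        for j in range(3):                         # central-difference gradient
            e = np.zeros(3)
            e[j] = 1e-6
            grad[j] = (cost(X + e) - cost(X - e)) / 2e-6
        g2 += grad ** 2
        X = X - lr * grad / (np.sqrt(g2) + eps)    # adaptive per-coordinate step
    return X
```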
{"title":"GPU-accelerated and efficient multi-view triangulation for scene reconstruction","authors":"J. Mak, Mauricio Hess-Flores, S. Recker, John Douglas Owens, K. Joy","doi":"10.1109/WACV.2014.6836117","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836117","url":null,"abstract":"This paper presents a framework for GPU-accelerated N-view triangulation in multi-view reconstruction that improves processing time and final reprojection error with respect to methods in the literature. The framework uses an algorithm based on optimizing an angular error-based L1 cost function and it is shown how adaptive gradient descent can be applied for convergence. The triangulation algorithm is mapped onto the GPU and two approaches for parallelization are compared: one thread per track and one thread block per track. The better performing approach depends on the number of tracks and the lengths of the tracks in the dataset. Furthermore, the algorithm uses statistical sampling based on confidence levels to successfully reduce the quantity of feature track positions needed to triangulate an entire track. Sampling aids in load balancing for the GPU's SIMD architecture and for exploiting the GPU's memory hierarchy. When compared to a serial implementation, a typical performance increase of 3-4× can be achieved on a 4-core CPU. On a GPU, large track numbers are favorable and an increase of up to 40× can be achieved. Results on real and synthetic data prove that reprojection errors are similar to the best performing current triangulation methods but costing only a fraction of the computation time, allowing for efficient and accurate triangulation of large scenes.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"110 1","pages":"61-68"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88247327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding and analyzing a large collection of archived swimming videos
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836037 | Pages: 674-681
Long Sha, P. Lucey, S. Sridharan, S. Morgan, D. Pease
In elite sports, nearly all performances are captured on video. Despite the massive amount of video that has been captured in this domain over the last 10-15 years, most of it remains in an “unstructured” or “raw” form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels, which is time-consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal, as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and, in addition to the split times (i.e., the time it takes to swim each lap), stroke rates and stroke lengths are manually annotated. In this paper, we propose a vision-based system which effectively “digitizes” a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras located at different positions and angles, we show that our hierarchical approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate the swimmer's location and stroke rate.
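Once the swimmer is tracked, stroke rate is fundamentally a periodicity estimate. One plausible sketch (the signal source and the frequency band are assumptions, not the paper's method) is to take the dominant frequency of a 1-D signal extracted around the tracked swimmer:

```python
# Sketch: strokes per minute from the dominant frequency of a periodic
# 1-D signal, e.g. per-frame intensity of a tracked arm region.
import numpy as np

def stroke_rate(signal, fps):
    sig = signal - signal.mean()                  # remove DC component
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    band = (freqs > 0.3) & (freqs < 3.0)          # plausible stroke frequencies, Hz
    return float(freqs[band][spectrum[band].argmax()] * 60.0)
```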
{"title":"Understanding and analyzing a large collection of archived swimming videos","authors":"Long Sha, P. Lucey, S. Sridharan, S. Morgan, D. Pease","doi":"10.1109/WACV.2014.6836037","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836037","url":null,"abstract":"In elite sports, nearly all performances are captured on video. Despite the massive amounts of video that has been captured in this domain over the last 10-15 years, most of it remains in an “unstructured” or “raw” form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels which is time consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and in-addition to the split-times (i.e., the time it takes for each lap), stroke rate and stroke-lengths are manually annotated. In this paper, we propose a vision-based system which effectively “digitizes” a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras which are located at different positions and angles, we show our hierarchical-based approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate the swimmer location and stroke rates.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"29 1","pages":"674-681"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82766189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real time action recognition using histograms of depth gradients and random decision forests
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836044 | Pages: 626-633
H. Rahmani, A. Mahmood, D. Huynh, A. Mian
We propose an algorithm which combines the discriminative information from depth images as well as from 3D joint positions to achieve high action recognition accuracy. To avoid the suppression of subtle discriminative information and also to handle local occlusions, we compute a vector of many independent local features. Each feature encodes spatiotemporal variations of depth and depth gradients at a specific space-time location in the action volume. Moreover, we encode the dominant skeleton movements by computing a local 3D joint position difference histogram. For each joint, we compute a 3D space-time motion volume which we use as an importance indicator and incorporate into the feature vector for improved action discrimination. To retain only the discriminant features, we train a random decision forest (RDF). The proposed algorithm is evaluated on three standard datasets and compared with nine state-of-the-art algorithms. Experimental results show that, on average, the proposed algorithm outperforms all other algorithms in accuracy and has a processing speed of over 112 frames/second.
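A minimal sketch of the two ingredients named in the abstract: a magnitude-weighted histogram of depth-gradient orientations pooled over a clip, fed to a random forest. The block layout, joint-difference histograms, and importance weighting of the full method are omitted, and the toy data is synthetic:

```python
# Sketch: depth-gradient orientation histograms + random decision forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def depth_gradient_histogram(depth_clip, bins=8):
    """Orientation histogram of depth gradients, weighted by gradient
    magnitude, pooled over a spatiotemporal clip of shape (T, H, W)."""
    gy, gx = np.gradient(depth_clip.astype(float), axis=(1, 2))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                       # orientations in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

# toy usage: one feature vector per action clip, then an RDF classifier
rng = np.random.default_rng(0)
clips = [rng.random((16, 64, 64)) for _ in range(20)]
X = np.stack([depth_gradient_histogram(c) for c in clips])
y = rng.integers(0, 2, size=20)
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
```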
{"title":"Real time action recognition using histograms of depth gradients and random decision forests","authors":"H. Rahmani, A. Mahmood, D. Huynh, A. Mian","doi":"10.1109/WACV.2014.6836044","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836044","url":null,"abstract":"We propose an algorithm which combines the discriminative information from depth images as well as from 3D joint positions to achieve high action recognition accuracy. To avoid the suppression of subtle discriminative information and also to handle local occlusions, we compute a vector of many independent local features. Each feature encodes spatiotemporal variations of depth and depth gradients at a specific space-time location in the action volume. Moreover, we encode the dominant skeleton movements by computing a local 3D joint position difference histogram. For each joint, we compute a 3D space-time motion volume which we use as an importance indicator and incorporate in the feature vector for improved action discrimination. To retain only the discriminant features, we train a random decision forest (RDF). The proposed algorithm is evaluated on three standard datasets and compared with nine state-of-the-art algorithms. Experimental results show that, on the average, the proposed algorithm outperform all other algorithms in accuracy and have a processing speed of over 112 frames/second.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"1 1","pages":"626-633"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88786006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient dense subspace clustering
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836065 | Pages: 461-468
Pan Ji, M. Salzmann, Hongdong Li
In this paper, we tackle the problem of clustering data points drawn from a union of linear (or affine) subspaces. To this end, we introduce an efficient subspace clustering algorithm that estimates dense connections between the points lying in the same subspace. In particular, instead of following the standard compressive sensing approach, we formulate subspace clustering as a Frobenius norm minimization problem, which inherently yields denser connections between the data points. While in the noise-free case we rely on the self-expressiveness of the observations, in the presence of noise we simultaneously learn a clean dictionary to represent the data. Our formulation lets us address the subspace clustering problem efficiently. More specifically, the solution can be obtained in closed-form for outlier-free observations, and by performing a series of linear operations in the presence of outliers. Interestingly, we show that our Frobenius norm formulation shares the same solution as the popular nuclear norm minimization approach when the data is free of any noise, or, in the case of corrupted data, when a clean dictionary is learned. Our experimental evaluation on motion segmentation and face clustering demonstrates the benefits of our algorithm in terms of clustering accuracy and efficiency.
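For outlier-free data, the closed form the abstract mentions is two lines of linear algebra: solve min_C ||X - XC||_F^2 + lam ||C||_F^2, which gives C = (X^T X + lam I)^{-1} X^T X, then spectrally cluster a symmetrized affinity built from |C|. A sketch under those assumptions; the noise and dictionary-learning variants are not shown, and the final clustering step is a standard choice rather than the paper's prescribed one:

```python
# Sketch: dense subspace clustering via a Frobenius-norm closed form.
import numpy as np
from sklearn.cluster import SpectralClustering

def dense_subspace_clustering(X, n_clusters, lam=0.1):
    """X: (D, N) data matrix with points as columns."""
    N = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(N), G)   # closed-form self-expression
    A = np.abs(C) + np.abs(C.T)                   # symmetric, non-negative affinity
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(A)
```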
{"title":"Efficient dense subspace clustering","authors":"Pan Ji, M. Salzmann, Hongdong Li","doi":"10.1109/WACV.2014.6836065","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836065","url":null,"abstract":"In this paper, we tackle the problem of clustering data points drawn from a union of linear (or affine) subspaces. To this end, we introduce an efficient subspace clustering algorithm that estimates dense connections between the points lying in the same subspace. In particular, instead of following the standard compressive sensing approach, we formulate subspace clustering as a Frobenius norm minimization problem, which inherently yields denser con- nections between the data points. While in the noise-free case we rely on the self-expressiveness of the observations, in the presence of noise we simultaneously learn a clean dictionary to represent the data. Our formulation lets us address the subspace clustering problem efficiently. More specifically, the solution can be obtained in closed-form for outlier-free observations, and by performing a series of linear operations in the presence of outliers. Interestingly, we show that our Frobenius norm formulation shares the same solution as the popular nuclear norm minimization approach when the data is free of any noise, or, in the case of corrupted data, when a clean dictionary is learned. Our experimental evaluation on motion segmentation and face clustering demonstrates the benefits of our algorithm in terms of clustering accuracy and efficiency.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"49 1","pages":"461-468"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87395999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized feature learning and indexing for object localization and recognition
Pub Date: 2014-03-24 | DOI: 10.1109/WACV.2014.6836100 | Pages: 198-204
Ning Zhou, A. Angelova, Jianping Fan
This paper addresses a general feature indexing and retrieval scenario in which a set of features detected in an image can retrieve a relevant class of objects, or several classes of objects. The main idea behind these features for general object retrieval is that they are capable of identifying and localizing small regions or parts of the potential object. We propose a set of criteria which take advantage of the learned features to find regions in the image which likely belong to an object. We further use the features' localization capability to localize the full object of interest and its extent. The proposed approach improves recognition performance and is very efficient. Moreover, it has the potential to be used in automatic image understanding or annotation, since it can uncover the regions of an image where objects can be found.
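One way to read the retrieval-plus-localization pipeline is as an inverted index from quantized features to classes, with feature locations voting for likely object regions. The sketch below is a loose, hypothetical rendering of that idea; the index contents, weights, and voting scheme are placeholders, not the paper's criteria:

```python
# Sketch: class retrieval and coarse localization by feature voting.
import numpy as np
from collections import defaultdict

# hypothetical inverted index: quantized feature id -> {class: weight}
index = defaultdict(dict)
index[3] = {"mug": 0.8}
index[7] = {"mug": 0.5, "bottle": 0.4}

def retrieve_and_localize(words, positions, shape):
    """words: quantized feature ids detected in an image;
    positions: their integer (y, x) locations; shape: image size.
    Returns per-class votes and a heatmap of likely object regions."""
    votes = defaultdict(float)
    heat = np.zeros(shape)
    for w, (y, x) in zip(words, positions):
        for cls, wt in index[w].items():
            votes[cls] += wt
            heat[y, x] += wt          # discriminative features mark object parts
    return votes, heat

votes, heat = retrieve_and_localize([3, 7], [(10, 12), (11, 14)], (32, 32))
print(dict(votes))
```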
{"title":"Generalized feature learning and indexing for object localization and recognition","authors":"Ning Zhou, A. Angelova, Jianping Fan","doi":"10.1109/WACV.2014.6836100","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836100","url":null,"abstract":"This paper addresses a general feature indexing and retrieval scenario in which a set of features detected in the image can retrieve a relevant class of objects, or classes of objects. The main idea behind those features for general object retrieval is that they are capable of identifying and localizing some small regions or parts of the potential object. We propose a set of criteria which take advantage of the learned features to find regions in the image which likely belong to an object. We further use the features' localization capability to localize the full object of interest and its extents. The proposed approach improves the recognition performance and is very efficient. Moreover, it has the potential to be used in automatic image understanding or annotation since it can uncover regions where the objects can be found in an image.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"51 1","pages":"198-204"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86694802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}