Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126395
Adriana Kovashka, Sudheendra Vijayanarasimhan, K. Grauman
We present an active learning approach to choose image annotation requests among both object category labels and the objects' attribute labels. The goal is to solicit those labels that will best use human effort when training a multi-class object recognition model. In contrast to previous work in active visual category learning, our approach directly exploits the dependencies between human-nameable visual attributes and the objects they describe, shifting its requests in either label space accordingly. We adopt a discriminative latent model that captures object-attribute and attribute-attribute relationships, and then define a suitable entropy reduction selection criterion to predict the influence a new label might have throughout those connections. On three challenging datasets, we demonstrate that the method can more successfully accelerate object learning relative to both passive learning and traditional active learning approaches.
{"title":"Actively selecting annotations among objects and attributes","journal":{"name":"2011 International Conference on Computer Vision","pages":"1403-1410"}}
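The entropy reduction criterion mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's discriminative latent model: it assumes a single image with a known class prior and hypothetical per-class attribute probabilities, and it scores each candidate attribute question by the expected drop in class entropy. The function names (`entropy`, `expected_entropy_reduction`, `pick_request`) and the example attributes are illustrative assumptions.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_entropy_reduction(prior, attr_given_class):
    """Expected drop in class entropy from learning one binary attribute.

    prior[c] is P(class = c); attr_given_class[c] is P(attribute = 1 | class = c).
    Both are toy inputs assumed for illustration.
    """
    reduction = entropy(prior)
    for value in (0, 1):
        # Joint P(class = c, attribute = value) for every class c.
        joint = [p * (a if value == 1 else 1.0 - a)
                 for p, a in zip(prior, attr_given_class)]
        p_value = sum(joint)
        if p_value > 0:
            posterior = [j / p_value for j in joint]
            reduction -= p_value * entropy(posterior)
    return reduction

def pick_request(prior, attributes):
    """Return the attribute name whose label is expected to help most."""
    return max(attributes,
               key=lambda name: expected_entropy_reduction(prior, attributes[name]))
```

For example, with a uniform prior over two classes, an attribute that separates them perfectly (`[1.0, 0.0]`) is preferred over one that is equally likely in both (`[0.5, 0.5]`), since only the former reduces the class entropy.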
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126393
Yuhang Zhang, R. Hartley, J. Mashford, S. Burn
We propose an algorithm for creating superpixels. The major step in our algorithm is simply minimizing two pseudo-Boolean functions. The processing time of our algorithm on images of moderate size is only half a second. Experiments on a benchmark dataset show that our method produces superpixels comparable in quality to those of existing algorithms. Last but not least, the speed of our algorithm is independent of the number of superpixels, which is usually the bottleneck for traditional superpixel algorithms.
{"title":"Superpixels via pseudo-Boolean optimization","journal":{"name":"2011 International Conference on Computer Vision","pages":"1387-1394"}}
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126412
Nan Wang, H. Ai
Clothing is one of the most informative cues of human appearance. In this paper, we propose a novel multi-person clothing segmentation algorithm for highly occluded images. The key idea is to combine blocking models to address person-wise occlusions. In contrast to the traditional layered model, which tries to solve the full layer ranking problem, the proposed blocking model partitions the problem into a series of pairwise ones and then determines the local blocking relationship based on individual and contextual information. Thus, it can handle cases with a large number of people. Additionally, we propose a layout model, formulated as a Markov network, which incorporates the blocking relationship to pursue an approximately optimal clothing layout for groups of people. Experiments on a dataset of group images show the effectiveness of our algorithm.
{"title":"Who Blocks Who: Simultaneous clothing segmentation for grouping images","journal":{"name":"2011 International Conference on Computer Vision","pages":"1535-1542"}}
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126504
Y. Aytar, Andrew Zisserman
Our objective is transfer training of a discriminatively trained object category detector, in order to reduce the number of training images required. To this end we propose three transfer learning formulations where a template learnt previously for other categories is used to regularize the training of a new category. All the formulations result in convex optimization problems. Experiments (on PASCAL VOC) demonstrate significant performance gains by transfer learning from one class to another (e.g. motorbike to bicycle), including one-shot learning, specialization from a class to a subordinate class (e.g. from quadruped to horse) and transfer using multiple components. In the case of multiple training samples, it is shown that detection performance approaching that of the state of the art can be achieved with substantially fewer training samples.
{"title":"Tabula rasa: Model transfer for object category detection","journal":{"name":"2011 International Conference on Computer Vision","pages":"2252-2259"}}
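One way a previously learnt template can regularize the training of a new category is to penalize the distance between the new weight vector and the source template instead of its distance from zero. The sketch below assumes that form, a hinge loss with the penalty `lam * ||w - w_src||^2`, minimized by plain subgradient descent. It is a generic illustration of a convex transfer objective, not necessarily any of the three formulations in the paper; `train_with_source_prior` and all of its parameters are hypothetical names.

```python
def train_with_source_prior(samples, labels, w_src, lam=0.1, lr=0.05, epochs=200):
    """Linear classifier trained with a pull toward a source template.

    Minimizes lam * ||w - w_src||^2 + mean hinge loss (an assumed objective).
    samples: list of feature lists; labels: +1 / -1; w_src: source template.
    """
    w = list(w_src)  # start from the transferred template
    n = len(samples)
    for _ in range(epochs):
        # Gradient of the regularizer that keeps w close to w_src.
        grad = [2.0 * lam * (wi - si) for wi, si in zip(w, w_src)]
        for x, y in zip(samples, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            if margin < 1.0:
                # Subgradient of the hinge loss for a margin violation.
                for j, xj in enumerate(x):
                    grad[j] -= y * xj / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

With no target samples the function returns the source template unchanged; with a single positive sample (a one-shot setting) only the weights that the sample disagrees with move away from the template.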
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126249
N. Fan
In this paper, a robust regression method is proposed for human age estimation, in which outlier samples are corrected by their neighbors by asymptotically increasing the correlation coefficients between the desired distances and the distances of sample labels. As another extension, we adopt a nonlinear distance function and approximate it with a neural network. For fair comparison, we also experiment on the regression problem of age estimation from face images, and the results are competitive with the state of the art.
{"title":"Learning nonlinear distance functions using neural network for regression with application to robust human age estimation","journal":{"name":"2011 International Conference on Computer Vision","pages":"249-254"}}
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126507
Marius Leordeanu, Andrei Zanfir, C. Sminchisescu
Graph and hypergraph matching are important problems in computer vision. They are successfully used in many applications requiring 2D or 3D feature matching, such as 3D reconstruction and object recognition. While graph matching is limited to using pairwise relationships, hypergraph matching permits the use of relationships between sets of features of any order. Consequently, it promises to make matching more robust to changes in scale, deformations and outliers. In this paper we make two contributions. First, we present a first semi-supervised algorithm for learning the parameters that control the hypergraph matching model and demonstrate experimentally that it significantly improves the performance of current state-of-the-art methods. Second, we propose a novel, efficient hypergraph matching algorithm, which outperforms the state of the art and, when used in combination with other higher-order matching algorithms, consistently improves their performance.
{"title":"Semi-supervised learning and optimization for hypergraph matching","journal":{"name":"2011 International Conference on Computer Vision","pages":"2274-2281"}}
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126452
Georgios Tzimiropoulos, S. Zafeiriou, M. Pantic
We propose a correlation-based approach to parametric object alignment particularly suitable for face analysis applications which require efficiency and robustness against occlusions and illumination changes. Our algorithm registers two images by iteratively maximizing their correlation coefficient using gradient ascent. We compute this correlation coefficient from complex gradients which capture the orientation of image structures rather than pixel intensities. The maximization of this gradient correlation coefficient results in an algorithm which is as computationally efficient as ℓ2 norm-based algorithms, can be extended within the inverse compositional framework (without the need for Hessian re-computation) and is robust to outliers. To the best of our knowledge, no other algorithm proposed so far has all three features. We show the robustness of our algorithm for the problem of face alignment in the presence of occlusions and non-uniform illumination changes. The code that reproduces the results of our paper can be found at http://ibug.doc.ic.ac.uk/resources.
{"title":"Robust and efficient parametric face alignment","journal":{"name":"2011 International Conference on Computer Vision","pages":"1847-1854"}}
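To make the idea of a correlation coefficient computed from complex gradients concrete, the sketch below packs horizontal and vertical finite differences into complex numbers and correlates two such gradient fields. The exact definition and normalization used in the paper may differ. Here the coefficient is assumed to be the real part of the normalized inner product, and `complex_gradients` and `gradient_correlation` are hypothetical helper names.

```python
import math

def complex_gradients(img):
    """Forward-difference gradients packed as gx + 1j*gy (last row/column dropped)."""
    rows, cols = len(img), len(img[0])
    return [complex(img[r][c + 1] - img[r][c], img[r + 1][c] - img[r][c])
            for r in range(rows - 1) for c in range(cols - 1)]

def gradient_correlation(img_a, img_b):
    """Normalized correlation of two complex gradient fields, in [-1, 1]."""
    ga, gb = complex_gradients(img_a), complex_gradients(img_b)
    # Real part of a * conj(b) equals |a||b|cos(orientation difference).
    num = sum((a * b.conjugate()).real for a, b in zip(ga, gb))
    den = math.sqrt(sum(abs(a) ** 2 for a in ga) * sum(abs(b) ** 2 for b in gb))
    return num / den if den > 0 else 0.0
```

Because only gradients enter the score, a global brightness offset and a positive contrast gain leave it unchanged, which gives some intuition for why a gradient-based measure can tolerate illumination changes better than raw intensities.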
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126372
Yong Xu, Yuhui Quan, Haibin Ling, Hui Ji
In this paper, we develop a novel tool called dynamic fractal analysis for dynamic texture (DT) classification, which not only provides a rich description of DT but is also strongly robust to environmental changes. The resulting dynamic fractal spectrum (DFS) for DT sequences consists of two components: one is the volumetric dynamic fractal spectrum component (V-DFS), which captures the stochastic self-similarities of DT sequences as 3D volume datasets; the other is the multi-slice dynamic fractal spectrum component (S-DFS), which encodes fractal structures of DT sequences on 2D slices along different views of the 3D volume. Various types of measures of DT sequences are collected in our approach to analyze DT sequences from different perspectives. The experimental evaluation is conducted on three widely used benchmark datasets. In all the experiments, our method demonstrates excellent performance in comparison with state-of-the-art approaches.
{"title":"Dynamic texture classification using dynamic fractal analysis","journal":{"name":"2011 International Conference on Computer Vision","pages":"1219-1226"}}
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126406
Josip Krapac, J. Verbeek, F. Jurie
We introduce an extension of bag-of-words image representations to encode spatial layout. Using the Fisher kernel framework we derive a representation that encodes the spatial mean and the variance of image regions associated with visual words. We extend this representation by using a Gaussian mixture model to encode spatial layout, and show that this model is related to a soft-assign version of the spatial pyramid representation. We also combine our representation of spatial layout with the use of Fisher kernels to encode the appearance of local features. Through an extensive experimental evaluation, we show that our representation yields state-of-the-art image categorization results, while being more compact than spatial pyramid representations. In particular, using Fisher kernels to encode both appearance and spatial layout results in an image representation that is computationally efficient, compact, and yields excellent performance while using linear classifiers.
{"title":"Modeling spatial layout with fisher vectors for image categorization","journal":{"name":"2011 International Conference on Computer Vision","pages":"1487-1494"}}
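The spatial mean and variance encoding described in the abstract can be sketched for the simplest case: hard assignment of patches to visual words and one reference Gaussian over normalized image coordinates. For each word, the code accumulates the derivatives of the Gaussian log-likelihood with respect to its mean and variance, up to constant scaling and without the Fisher information normalization. The reference parameters `mu` and `sigma`, the hard assignment, and the name `spatial_fisher_stats` are assumptions made for illustration, not the paper's exact formulation.

```python
def spatial_fisher_stats(positions, words, num_words,
                         mu=(0.5, 0.5), sigma=(0.3, 0.3)):
    """Per visual word: [count, d_mu_x, d_mu_y, d_var_x, d_var_y].

    positions: (x, y) patch centers in normalized image coordinates.
    words: index of the visual word assigned to each patch (hard assignment).
    mu, sigma: assumed reference Gaussian for patch locations.
    """
    stats = [[0.0] * 5 for _ in range(num_words)]
    for (x, y), k in zip(positions, words):
        s = stats[k]
        s[0] += 1.0  # ordinary bag-of-words count
        for axis, value in enumerate((x, y)):
            z = (value - mu[axis]) / sigma[axis]
            s[1 + axis] += z              # first-order (mean) deviation
            s[3 + axis] += z * z - 1.0    # second-order (variance) deviation
    return stats
```

Each visual word contributes five numbers here regardless of how finely the image would otherwise be partitioned, which gives a rough sense of how such a descriptor can stay compact.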
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126410
Chen-Kuo Chiang, Chih-Hsueh Duan, S. Lai, Shih-Fu Chang
A novel component-level dictionary learning framework that exploits image group characteristics within sparse coding is introduced in this work. Unlike previous methods, which select the dictionaries that best reconstruct the data, we present an energy minimization formulation that jointly optimizes the learning of both the sparse dictionary and component-level importance within one unified framework to give a discriminative representation for image groups. The importance measures how well each feature component represents the image group property with the dictionary, using histogram information. Then, dictionaries are updated iteratively to reduce the influence of unimportant components, thus refining the sparse representation for each image group. In the end, by keeping the top K important components, a compact representation is derived for the sparse coding dictionary. Experimental results on several public datasets demonstrate the superior performance of the proposed algorithm compared to state-of-the-art methods.
{"title":"Learning component-level sparse representation using histogram information for image classification","journal":{"name":"2011 International Conference on Computer Vision","pages":"1519-1526"}}