Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126307
Qiang Qiu, Zhuolin Jiang, R. Chellappa
We present an approach for dictionary learning of action attributes via information maximization. We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes. The objective function maximizes the mutual information between what has been learned and what remains to be learned in terms of appearance information and class distribution for each dictionary item. We propose a Gaussian Process (GP) model for sparse representation to optimize the dictionary objective function. The sparse coding property allows a GP kernel with compact support to realize a very efficient dictionary learning process. Hence we can describe an action video by a set of compact and discriminative action attributes. More importantly, we can recognize modeled action categories in a sparse feature space, which can be generalized to unseen and unmodeled action categories. Experimental results demonstrate the effectiveness of our approach in action recognition applications.
Title: Sparse dictionary-based representation and recognition of action attributes
Venue: 2011 International Conference on Computer Vision, pp. 707-714
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126220
Christine Chen, D. Vaquero, M. Turk
A class of techniques in computer vision and graphics is based on capturing multiple images of a scene under different illumination conditions. These techniques explore variations in illumination from image to image to extract interesting information about the scene. However, their applicability to dynamic environments is limited due to the need for robust motion compensation algorithms. To overcome this issue, we propose a method to separate multiple illuminants from a single image. Given an image of a scene simultaneously illuminated by multiple light sources, our method generates individual images as if they had been illuminated by each of the light sources separately. To facilitate the illumination separation process, we encode each light source with a distinct sinusoidal pattern, strategically selected given the relative position of each light with respect to the camera, such that the observed sinusoids become independent of the scene geometry. The individual illuminants are then demultiplexed by analyzing local frequencies. We show applications of our approach in image-based relighting, photometric stereo, and multiflash imaging.
Title: Illumination demultiplexing from a single image
Venue: 2011 International Conference on Computer Vision, pp. 17-24
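The idea of coding each light with its own sinusoid and separating the lights by frequency analysis can be illustrated with a toy 1D example. This is only a sketch under strong assumptions that are not taken from the paper: a single image row window, constant per-light intensity inside the window, and carriers that complete a whole number of cycles in the window. The function names `simulate_window` and `demultiplex_window` are hypothetical.

```python
import math

def simulate_window(amplitudes, freqs, n):
    """Synthesize one 1D window lit by several sinusoidally coded lights.

    Each light k contributes amplitudes[k] * (1 + cos(2*pi*freqs[k]*x/n)).
    """
    return [
        sum(a * (1.0 + math.cos(2.0 * math.pi * f * x / n))
            for a, f in zip(amplitudes, freqs))
        for x in range(n)
    ]

def demultiplex_window(signal, freqs):
    """Estimate each light's intensity by correlating with its carrier.

    Assumes distinct, nonzero integer frequencies (cycles per window),
    so that the carriers are orthogonal over the window.
    """
    n = len(signal)
    recovered = []
    for f in freqs:
        acc = sum(v * math.cos(2.0 * math.pi * f * x / n)
                  for x, v in enumerate(signal))
        recovered.append(2.0 * acc / n)
    return recovered

# Two lights with intensities 0.7 and 0.2, coded at 4 and 9 cycles per window.
freqs = [4, 9]
window = simulate_window([0.7, 0.2], freqs, 64)
estimates = demultiplex_window(window, freqs)
print(estimates)
```

In a real image the per-light intensity varies across the scene, so the correlation would have to be computed over small local windows, which is presumably why the abstract speaks of local frequencies.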
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126477
A. J. Ma, P. Yuen
This paper addresses the independence assumption issue in the fusion process. In the last decade, dependency modeling techniques were developed under a specific distribution of classifiers. This paper proposes a new framework to model the dependency between features without any assumption on the feature/classifier distribution. We prove that feature dependency can be modeled by a linear combination of the posterior probabilities under some mild assumptions. Based on the linear combination property, two methods, namely Linear Classifier Dependency Modeling (LCDM) and Linear Feature Dependency Modeling (LFDM), are derived and developed for dependency modeling at the classifier level and feature level, respectively. The optimal models for LCDM and LFDM are learned by maximizing the margin between the genuine and imposter posterior probabilities. Both synthetic data and real datasets are used for experiments. Experimental results show that LFDM outperforms all existing combination methods.
Title: Linear dependency modeling for feature fusion
Venue: 2011 International Conference on Computer Vision, pp. 2041-2048
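The linear combination property mentioned in the abstract can be pictured as a weighted sum of per-classifier posteriors. The sketch below shows only that fusion step. The weights are fixed by hand for illustration, whereas the abstract states that the actual models are learned by maximizing a margin; the function name `fuse_posteriors` and the example numbers are hypothetical.

```python
def fuse_posteriors(posteriors, weights):
    """Combine per-classifier posteriors with a linear rule.

    posteriors: one list of class probabilities per classifier/feature.
    weights:    one weight per classifier (hand-picked here, not learned).
    Returns the fused score for every class.
    """
    n_classes = len(posteriors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]

# Two classifiers, two classes; equal weights are an arbitrary choice.
posteriors = [[0.6, 0.4], [0.3, 0.7]]
fused = fuse_posteriors(posteriors, [0.5, 0.5])
predicted = max(range(len(fused)), key=lambda c: fused[c])
print(fused, predicted)
```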
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126425
Nobuyuki Morioka, S. Satoh
Spatial relationships between local features are thought to play a vital role in representing object categories. However, learning a compact set of higher-order spatial features based on visual words, e.g., doublets and triplets, remains a challenging problem as possible combinations of visual words grow exponentially. While the local pairwise codebook achieves a compact codebook of pairs of spatially close local features without feature selection, its formulation is not scale-invariant and is only suitable for densely sampled local features. In contrast, the proximity distribution kernel is a scale-invariant and robust representation capturing rich spatial proximity information between local features, but its representation grows quadratically in the number of visual words. Inspired by these two techniques, this paper presents compact correlation coding, which combines the strengths of both. Our method achieves a compact representation that is scale-invariant and robust against object deformation. In addition, we adopt sparse coding instead of k-means clustering during the codebook construction to increase the discriminative power of our method. We systematically evaluate our method against both the local pairwise codebook and proximity distribution kernel on several challenging object categorization datasets to show performance improvements.
Title: Compact correlation coding for visual object categorization
Venue: 2011 International Conference on Computer Vision, pp. 1639-1646
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126454
Jie Luo, T. Tommasi, B. Caputo
The vast majority of transfer learning methods proposed in the visual recognition domain in recent years address the problem of object category detection, assuming strong control over the priors from which transfer is done. This is a strict condition, as it concretely limits the use of this type of approach in several settings: for instance, it does not in general allow the use of off-the-shelf models as priors. Moreover, the lack of a multiclass formulation for most of the existing transfer learning algorithms prevents using them for object categorization problems, where their use might be beneficial, especially when the number of categories grows and it becomes harder to get enough annotated data for training standard learning methods. This paper presents a multiclass transfer learning algorithm that can take advantage of priors built over different features and with different learning methods than the one used for learning the new task. We use the priors as experts, and transfer their outputs to the new incoming samples as additional information. We cast the learning problem within the Multi Kernel Learning framework. The resulting formulation efficiently solves a joint optimization problem that determines from where and how much to transfer, with a principled multiclass formulation. Extensive experiments illustrate the value of this approach.
Title: Multiclass transfer learning from unconstrained priors
Venue: 2011 International Conference on Computer Vision, pp. 1863-1870
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126219
Aurélien Lucchi, Yunpeng Li, X. Boix, Kevin Smith, P. Fua
Many state-of-the-art segmentation algorithms rely on Markov or Conditional Random Field models designed to enforce spatial and global consistency constraints. This is often accomplished by introducing additional latent variables to the model, which can greatly increase its complexity. As a result, estimating the model parameters or computing the best maximum a posteriori (MAP) assignment becomes a computationally expensive task. In a series of experiments on the PASCAL and the MSRC datasets, we were unable to find evidence of a significant performance increase attributed to the introduction of such constraints. On the contrary, we found that similar levels of performance can be achieved using a much simpler design that essentially ignores these constraints. This simpler approach makes use of the same local and global features to leverage evidence from the image, but instead directly biases the preferences of individual pixels. While our investigation does not prove that spatial and consistency constraints are not useful in principle, it points to the conclusion that they should be validated in a larger context.
Title: Are spatial and global constraints really necessary for segmentation?
Venue: 2011 International Conference on Computer Vision, pp. 9-16
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126463
Zohaib Khan, A. Mian, Yiqun Hu
We propose ‘Contour Code’, a novel representation and binary hash table encoding for multispectral palmprint recognition. We first present a reliable technique for the extraction of a region of interest (ROI) from palm images acquired with non-contact sensors. The Contour Code representation is then derived from the Nonsubsampled Contourlet Transform. A uniscale pyramidal filter is convolved with the ROI, followed by the application of a directional filter bank. The dominant directional subband establishes the orientation at each pixel, and the index corresponding to this subband is encoded in the Contour Code representation. Unlike existing representations, which extract orientation features directly from the palm images, the Contour Code uses two-stage filtering to extract robust orientation features. The Contour Code is binarized into an efficient hash table structure that only requires indexing and summation operations for simultaneous one-to-many matching with an embedded score-level fusion of multiple bands. We quantitatively evaluate the accuracy of the ROI extraction by comparison with a manually produced ground truth. Multispectral palmprint verification results on the PolyU and CASIA databases show that the Contour Code achieves an EER reduction of up to 50% compared to state-of-the-art methods.
Title: Contour Code: Robust and efficient multispectral palmprint encoding for human recognition
Venue: 2011 International Conference on Computer Vision, pp. 1935-1942
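One plausible reading of the encoding and matching steps described in the abstract is sketched below: each pixel stores the index of its strongest directional response, and a binary table indexed by (pixel, direction) lets a probe be scored against every gallery entry using only lookups and additions. The table layout and the names `dominant_index`, `build_table`, and `match` are assumptions made for illustration; the filtering stages and the multi-band fusion are not reproduced here.

```python
def dominant_index(responses):
    """Per pixel, return the index of the strongest directional response."""
    return [max(range(len(r)), key=lambda k: r[k]) for r in responses]

def build_table(gallery_codes, n_dirs):
    """Binary table: table[pixel][direction][g] is 1 if gallery entry g
    has that dominant direction at that pixel, else 0."""
    n_pix = len(gallery_codes[0])
    n_gal = len(gallery_codes)
    table = [[[0] * n_gal for _ in range(n_dirs)] for _ in range(n_pix)]
    for g, code in enumerate(gallery_codes):
        for p, d in enumerate(code):
            table[p][d][g] = 1
    return table

def match(table, probe_code):
    """One-to-many matching: index the table with the probe's code and
    sum the binary rows, giving one agreement count per gallery entry."""
    n_gal = len(table[0][0])
    scores = [0] * n_gal
    for p, d in enumerate(probe_code):
        row = table[p][d]
        for g in range(n_gal):
            scores[g] += row[g]
    return scores

# Three gallery codes over four pixels and four directions (made-up data).
gallery = [[0, 1, 2, 3], [0, 1, 0, 0], [3, 3, 2, 1]]
table = build_table(gallery, n_dirs=4)
scores = match(table, [0, 1, 2, 3])
print(scores)
```

A higher count means more pixels agree on the dominant direction, so the best match is the gallery entry with the largest score.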
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126497
T. Haines, T. Xiang
In the security domain, a key problem is identifying rare behaviours of interest. Training examples for these behaviours may or may not exist, and if they do exist there will be few examples, quite probably one. We present a novel weakly supervised algorithm that can detect behaviours that either have never before been seen or for which there are few examples. Global context is modelled, allowing the detection of abnormal behaviours that in isolation appear normal. Pragmatic aspects are considered, such that no parameter tuning is required and real-time performance is achieved.
Title: Delta-Dual Hierarchical Dirichlet Processes: A pragmatic abnormal behaviour detector
Venue: 2011 International Conference on Computer Vision, pp. 2198-2205
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126449
T. Tuytelaars, Mario Fritz, Kate Saenko, Trevor Darrell
Naive Bayes Nearest Neighbor (NBNN) has recently been proposed as a powerful, non-parametric approach for object classification that achieves remarkably good results thanks to the avoidance of a vector quantization step and the use of image-to-class comparisons, yielding good generalization. In this paper, we introduce a kernelized version of NBNN. This way, we can learn the classifier in a discriminative setting. Moreover, it then becomes straightforward to combine it with other kernels. In particular, we show that our NBNN kernel is complementary to standard bag-of-features based kernels, focussing on local generalization as opposed to global image composition. By combining them, we achieve state-of-the-art results on the Caltech101 and 15 Scenes datasets. As a side contribution, we also investigate how to speed up the NBNN computations.
Title: The NBNN kernel
Venue: 2011 International Conference on Computer Vision, pp. 1824-1831
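For readers unfamiliar with the baseline, the plain NBNN decision rule that the abstract builds on can be sketched as follows: every local descriptor of the query image is compared directly, without quantization, to the pooled descriptors of each class, and the class with the smallest summed nearest-neighbour distance wins. This is the non-kernelized baseline only, not the kernel proposed in the paper; the brute-force search, the toy 2D descriptors, and the function names are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def image_to_class_distance(descriptors, class_descriptors):
    """Sum, over the query descriptors, of the squared distance to the
    nearest descriptor in the class pool (image-to-class comparison)."""
    return sum(
        min(sq_dist(d, c) for c in class_descriptors)
        for d in descriptors
    )

def nbnn_classify(descriptors, classes):
    """Return the class label with the smallest image-to-class distance."""
    return min(
        classes,
        key=lambda name: image_to_class_distance(descriptors, classes[name]),
    )

# Toy 2D "descriptors"; real descriptors would be e.g. 128-dimensional.
classes = {
    "cat": [(0.0, 0.0), (1.0, 0.0)],
    "dog": [(5.0, 5.0), (6.0, 5.0)],
}
query = [(0.1, 0.0), (0.9, 0.1)]
label = nbnn_classify(query, classes)
print(label)
```

The brute-force nearest-neighbour search is the costly part of this rule, which is the computation the abstract's side contribution aims to speed up.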
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126295
David Tsai, Yushi Jing, Yi Liu, H. Rowley, Sergey Ioffe, James M. Rehg
We address the problem of large-scale annotation of web images. Our approach is based on the concept of a visual synset, which is an organization of images that are visually similar and semantically related. Each visual synset represents a single prototypical visual concept, and has an associated set of weighted annotations. Linear SVMs are utilized to predict the visual synset membership for unseen image examples, and a weighted voting rule is used to construct a ranked list of predicted annotations from a set of visual synsets. We demonstrate that visual synsets lead to better performance than standard methods on a new annotation database containing more than 200 million images and 300 thousand annotations, which is the largest ever reported.
Title: Large-scale image annotation using visual synset
Venue: 2011 International Conference on Computer Vision, pp. 611-618
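The weighted voting step can be sketched as below. The abstract does not give the exact rule, so this version makes two assumptions: only synsets with a positive classifier score vote, and each vote is the product of the synset score and the annotation weight. The function name `rank_annotations` and all example data are hypothetical.

```python
def rank_annotations(synset_scores, synset_annotations):
    """Rank annotations by accumulated weighted votes.

    synset_scores:      {synset_id: classifier score for the query image}
    synset_annotations: {synset_id: {annotation: weight}}
    Assumption: only synsets with a positive score contribute votes.
    """
    votes = {}
    for synset, score in synset_scores.items():
        if score <= 0:
            continue
        for tag, weight in synset_annotations[synset].items():
            votes[tag] = votes.get(tag, 0.0) + score * weight
    # Highest vote first; ties broken alphabetically for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

synset_scores = {"s1": 0.8, "s2": 0.5, "s3": -0.2}
synset_annotations = {
    "s1": {"beach": 0.9, "sea": 0.5},
    "s2": {"sea": 0.8, "boat": 0.6},
    "s3": {"car": 1.0},
}
ranked = rank_annotations(synset_scores, synset_annotations)
print([tag for tag, _ in ranked])
```

An annotation shared by several matching synsets accumulates votes from each of them, so it can outrank an annotation that is strong in only one synset.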