Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206698
Yong Jae Lee, K. Grauman
Can we discover common object shapes within unlabeled multi-category collections of images? While often a critical cue at the category level, contour matches can be difficult to isolate reliably from edge clutter, even within labeled images from a known class, let alone unlabeled examples. We propose a shape discovery method in which local appearance (patch) matches serve to anchor the surrounding edge fragments, yielding a more reliable affinity function for images that accounts for both shape and appearance. Spectral clustering from the initial affinities provides candidate object clusters. Then, we compute the within-cluster match patterns to discern foreground edges from clutter, attributing higher weight to edges more likely to belong to a common object. In addition to discovering the object contours in each image, we show how to summarize what is found with prototypical shapes. Our results on benchmark datasets demonstrate that the approach can successfully discover shapes from unlabeled images.
Title: Shape discovery from unlabeled image collections. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
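The spectral clustering step mentioned in the abstract above can be illustrated with a minimal two-way sketch. This is not the authors' implementation: the affinity matrix is assumed to be given (here a hand-made toy matrix rather than shape and appearance affinities), only two clusters are produced, and the function name `spectral_bipartition` is hypothetical. The second eigenvector of the normalized affinity is found by power iteration with deflation, and its signs define the split.

```python
import math

def spectral_bipartition(affinity, n_iters=200):
    """Split nodes into two clusters using the second eigenvector of the
    normalized affinity D^{-1/2} W D^{-1/2} (W symmetric, non-negative)."""
    n = len(affinity)
    deg = [sum(row) for row in affinity]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # M = (I + D^{-1/2} W D^{-1/2}) / 2 has eigenvalues in [0, 1], so power
    # iteration converges to the largest remaining eigenvalue.
    M = [[0.5 * ((1.0 if i == j else 0.0)
                 + inv_sqrt[i] * affinity[i][j] * inv_sqrt[j])
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: sqrt(degree), normalized.
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(n_iters):
        # Deflate: remove the component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # The sign pattern of the second eigenvector gives the two clusters.
    return [0 if a >= 0 else 1 for a in v]

# Toy affinity: images 0-1 and 2-3 match strongly, with one weak cross link.
W = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.1, 0.0],
     [0.0, 0.1, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
labels = spectral_bipartition(W)
```

Which cluster receives label 0 depends on the sign of the converged eigenvector, so only the grouping is meaningful. For more than two clusters one would typically embed with several eigenvectors and run k-means on the embedding.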
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206757
Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang
Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity of O(n^2) to O(n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks using a single type of descriptor.
Title: Linear spatial pyramid matching using sparse coding for image classification. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
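The multi-scale spatial max pooling described in the abstract above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the sparse codes are assumed to be already computed (the sparse coding solver is omitted), descriptor positions are assumed to be normalized to [0, 1), pooling takes the maximum absolute value per dimension, and the names `spatial_max_pool` and `linear_kernel` are hypothetical.

```python
from typing import List, Sequence, Tuple

def spatial_max_pool(
    codes: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    levels: Sequence[int] = (1, 2, 4),
) -> List[float]:
    """Concatenate per-cell max-pooled |codes| over a spatial pyramid.

    codes[i] is the (assumed precomputed) sparse code of descriptor i and
    positions[i] is its (x, y) location normalized to [0, 1).
    levels lists the grid size g of each g x g pyramid level.
    """
    dim = len(codes[0])
    feature: List[float] = []
    for g in levels:
        # One pooled vector per cell of the g x g grid, initialized to zero.
        cells = [[0.0] * dim for _ in range(g * g)]
        for code, (x, y) in zip(codes, positions):
            col = min(int(x * g), g - 1)
            row = min(int(y * g), g - 1)
            cell = cells[row * g + col]
            for k, value in enumerate(code):
                cell[k] = max(cell[k], abs(value))
        for cell in cells:
            feature.extend(cell)
    return feature

def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product between two pooled image features."""
    return sum(x * y for x, y in zip(a, b))

# Two descriptors with 2-dimensional codes, in opposite image corners.
codes = [[0.5, 0.0], [0.0, -2.0]]
positions = [(0.1, 0.1), (0.9, 0.9)]
feat = spatial_max_pool(codes, positions, levels=(1, 2))
```

Because the image feature is a fixed-length vector compared with a dot product, a linear SVM can be trained on it directly, which is where the linear training cost claimed in the abstract comes from.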
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206530
Jan Heller, T. Pajdla
We present a general technique for rectification of a stereo pair acquired by a calibrated omnidirectional camera. Using this technique we formulate a new stereographic rectification method. Our rectification does not map epipolar curves onto lines as common rectification methods do, but rather maps epipolar curves onto circles. We show that this rectification in a certain sense minimizes the distortion of the original omnidirectional images. We formulate the rectification for multiple images and show that the choice of the optimal projection center of the rectification is under certain circumstances equivalent to the classical problem of spherical minimax location. We demonstrate the behaviour and the quality of the rectification in real experiments with images from fisheye lenses with a 180-degree field of view.
Title: Stereographic rectification of omnidirectional stereo pairs. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
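The claim in the abstract above that epipolar curves are mapped onto circles rests on a standard property of stereographic projection: circles on the sphere that avoid the projection pole map to circles in the plane. The sketch below checks this numerically for a tilted great circle. It assumes, for illustration only, that an epipolar curve is represented as a great circle on the unit sphere; it is not the paper's rectification procedure, and the helper names are hypothetical.

```python
import math

def stereographic(p):
    """Project a unit-sphere point (x, y, z), z != 1, from the pole (0, 0, 1)
    onto the plane z = 0."""
    x, y, z = p
    return (x / (1.0 - z), y / (1.0 - z))

def tilted_great_circle(tilt, n_points):
    """Sample a great circle whose plane is rotated by `tilt` radians about
    the x-axis away from the equator (it avoids the pole for |tilt| < pi/2)."""
    points = []
    for i in range(n_points):
        t = 2.0 * math.pi * i / n_points
        points.append((math.cos(t),
                       math.sin(t) * math.cos(tilt),
                       -math.sin(t) * math.sin(tilt)))
    return points

def circumcircle(a, b, c):
    """Centre and radius of the circle through three non-collinear 2D points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

tilt = 0.6
projected = [stereographic(p) for p in tilted_great_circle(tilt, 12)]
centre, radius = circumcircle(projected[0], projected[1], projected[2])
# Every projected sample lies on the circle through the first three samples.
max_deviation = max(abs(math.hypot(x - centre[0], y - centre[1]) - radius)
                    for x, y in projected)
```

For this configuration the projected circle has radius 1 / cos(tilt), which gives an independent check on the fitted radius.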
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206563
Tian Shen, Hongsheng Li, Z. Qian, Xiaolei Huang
In this paper, we propose a novel predictive model for object boundary, which can integrate information from any source. The model is a dynamic “object” model whose manifestation includes a deformable surface representing shape, a volumetric interior carrying appearance statistics, and an embedded classifier that separates object from background based on current feature information. Unlike Snakes, Level Set, Graph Cut, MRF and CRF approaches, the model is “self-contained” in that it does not model the background, but rather focuses on an accurate representation of the foreground object's attributes. As we will show, however, the model is capable of reasoning about the background statistics and thus can detect when a change is sufficient to invoke a boundary decision. The shape of the 3D model is considered as an elastic solid, with a simplex-mesh (i.e. finite element triangulation) surface made of thousands of vertices. Deformations of the model are derived from a linear system that encodes external forces from the boundary of a Region of Interest (ROI), which is a binary mask representing the object region predicted by the current model. Efficient optimization and fast convergence of the model are achieved using the Finite Element Method (FEM). Other advantages of the model include the ease of dealing with topology changes and its ability to incorporate human interactions. Segmentation and validation results are presented for experiments on noisy 3D medical images.
Title: Active volume models for 3D medical image segmentation. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206658
M. Leung, K. Lee, K. Wong, M. Chang
In this paper, we propose a movable hand-held display system which uses a projector to project display content onto an ordinary piece of cardboard that can move freely within the projection area. Such a system can give users greater freedom of control of the display, such as the viewing angle and distance. At the same time, the cardboard can be cut to a size that fits one's application. A projector-camera pair is calibrated and used as the tracking and projection system. We present a vision-based algorithm to detect an ordinary piece of cardboard and track its subsequent motion. Display content is then pre-warped and projected onto the cardboard at the correct position. Experimental results show that our system can project onto the cardboard with reasonable precision.
Title: A projector-based movable hand-held display system. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206801
Robin Hess, Alan Fern
This work presents a discriminative training method for particle filters in the context of multi-object tracking. We are motivated by the difficulty of hand-tuning the many model parameters for such applications and also by results in many application domains indicating that discriminative training is often superior to generative training methods. Our learning approach is tightly integrated into the actual inference process of the filter and attempts to directly optimize the filter parameters in response to observed errors. We present experimental results in the challenging domain of American football where our filter is trained to track all 22 players throughout football plays. The training method is shown to significantly improve performance of the tracker and to significantly outperform two recent particle-based multi-object tracking methods.
Title: Discriminatively trained particle filters for complex multi-object tracking. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
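For readers unfamiliar with the inference process that the abstract above trains, here is a minimal bootstrap particle filter step for a one-dimensional state. It is a generic textbook sketch, not the paper's tracker or its discriminative training rule: the random-walk motion model, the Gaussian observation model, and the parameters `process_std` and `obs_std` are all assumptions chosen for illustration. They stand in for the kind of model parameters that would otherwise be hand-tuned.

```python
import math
import random

def particle_filter_step(particles, observation, rng,
                         process_std=0.5, obs_std=1.0):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: propagate each particle through a random-walk motion model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw a new equally weighted particle set.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate

rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
estimate = None
for _ in range(20):
    # A stationary target observed at position 5.0 on every frame.
    particles, estimate = particle_filter_step(particles, 5.0, rng)
```

A discriminative training scheme in the spirit of the abstract would adjust parameters such as these in response to tracking errors observed while the filter runs, rather than fitting them generatively.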
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206851
Zhiwu Lu, H. Ip
This paper presents a novel semi-supervised learning method which can make use of intra-image semantic context and inter-image cluster consistency for image categorization with less labeled data. The image representation is first formed with the visual keywords generated by clustering all the blocks that we divide images into. The 2D spatial Markov chain model is then proposed to capture the semantic context across these keywords within an image. To develop a graph-based semi-supervised learning approach to image categorization, we incorporate the intra-image semantic context into a kind of spatial Markov kernel which can be used as the affinity matrix of a graph. Instead of constructing a complete graph, we resort to a k-nearest neighbor graph for label propagation with cluster consistency. To the best of our knowledge, this is the first application of kernel methods and 2D Markov models simultaneously to image categorization. Experiments on the Corel and histological image databases demonstrate that the proposed method can achieve superior results.
Title: Image categorization by learning with context and consistency. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
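Label propagation over a k-nearest neighbor graph, as mentioned in the abstract above, can be sketched as follows. This is a simplified illustration and not the paper's method: neighbors are chosen by Euclidean distance between toy 2D points instead of a spatial Markov kernel, edges are unweighted, and the function names `knn_graph` and `propagate_labels` are hypothetical.

```python
import math

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as a list of neighbour sets."""
    n = len(points)
    neighbours = [set() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j]))), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # symmetrize the graph
    return neighbours

def propagate_labels(neighbours, labels, n_classes, n_iters=50):
    """labels[i] is a class index for labeled nodes and None otherwise."""
    n = len(neighbours)
    scores = [[0.0] * n_classes for _ in range(n)]
    for i, lab in enumerate(labels):
        if lab is not None:
            scores[i][lab] = 1.0
    for _ in range(n_iters):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                new_scores.append(scores[i])  # clamp labeled nodes
                continue
            nbrs = neighbours[i]
            # Unlabeled nodes take the average score of their neighbours.
            new_scores.append([sum(scores[j][c] for j in nbrs) / len(nbrs)
                               for c in range(n_classes)])
        scores = new_scores
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]

# Two well-separated groups of four points, one labeled example per group.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, None, None, None, 1, None, None, None]
predicted = propagate_labels(knn_graph(points, k=2), labels, n_classes=2)
```

Using a sparse k-nearest neighbor graph instead of a complete graph keeps each update local, so labels spread along dense regions of the data rather than across cluster boundaries.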
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206845
Jingen Liu, Yang Yang, M. Shah
In this paper, we propose a novel approach for learning a generic visual vocabulary. We use diffusion maps to automatically learn a semantic visual vocabulary from abundant quantized midlevel features. Each midlevel feature is represented by a vector of pointwise mutual information (PMI). In this midlevel feature space, we believe the features produced by similar sources must lie on a certain manifold. To capture the intrinsic geometric relations between features, we measure their dissimilarity using diffusion distance. The underlying idea is to embed the midlevel features into a semantic lower-dimensional space. Our goal is to construct a compact yet discriminative semantic visual vocabulary. Although the conventional approach using k-means is good for vocabulary construction, its performance is sensitive to the size of the visual vocabulary. In addition, the learnt visual words are not semantically meaningful since the clustering criterion is based on appearance similarity only. Our proposed approach can effectively overcome these problems by capturing the semantic and geometric relations of the feature space using diffusion maps. Unlike some of the supervised vocabulary construction approaches, and the unsupervised methods such as pLSA and LDA, diffusion maps can capture the local intrinsic geometric relations between the midlevel feature points on the manifold. We have tested our approach on the KTH action dataset, our own YouTube action dataset and the fifteen scene dataset, and have obtained very promising results.
Title: Learning semantic visual vocabularies using diffusion distance. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
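The diffusion distance used in the abstract above can be computed directly from its standard definition on a small graph. The sketch below assumes a symmetric, non-negative affinity matrix is already available (for example, built from similarities between PMI vectors, which is not shown here) and uses explicit matrix powers rather than the eigen-decomposition used by diffusion maps. The function names and the toy matrix are hypothetical.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_distance(affinity, i, j, t=3):
    """Diffusion distance between nodes i and j after t random-walk steps."""
    n = len(affinity)
    degrees = [sum(row) for row in affinity]
    total = sum(degrees)
    # Row-normalize the affinities into a Markov transition matrix P.
    P = [[w / d for w in row] for row, d in zip(affinity, degrees)]
    # Pt = P^t, starting from the identity matrix.
    Pt = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for _ in range(t):
        Pt = mat_mul(Pt, P)
    # Stationary distribution of the walk, used to weight each coordinate.
    stationary = [d / total for d in degrees]
    return math.sqrt(sum((Pt[i][k] - Pt[j][k]) ** 2 / stationary[k]
                         for k in range(n)))

# Features 0-1 and 2-3 are strongly related, with one weak cross link.
W = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.1, 0.0],
     [0.0, 0.1, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
within = diffusion_distance(W, 0, 1)
across = diffusion_distance(W, 0, 2)
```

Two features end up close when many short random-walk paths connect them, which is why the distance reflects the manifold structure rather than raw appearance similarity alone.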
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206700
Xinyu Huang, Liu Ren, Ruigang Yang
In most iris capturing scenarios, captured iris images can easily blur when the user is outside the depth of field (DOF) of the camera, or when he or she is moving. The common solution is to let the user try the capturing process again, as the quality of these blurred iris images is not good enough for recognition. In this paper, we propose a novel iris deblurring algorithm that can be used to improve the robustness and nonintrusiveness of iris capture. Unlike other iris deblurring algorithms, the key feature of our algorithm is that we use the domain knowledge inherent in iris images and iris capture settings to improve the performance, which could be in the form of iris image statistics, characteristics of pupils or highlights, or even depth information from the iris capturing system itself. Our experiments on both synthetic and real data demonstrate that our deblurring algorithm can significantly restore blurred iris patterns and therefore improve the robustness of iris capture.
Title: Image deblurring for less intrusive iris capture. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
Pub Date: 2009-06-20, DOI: 10.1109/CVPR.2009.5206502
Junseok Kwon, Kyoung Mu Lee
We propose a novel tracking algorithm for targets whose geometric appearance changes drastically over time. To track such a target, we present a local patch-based appearance model and provide an efficient scheme to evolve the topology between local patches by on-line update. In the process of on-line update, the robustness of each patch in the model is estimated by a new method of measurement which analyzes the landscape of the local mode of the patch. Each patch can be moved, deleted or newly added, which gives more flexibility to the model. Additionally, we introduce the Basin Hopping Monte Carlo (BHMC) sampling method to our tracking problem to reduce the computational complexity and deal with the problem of getting trapped in local minima. The BHMC method makes it possible for our appearance model to consist of a sufficient number of patches. Since BHMC uses the same local optimizer that is used in the appearance modeling, it can be efficiently integrated into our tracking framework. Experimental results show that our approach accurately and robustly tracks objects whose geometric appearance changes drastically.
Title: Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive Basin Hopping Monte Carlo sampling. Published in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
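The basin hopping idea referred to in the abstract above, combining a local optimizer with Monte Carlo jumps to escape local minima, can be sketched generically. This is not the paper's adaptive BHMC tracker: the objective is a toy one-dimensional double-well function, the local optimizer is a simple derivative-free descent, and all names and parameter values are assumptions made for illustration.

```python
import math
import random

def local_minimize(f, x, step=0.1, tol=1e-6):
    """Derivative-free 1D descent: move to a better neighbour, otherwise
    halve the step until it falls below tol."""
    fx = f(x)
    while step > tol:
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step *= 0.5
    return x, fx

def basin_hopping(f, x0, rng, n_hops=50, hop_size=1.5, temperature=1.0):
    """Alternate random jumps with local minimization, accepting each new
    local minimum with the Metropolis rule, and return the best one seen."""
    x, fx = local_minimize(f, x0)
    best_x, best_f = x, fx
    for _ in range(n_hops):
        start = x + rng.uniform(-hop_size, hop_size)
        trial, f_trial = local_minimize(f, start)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(increase) / temperature).
        if f_trial <= fx or rng.random() < math.exp(-(f_trial - fx) / temperature):
            x, fx = trial, f_trial
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

def double_well(x):
    """Two minima: a shallow one near x = +1 and a deeper one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

start_x, start_f = local_minimize(double_well, 1.0)  # local search only
best_x, best_f = basin_hopping(double_well, 1.0, random.Random(0))
```

Because the sampler only ever compares local minima, the search effectively runs over basins of the objective rather than over raw states, and the result is never worse than what the local optimizer alone reaches from the same starting point.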