Learning a classification model for segmentation
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238308
Xiaofeng Ren, Jitendra Malik
We propose a two-class classification model for grouping. Human segmented natural images are used as positive examples. Negative examples of grouping are constructed by randomly matching human segmentations and images. In a preprocessing stage an image is over-segmented into super-pixels. We define a variety of features derived from the classical Gestalt cues, including contour, texture, brightness and good continuation. Information-theoretic analysis is applied to evaluate the power of these grouping cues. We train a linear classifier to combine these features. To demonstrate the power of the classification model, a simple algorithm is used to randomly search for good segmentations. Results are shown on a wide range of images.
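To make the classification setup concrete, here is a minimal sketch in Python, assuming Gestalt-cue feature vectors have already been extracted for each candidate segmentation. The helper names and the plain gradient-descent training loop are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the two-class grouping setup: feature vectors derived from
# Gestalt cues are labeled 1 (human-consistent segmentation) or 0
# (randomly matched segmentation), and a linear classifier combines them.
import numpy as np

def train_grouping_classifier(pos_feats, neg_feats, lr=0.1, epochs=200):
    """Logistic regression over cue features (contour, texture,
    brightness, good continuation), one row per example."""
    X = np.vstack([pos_feats, neg_feats])
    y = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def score_segmentation(features, w, b):
    """Higher score = segmentation looks more like a human one; a random
    search over segmentations would keep the highest-scoring candidates."""
    return float(features @ w + b)
```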
{"title":"Learning a classification model for segmentation","authors":"Xiaofeng Ren, Jitendra Malik","doi":"10.1109/ICCV.2003.1238308","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238308","url":null,"abstract":"We propose a two-class classification model for grouping. Human segmented natural images are used as positive examples. Negative examples of grouping are constructed by randomly matching human segmentations and images. In a preprocessing stage an image is over-segmented into super-pixels. We define a variety of features derived from the classical Gestalt cues, including contour, texture, brightness and good continuation. Information-theoretic analysis is applied to evaluate the power of these grouping cues. We train a linear classifier to combine these features. To demonstrate the power of the classification model, a simple algorithm is used to randomly search for good segmentations. Results are shown on a wide range of images.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131934469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information theoretic focal length selection for real-time active 3D object tracking
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238372
Joachim Denzler, M. Zobel, H. Niemann
Active object tracking, for example in surveillance tasks, is becoming increasingly important. Besides the tracking algorithms themselves, methodologies must be developed for reasonable active control of the degrees of freedom of all cameras involved. We present an information-theoretic approach that allows the optimal selection of the focal lengths of two cameras during active 3D object tracking. The selection is based on the uncertainty of the 3D estimate, which lets us resolve the trade-off between small and large focal lengths: a small focal length increases the chance of keeping the object in the cameras' field of view, while a large one makes 3D estimation more reliable and provides more image detail, for example for recognizing the objects. Beyond a rigorous mathematical framework, we present real-time experiments demonstrating an improvement in 3D trajectory estimation of up to 42% compared with tracking at a fixed focal length.
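Under a Gaussian assumption, the entropy of the state estimate is monotone in the log-determinant of its covariance, so the selection step can be sketched as picking the focal-length pair with the smallest expected posterior entropy. This sketch ignores the possibility of losing the object from view, which the full criterion must weigh as well; `jacobian` and `noise` are hypothetical stand-ins for the stereo camera model.

```python
# Sketch of entropy-based focal length selection for a Kalman tracker.
import numpy as np

def expected_posterior_entropy(P_pred, H, R):
    """Entropy (up to constants) of the Gaussian posterior: 0.5*logdet(P+)."""
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    P_post = (np.eye(len(P_pred)) - K @ H) @ P_pred
    return 0.5 * np.linalg.slogdet(P_post)[1]

def select_focal_lengths(P_pred, candidates, jacobian, noise):
    """Pick the (f1, f2) pair minimizing expected posterior entropy.
    jacobian(f1, f2) / noise(f1, f2) model the two-camera observation."""
    return min(candidates,
               key=lambda f: expected_posterior_entropy(
                   P_pred, jacobian(*f), noise(*f)))
```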
{"title":"Information theoretic focal length selection for real-time active 3D object tracking","authors":"Joachim Denzler, M. Zobel, H. Niemann","doi":"10.1109/ICCV.2003.1238372","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238372","url":null,"abstract":"Active object tracking, for example, in surveillance tasks, becomes more and more important these days. Besides the tracking algorithms themselves methodologies have to be developed for reasonable active control of the degrees of freedom of all involved cameras. We present an information theoretic approach that allows the optimal selection of the focal lengths of two cameras during active 3D object tracking. The selection is based on the uncertainty in the 3D estimation. This allows us to resolve the trade-off between small and large focal length: in the former case, the chance is increased to keep the object in the field of view of the cameras. In the latter one, 3D estimation becomes more reliable. Also, more details are provided, for example for recognizing the objects. Beyond a rigorous mathematical framework we present real-time experiments demonstrating that we gain an improvement in 3D trajectory estimation by up to 42% in comparison with tracking using a fixed focal length.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"257 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113966006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Voxel carving for specular surfaces
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238401
T. Bonfort, P. Sturm
We present a novel algorithm that reconstructs voxels of a general 3D specular surface from multiple images taken by a calibrated camera. A calibrated scene (i.e., points whose 3D coordinates are known) is reflected by the unknown specular surface onto the image plane of the camera. For every viewpoint, surface normals are associated with the voxels traversed by each projection ray formed by the reflection of a scene point. A decision process then discards voxels whose associated surface normals are not consistent with one another. The output of the algorithm is a collection of voxels and surface normals in 3D space, whose quality and size depend on user-set thresholds. The method has been tested on synthetic and real images; visual and quantitative experimental results are presented.
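The consistency test can be sketched from the reflection law alone: the normal a voxel would need in order to reflect a known scene point into a given camera is the bisector of the two directions, and a voxel survives only if the normals implied by different views agree. The threshold and data layout below are illustrative, not the paper's.

```python
# Sketch of the normal-consistency test used to carve specular voxels.
import numpy as np

def implied_normal(voxel, cam_center, scene_point):
    """Normal required for `voxel` to mirror `scene_point` into the camera:
    the angle bisector of the two unit directions (law of reflection)."""
    to_cam = cam_center - voxel
    to_pt = scene_point - voxel
    n = to_cam / np.linalg.norm(to_cam) + to_pt / np.linalg.norm(to_pt)
    return n / np.linalg.norm(n)

def consistent(voxel, observations, max_angle_deg=5.0):
    """observations: (camera_center, reflected_scene_point) pairs whose
    reflected rays traverse this voxel. Keep the voxel only if all
    implied normals lie within a cone of max_angle_deg."""
    normals = [implied_normal(voxel, c, p) for c, p in observations]
    mean = np.mean(normals, axis=0)
    mean /= np.linalg.norm(mean)
    cos_min = np.cos(np.radians(max_angle_deg))
    return all(n @ mean >= cos_min for n in normals)
```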
{"title":"Voxel carving for specular surfaces","authors":"T. Bonfort, P. Sturm","doi":"10.1109/ICCV.2003.1238401","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238401","url":null,"abstract":"We present an novel algorithm that reconstructs voxels of a general 3D specular surface from multiple images of a calibrated camera. A calibrated scene (i.e. points whose 3D coordinates are known) is reflected by the unknown specular surface onto the image plane of the camera. For every viewpoint, surface normals are associated to the voxels traversed by each projection ray formed by the reflection of a scene point. A decision process then discards voxels whose associated surface normals are not consistent with one another. The output of the algorithm is a collection of voxels and surface normals in 3D space, whose quality and size depend on user-set thresholds. The method has been tested on synthetic and real images. Visual and quantified experimental results are presented.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121199188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camera calibration with known rotation
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238656
Jan-Michael Frahm, R. Koch
We address the problem of using external rotation information with uncalibrated video sequences. The main question is: what does orientation information contribute to camera calibration? We show that for a rotating camera the calibration problem is linear even when all intrinsic parameters vary. For arbitrarily moving cameras the calibration problem is also linear, but underdetermined in the general case of all intrinsic parameters varying. However, if certain constraints are applied to the intrinsic parameters, the calibration can still be computed linearly; we analyze which constraints are needed for freely moving cameras. Furthermore, we address the problem of temporally aligning the camera data with the rotation sensor data, and give an approach to align these data in the case of a rotating camera.
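The core linearity argument can be written out with the standard rotating-camera relation (consistent with the abstract, though not necessarily the paper's exact notation):

```latex
% Two images i, j of a purely rotating camera are related by a homography
\[
  H_{ij} \;\sim\; K_j \, R_{ij} \, K_i^{-1},
\]
% where K_i, K_j are the intrinsic matrices and R_{ij} the relative
% rotation. With R_{ij} supplied by the rotation sensor and H_{ij}
% estimated from the images, the constraint
\[
  \lambda_{ij} \, H_{ij} \, K_i \;=\; K_j \, R_{ij}
\]
% is linear in the entries of K_i, K_j and the unknown scales
% \lambda_{ij}, even when all intrinsic parameters vary between frames.
```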
{"title":"Camera calibration with known rotation","authors":"Jan-Michael Frahm, R. Koch","doi":"10.1109/ICCV.2003.1238656","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238656","url":null,"abstract":"We address the problem of using external rotation information with uncalibrated video sequences. The main problem addressed is, what is the benefit of the orientation information for camera calibration? It is shown that in case of a rotating camera the camera calibration problem is linear even in the case that all intrinsic parameters vary. For arbitrarily moving cameras the calibration problem is also linear but underdetermined for the general case of varying all intrinsic parameters. However, if certain constraints are applied to the intrinsic parameters the camera calibration can be computed linearly. It is analyzed which constraints are needed for camera calibration of freely moving cameras. Furthermore we address the problem of aligning the camera data with the rotation sensor data in time. We give an approach to align these data in case of a rotating camera.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128673736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variational space-time motion segmentation
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238442
D. Cremers, Stefano Soatto
We propose a variational method for segmenting image sequences into spatiotemporal domains of homogeneous motion. To this end, we formulate the problem of motion estimation in the framework of Bayesian inference, using a prior which favors domain boundaries of minimal surface area. We derive a cost functional which depends on a surface in space-time separating a set of motion regions, as well as a set of vectors modeling the motion in each region. We propose a multiphase level set formulation of this functional, in which the surface and the motion regions are represented implicitly by a vector-valued level set function. Joint minimization of the proposed functional results in an eigenvalue problem for the motion model of each region and in a gradient descent evolution for the separating interface. Numerical results on real-world sequences demonstrate that minimization of a single cost functional generates a segmentation of space-time into multiple motion regions.
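The per-region eigenvalue problem can be sketched with the usual spatiotemporal structure-tensor formulation of constant motion per region, which matches the abstract's description; derivative computation and the level set evolution itself are omitted, and the function below is illustrative rather than the paper's code.

```python
# Sketch: with brightness constancy written as g . w = 0 for
# g = (Ix, Iy, It) and homogeneous motion w = (u, v, 1), the best motion
# of a region minimizes w^T T w over unit vectors, where T sums the
# structure tensors g g^T over the region -- an eigenvalue problem.
import numpy as np

def region_motion(Ix, Iy, It, mask):
    """Ix, Iy, It: spatiotemporal derivative images; mask: boolean region.
    Returns the (u, v) velocity of the region, or None if degenerate."""
    g = np.stack([Ix[mask], Iy[mask], It[mask]])      # 3 x N
    T = g @ g.T                                       # summed structure tensor
    eigvals, eigvecs = np.linalg.eigh(T)
    w = eigvecs[:, 0]                                 # smallest-eigenvalue vector
    if abs(w[2]) < 1e-12:
        return None                                   # no temporal support
    return w[:2] / w[2]                               # dehomogenize to (u, v)
```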
{"title":"Variational space-time motion segmentation","authors":"D. Cremers, Stefano Soatto","doi":"10.1109/ICCV.2003.1238442","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238442","url":null,"abstract":"We propose a variational method for segmenting image sequences into spatiotemporal domains of homogeneous motion. To this end, we formulate the problem of motion estimation in the framework of Bayesian inference, using a prior which favors domain boundaries of minimal surface area. We derive a cost functional which depends on a surface in space-time separating a set of motion regions, as well as a set of vectors modeling the motion in each region. We propose a multiphase level set formulation of this functional, in which the surface and the motion regions are represented implicitly by a vector-valued level set function. Joint minimization of the proposed functional results in an eigenvalue problem for the motion model of each region and in a gradient descent evolution for the separating interface. Numerical results on real-world sequences demonstrate that minimization of a single cost functional generates a segmentation of space-time into multiple motion regions.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"10 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116932111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial expression decomposition
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238452
Hongcheng Wang, N. Ahuja
In this paper, we propose a novel approach to facial expression decomposition based on higher-order singular value decomposition (HOSVD), a natural generalization of matrix SVD. We learn the expression subspace and person subspace from a corpus of images showing seven basic facial expressions, rather than resorting to expert-coded facial expression parameters. We propose a simultaneous face and facial expression recognition algorithm, which classifies a given image into one of the seven basic facial expression categories; other facial expressions of the new person can then be synthesized using the learned expression subspace model. The contributions of this work are twofold. First, we propose a new HOSVD-based approach to model the mapping between persons and expressions, used for facial expression synthesis for a new person. Second, we realize simultaneous face and facial expression recognition as a byproduct of facial expression decomposition. Experimental results illustrate the capability of the person subspace and expression subspace in both synthesis and recognition tasks. As a quantitative measure of synthesis quality, we propose the gradient minimum square error (GMSE), which measures the gradient difference between the original and synthesized images.
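A generic HOSVD over a people x expressions x pixels data tensor, sketched below in NumPy, shows where the person and expression subspaces come from; this is the textbook decomposition, not the authors' exact pipeline.

```python
# Minimal HOSVD sketch: unfold the tensor along each mode, take the left
# singular vectors as that mode's subspace, and project to get the core.
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: matricize T along `mode`."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T):
    """Return mode matrices U and core S with T = S x1 U[0] x2 U[1] ..."""
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0]
         for m in range(T.ndim)]
    S = T
    for m, Um in enumerate(U):                # mode-m product with Um.T
        S = np.moveaxis(np.tensordot(Um.T, np.moveaxis(S, m, 0), axes=1),
                        0, m)
    return U, S

# E.g. for a 10 people x 7 expressions x 1024 pixels tensor, U[0] spans
# the person subspace and U[1] the expression subspace; fixing a person
# coefficient and varying the expression coefficient synthesizes the
# remaining expressions of that person.
```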
{"title":"Facial expression decomposition","authors":"Hongcheng Wang, N. Ahuja","doi":"10.1109/ICCV.2003.1238452","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238452","url":null,"abstract":"In this paper, we propose a novel approach for facial expression decomposition - higher-order singular value decomposition (HOSVD), a natural generalization of matrix SVD. We learn the expression subspace and person subspace from a corpus of images showing seven basic facial expressions, rather than resort to expert-coded facial expression parameters. We propose a simultaneous face and facial expression recognition algorithm, which can classify the given image into one of the seven basic facial expression categories, and then other facial expressions of the new person can be synthesized using the learned expression subspace model. The contributions of this work lie mainly in two aspects. First, we propose a new HOSVD based approach to model the mapping between persons and expressions, used for facial expression synthesis for a new person. Second, we realize simultaneous face and facial expression recognition as a result of facial expression decomposition. Experimental results are presented that illustrate the capability of the person subspace and expression subspace in both synthesis and recognition tasks. As a quantitative measure of the quality of synthesis, we propose using gradient minimum square error (GMSE) which measures the gradient difference between the original and synthesized images.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116153691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast pose estimation with parameter-sensitive hashing
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238424
Gregory Shakhnarovich, Paul A. Viola, Trevor Darrell
Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends locality-sensitive hashing, a recently developed method to find approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call parameter-sensitive hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.
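A minimal sketch of the parameter-sensitive idea: each hash bit is a thresholded image feature, kept only if it agrees more often on example pairs that are close in parameter (pose) space than on random pairs. The bit-selection test below is a simplified stand-in for the paper's training procedure, with all names chosen for illustration.

```python
# Sketch of parameter-sensitive bit selection and hash-table indexing.
import numpy as np
from collections import defaultdict

def select_bits(feats, poses, n_bits, pose_eps, n_pairs=5000, seed=0):
    """Score candidate bits (feature dim, median threshold) by how much
    more often they agree on pose-similar pairs than on random pairs."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(feats), n_pairs)
    j = rng.integers(0, len(feats), n_pairs)
    close = np.linalg.norm(poses[i] - poses[j], axis=1) < pose_eps
    scores = []
    for d in range(feats.shape[1]):
        t = np.median(feats[:, d])
        bit = feats[:, d] > t
        agree = bit[i] == bit[j]
        a_close = agree[close].mean() if close.any() else 0.0
        a_far = agree[~close].mean() if (~close).any() else 0.0
        scores.append((a_close - a_far, d, t))   # pose sensitivity of bit
    return sorted(scores, reverse=True)[:n_bits]

def build_table(feats, bits):
    """Hash every example by its selected bits for sublinear lookup."""
    table = defaultdict(list)
    for idx, f in enumerate(feats):
        table[tuple(f[d] > t for _, d, t in bits)].append(idx)
    return table
```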
{"title":"Fast pose estimation with parameter-sensitive hashing","authors":"Gregory Shakhnarovich, Paul A. Viola, Trevor Darrell","doi":"10.1109/ICCV.2003.1238424","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238424","url":null,"abstract":"Example-based methods are effective for parameter estimation problems when the underlying system is simple or the dimensionality of the input is low. For complex and high-dimensional problems such as pose estimation, the number of required examples and the computational complexity rapidly become prohibitively high. We introduce a new algorithm that learns a set of hashing functions that efficiently index examples relevant to a particular estimation task. Our algorithm extends locality-sensitive hashing, a recently developed method to find approximate neighbors in time sublinear in the number of examples. This method depends critically on the choice of hash functions that are optimally relevant to a particular estimation problem. Experiments demonstrate that the resulting algorithm, which we call parameter-sensitive hashing, can rapidly and accurately estimate the articulated pose of human figures from a large database of example images.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116280881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrated edge and junction detection with the boundary tensor
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238377
U. Kothe
The boundaries of image regions necessarily consist of edges (in particular, step and roof edges), corners, and junctions. Currently, different algorithms are used to detect each boundary type separately, and integrating the results into a single boundary representation is difficult. Therefore, a method for the simultaneous detection of all boundary types is needed. We propose to combine the responses of suitable polar separable filters into what we call the boundary tensor. The trace of this tensor is a measure of boundary strength, while the small eigenvalue and its difference from the large one represent corner/junction and edge strengths, respectively. We prove that the edge strength measure behaves like a rotationally invariant quadrature filter. A number of examples demonstrate the properties of the new method and illustrate its application to image segmentation.
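Given the boundary tensor field, the decomposition into the three strength measures stated in the abstract is direct; the sketch below shows only that step, since constructing the tensor from polar separable quadrature filter responses is the substance of the paper and is omitted here.

```python
# Sketch: split the 2x2 boundary tensor at each pixel into boundary,
# edge, and corner/junction strength maps via its eigenvalues.
import numpy as np

def boundary_measures(B):
    """B: (..., 2, 2) symmetric tensor field (boundary tensor per pixel).
    Returns (boundary, edge, junction) strength maps."""
    eigvals = np.linalg.eigvalsh(B)           # ascending: small, large
    small, large = eigvals[..., 0], eigvals[..., 1]
    boundary = small + large                  # trace: total boundary strength
    junction = small                          # isotropic part: corner/junction
    edge = large - small                      # anisotropic part: edge
    return boundary, edge, junction
```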
{"title":"Integrated edge and junction detection with the boundary tensor","authors":"U. Kothe","doi":"10.1109/ICCV.2003.1238377","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238377","url":null,"abstract":"The boundaries of image regions necessarily consist of edges (in particular, step and roof edges), corners, and junctions. Currently, different algorithms are used to detect each boundary type separately, but the integration of the results into a single boundary representation is difficult. Therefore, a method for the simultaneous detection of all boundary types is needed. We propose to combine responses of suitable polar separable filters into what we will call the boundary tensor. The trace of this tensor is a measure of boundary strength, while the small eigenvalue and its difference to the large one represent corner/junction and edge strengths respectively. We prove that the edge strength measure behaves like a rotationally invariant quadrature filter. A number of examples demonstrate the properties of the new method and illustrate its application to image segmentation.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114450000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segmenting foreground objects from a dynamic textured background via a robust Kalman filter
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238312
Jing Zhong, S. Sclaroff
The algorithm presented here aims to segment foreground objects in video (e.g., people) from time-varying, textured backgrounds. Examples of time-varying backgrounds include waves on water, moving clouds, trees waving in the wind, automobile traffic, moving crowds, escalators, etc. We have developed a novel foreground-background segmentation algorithm that explicitly accounts for the nonstationary nature and clutter-like appearance of many dynamic textures. The dynamic texture is modeled by an autoregressive moving average (ARMA) model. A robust Kalman filter algorithm iteratively estimates the intrinsic appearance of the dynamic texture, as well as the regions of the foreground objects. Preliminary experiments with this method have demonstrated promising results.
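The robust update can be sketched as a standard Kalman predict/update on the ARMA state in which pixels with large prediction residuals are flagged as foreground and excluded from the correction. The model matrices are assumed learned beforehand, and the dense matrix algebra below is purely illustrative, not the authors' implementation.

```python
# Sketch of one robust Kalman step over a dynamic-texture ARMA model.
import numpy as np

def robust_kalman_step(x, P, frame, A, C, Q, R_var, k=3.0):
    """x: ARMA state, P: covariance, frame: flattened image,
    A: state transition, C: appearance matrix, Q: process noise,
    R_var: pixel noise variance. Returns (x_new, P_new, fg_mask)."""
    x_pred = A @ x                            # predict texture state
    P_pred = A @ P @ A.T + Q
    resid = frame - C @ x_pred                # per-pixel prediction residual
    sigma = np.sqrt(np.median(resid**2)) + 1e-9
    fg = np.abs(resid) > k * sigma            # large residual -> foreground
    Cb, zb = C[~fg], frame[~fg]               # update from background only
    S = Cb @ P_pred @ Cb.T + R_var * np.eye(len(zb))
    K = P_pred @ Cb.T @ np.linalg.inv(S)      # dense inverse: illustrative
    x_new = x_pred + K @ (zb - Cb @ x_pred)
    P_new = (np.eye(len(x)) - K @ Cb) @ P_pred
    return x_new, P_new, fg
```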
{"title":"Segmenting foreground objects from a dynamic textured background via a robust Kalman filter","authors":"Jing Zhong, S. Sclaroff","doi":"10.1109/ICCV.2003.1238312","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238312","url":null,"abstract":"The algorithm presented aims to segment the foreground objects in video (e.g., people) given time-varying, textured backgrounds. Examples of time-varying backgrounds include waves on water, clouds moving, trees waving in the wind, automobile traffic, moving crowds, escalators, etc. We have developed a novel foreground-background segmentation algorithm that explicitly accounts for the nonstationary nature and clutter-like appearance of many dynamic textures. The dynamic texture is modeled by an autoregressive moving average model (ARMA). A robust Kalman filter algorithm iteratively estimates the intrinsic appearance of the dynamic texture, as well as the regions of the foreground objects. Preliminary experiments with this method have demonstrated promising results.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114865942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to deal with point correspondences and tangential velocities in the level set framework
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238443
Jean-Philippe Pons, G. Hermosillo, R. Keriven, O. Faugeras
In this paper, we overcome a major drawback of the level set framework: the lack of point correspondences. We maintain explicit backward correspondences from the evolving interface to the initial one by advecting the initial point coordinates with the same speed as the level set function. Our method leads to a system of coupled Eulerian partial differential equations. We show in a variety of numerical experiments that it can handle both normal and tangential velocities, large deformations, shocks, rarefactions, and topological changes. Applications are many in computer vision and elsewhere, since our method can upgrade virtually any level set evolution. We complement our work with the design of nonzero tangential velocities that preserve the relative area of interface patches; this feature may be crucial in applications such as computational geometry, grid generation, or the unfolding of organ surfaces (e.g., the brain) in medical imaging.
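The coupling itself is simple to sketch: two extra fields carrying the initial point coordinates are advected with the same velocity as the level set function, so they maintain a backward map to the initial interface. The first-order upwind scheme below (unit grid spacing, periodic boundaries via np.roll) is a minimal illustration under those assumptions.

```python
# Sketch: advect the correspondence fields psi_x, psi_y (initial point
# coordinates) with the same velocity field as the level set phi, i.e.
# solve f_t + v . grad(f) = 0 for each field.
import numpy as np

def advect(f, vx, vy, dt):
    """One first-order upwind step of f_t + v . grad(f) = 0."""
    fx_m = f - np.roll(f, 1, axis=1)          # backward x-difference
    fx_p = np.roll(f, -1, axis=1) - f         # forward x-difference
    fy_m = f - np.roll(f, 1, axis=0)
    fy_p = np.roll(f, -1, axis=0) - f
    fx = np.where(vx > 0, fx_m, fx_p)         # upwind selection
    fy = np.where(vy > 0, fy_m, fy_p)
    return f - dt * (vx * fx + vy * fy)

def evolve(phi, psi_x, psi_y, vx, vy, dt, steps):
    """Evolve level set and correspondence fields with the same speed,
    so (psi_x, psi_y) at a point gives its initial-interface coordinates."""
    for _ in range(steps):
        phi = advect(phi, vx, vy, dt)
        psi_x = advect(psi_x, vx, vy, dt)
        psi_y = advect(psi_y, vx, vy, dt)
    return phi, psi_x, psi_y
```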
{"title":"How to deal with point correspondences and tangential velocities in the level set framework","authors":"Jean-Philippe Pons, G. Hermosillo, R. Keriven, O. Faugeras","doi":"10.1109/ICCV.2003.1238443","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238443","url":null,"abstract":"In this paper, we overcome a major drawback of the level set framework: the lack of point correspondences. We maintain explicit backward correspondences from the evolving interface to the initial one by advecting the initial point coordinates with the same speed as the level set function. Our method leads to a system of coupled Eulerian partial differential equations. We show in a variety of numerical experiments that it can handle both normal and tangential velocities, large deformations, shocks, rarefactions and topological changes. Applications are many in computer vision and elsewhere since our method can upgrade virtually any level set evolution. We complement our work with the design of non zero tangential velocities that preserve the relative area of interface patches; this feature may be crucial in such applications as computational geometry, grid generation or unfolding of the organs' surfaces, e.g. brain, in medical imaging.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125420769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}