Maintaining multiple motion model hypotheses over many views to recover matching and structure
P. Torr, A. Fitzgibbon, Andrew Zisserman
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710762
In order to recover structure from images it is desirable to use many views to obtain the best possible estimates. However, whilst recovering projective structure and motion from such extended sequences, problems arise that are not apparent in a general viewpoint/structure approach. Foremost amongst these are (a) maintaining image correspondences consistently through many images, and (b) identifying images within the sequence for which structure cannot be reliably recovered. In this paper the use of multiple motion model hypotheses is explored as an aid to solving both of these problems.
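The paper's own model set and selection criterion are not reproduced here, but the following hypothetical sketch illustrates the underlying idea: fit both a homography and a fundamental matrix to the matches of an image pair, and flag pairs where the homography does nearly as well, since projective structure cannot be reliably recovered from them. The 0.9 inlier ratio and the OpenCV-based fitting are assumptions for illustration.

```python
# Hypothetical sketch, not the authors' implementation: keep two motion-model
# hypotheses (homography H vs. fundamental matrix F) for one image pair and
# flag degenerate pairs (pure rotation or planar scene).
import cv2
import numpy as np

def score_motion_models(pts1, pts2, thresh=3.0):
    """pts1, pts2: (N, 2) float32 arrays of matched image points."""
    H, h_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, thresh)
    F, f_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, thresh, 0.99)
    h_inliers = int(h_mask.sum()) if h_mask is not None else 0
    f_inliers = int(f_mask.sum()) if f_mask is not None else 0
    # If H explains almost as many matches as F, epipolar geometry -- and
    # hence projective structure -- is unreliable for this pair, so both
    # hypotheses should be kept alive.  The 0.9 ratio is an assumed threshold.
    degenerate = h_inliers >= 0.9 * f_inliers
    return {"H": H, "F": F, "h_inliers": h_inliers,
            "f_inliers": f_inliers, "degenerate": degenerate}
```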
{"title":"Maintaining multiple motion model hypotheses over many views to recover matching and structure","authors":"P. Torr, A. Fitzgibbon, Andrew Zisserman","doi":"10.1109/ICCV.1998.710762","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710762","url":null,"abstract":"In order to recover structure from images it is desirable to use many views to obtain the best possible estimates. However, whilst recovering projective structure and motion from such extended sequences problems arise that are not apparent from a general view-point/structure approach. Foremost amongst these are (a) maintaining image correspondences consistently through many images, and (b) identifying images, within the sequence, for which structure cannot be reliably recovery. Within this paper the use of multiple motion model hypotheses is explored as an aid to solve both of these problems.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134330211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Retrieving images by appearance
S. Ravela, R. Manmatha
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710780
A system to retrieve images using a description of the image intensity surface is presented. Gaussian derivative filters at several scales are applied to the image and low-order 2D differential invariants are computed. The resulting multi-scale representation is indexed for rapid retrieval. Queries are designed by the user from an example image by selecting appropriate regions. The invariant vectors corresponding to these regions are matched with their database counterparts both in feature and coordinate space. This yields a match score per image. Images are sorted by the match score and displayed. Experiments conducted with over 1500 images of objects embedded in arbitrary backgrounds are described. It is observed that images that are similar in appearance, and whose viewpoint is within a small variation of the query's, can be retrieved with an average precision of 50%.
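As a rough illustration of the representation (the scales and the exact invariant set are assumptions; the paper's own choices may differ), the sketch below computes a per-pixel vector of low-order rotation invariants from Gaussian derivatives at several scales; matching would then compare such vectors between query regions and database images.

```python
# Minimal sketch of a multi-scale differential-invariant vector, assuming
# three scales and three low-order rotation invariants per scale.
import numpy as np
from scipy.ndimage import gaussian_filter

def invariant_vector(image, scales=(1.0, 2.0, 4.0)):
    """image: (H, W) grayscale; returns (H, W, 3*len(scales)) invariants."""
    image = image.astype(np.float64)
    feats = []
    for s in scales:
        L   = gaussian_filter(image, s)                 # smoothed intensity
        Lx  = gaussian_filter(image, s, order=(0, 1))   # d/dx (axis 1)
        Ly  = gaussian_filter(image, s, order=(1, 0))   # d/dy (axis 0)
        Lxx = gaussian_filter(image, s, order=(0, 2))
        Lyy = gaussian_filter(image, s, order=(2, 0))
        feats += [L,
                  np.sqrt(Lx**2 + Ly**2),  # gradient magnitude (rotation invariant)
                  Lxx + Lyy]               # Laplacian (rotation invariant)
    return np.stack(feats, axis=-1)
```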
{"title":"Retrieving images by appearance","authors":"S. Ravela, R. Manmatha","doi":"10.1109/ICCV.1998.710780","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710780","url":null,"abstract":"A system to retrieve images using a description of the image intensity surface is presented. Gaussian derivative filters at several scales are applied to the image and low order 2D differential invariants are computed. The resulting multi-scale representation is indexed for rapid retrieval. Queries are designed by the users from an example image by selecting appropriate regions. The invariant vectors corresponding to these regions are matched with the database counterparts both in feature and coordinate space. This yields a match score per image. Images are sorted by the match score and displayed. Experiments conducted with over 1500 images of objects embedded in arbitrary backgrounds are described. It is observed that images similar in appearance and whose viewpoint is within small view variations of the query can be retrieved with an average precision of 50%.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132589516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recognition and interpretation of parametric gesture
Andrew D. Wilson, A. Bobick
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710739
A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a meaningful variation; one example is a point gesture where the important parameter is the 2-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the states of the HMM. Using a linear model to derive the theory, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, the parametric HMM simultaneously recognizes the gesture and estimates the quantifying parameters. Using visually derived and directly measured 3-dimensional hand position measurements as input, we present results on two different movements, a size gesture and a point gesture, and show robustness with respect to noise in the input features.
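The core of the linear model is that each state's output mean depends linearly on the global parameter theta. As a sketch of the test-time estimation step (the notation and shapes are my assumptions, derived from the description above), theta then has a closed-form weighted-least-squares solution given the state posteriors from the forward-backward pass:

```python
# Simplified sketch of the parametric-HMM output model: state j emits
# x_t ~ N(W_j @ theta + mu_j, Sigma_j), where theta is the global gesture
# parameter (e.g. pointing direction).  Shapes and names are assumptions.
import numpy as np

def estimate_theta(x, gamma, W, mu, sigma_inv):
    """x: (T, d) observations; gamma: (T, J) state posteriors from
    forward-backward; W: (J, d, p) per-state linear maps; mu: (J, d) offsets;
    sigma_inv: (J, d, d) inverse covariances.  Returns theta: (p,)."""
    p = W.shape[2]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for t in range(x.shape[0]):
        for j in range(gamma.shape[1]):
            g = gamma[t, j]
            WtS = W[j].T @ sigma_inv[j]          # (p, d)
            A += g * (WtS @ W[j])                # accumulate normal equations
            b += g * (WtS @ (x[t] - mu[j]))
    return np.linalg.solve(A, b)                 # weighted least squares
```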
{"title":"Recognition and interpretation of parametric gesture","authors":"Andrew D. Wilson, A. Bobick","doi":"10.1109/ICCV.1998.710739","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710739","url":null,"abstract":"A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture. We mean gestures that exhibit a meaningful variation; one example is a point gesture where the important parameter is the 2-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the states of the HMM. Using a linear model to derive the theory, we formulated an expectation-maximization (EM) method for training the parametric HMM. During testing, the parametric HMM simultaneously recognizes the gesture and estimates the quantifying parameters. Using visually derived and directly measured 3-dimensional hand position measurements as input, we present results on two. Different movements-a size gesture and a point gesture-and show robustness with respect to noise in the input features.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116659289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Signfinder: using color to detect, localize and identify informational signs
A. Yuille, Daniel Snow, Mark Nitzberg
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710783
We describe an approach to detecting, locating and normalizing road signs. The approach applies provided that: (i) the signs have stereotypical boundary shapes (e.g. rectangular or hexagonal; of course, we allow for these shapes to be distorted by projection to an unknown viewpoint), and (ii) the writing on the sign has one uniform color and the rest of the sign has a second uniform color (we allow the color of the illuminant to be unknown). We show that the approach works even under significant changes in illuminant color and viewpoint direction, as well as shadowing and occlusion. This work is part of a project intended to help people who are blind or whose sight is impaired.
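Condition (ii) suggests a simple computational test. The sketch below is an illustration under assumed thresholds, not the paper's algorithm: it checks whether a candidate region's pixels form two compact color modes.

```python
# Illustrative sketch: test the "two uniform colors" condition on a candidate
# sign region by clustering its pixels into two color modes and checking
# their compactness.  The threshold value is an assumption.
import numpy as np
from sklearn.cluster import KMeans

def is_two_color_region(region_pixels, max_spread=20.0):
    """region_pixels: (N, 3) RGB values inside a candidate boundary."""
    km = KMeans(n_clusters=2, n_init=10).fit(region_pixels.astype(np.float64))
    spreads = []
    for k in range(2):
        pts = region_pixels[km.labels_ == k]
        # per-cluster RMS distance to the cluster center
        spreads.append(np.sqrt(((pts - km.cluster_centers_[k]) ** 2)
                               .sum(axis=1).mean()))
    return max(spreads) < max_spread
```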
{"title":"Signfinder: using color to detect, localize and identify informational signs","authors":"A. Yuille, Daniel Snow, Mark Nitzberg","doi":"10.1109/ICCV.1998.710783","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710783","url":null,"abstract":"We describe an approach to detecting, locating and normalizing road signs. The approach will apply provided: (i) the signs have stereotypical boundary shapes (i.e. rectangular, or hexagonal-of course, we allow for these shapes to be distorted by projection to unknown viewpoint), (ii) the writing on the sign has one uniform color and the rest of the sign has a second uniform color (we allow for the color of the illuminant to be unknown). We show that the approach works even under significant illuminant color changes, viewpoint direction, shadowing, and occlusion. This work is part of a project intended to help people who are blind, or whose sight is impaired.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122233547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Color- and texture-based image segmentation using EM and its application to content-based image retrieval
Serge J. Belongie, C. Carson, H. Greenspan, Jitendra Malik
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710790
Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called "blobworld" representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection. We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.
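A condensed sketch of the segmentation step follows: fit a Gaussian mixture with EM to per-pixel feature vectors and read the regions off the assignments. The fixed number of regions is an assumption here; the full system also selects the number of components and includes further cues beyond this bare clustering.

```python
# Minimal sketch of EM-based segmentation on per-pixel color+texture features,
# assuming the feature vectors have already been computed.
import numpy as np
from sklearn.mixture import GaussianMixture

def blobworld_like_segmentation(features, n_regions=4):
    """features: (H, W, d) per-pixel color+texture descriptors.
    Returns an (H, W) integer label image."""
    h, w, d = features.shape
    X = features.reshape(-1, d)
    gmm = GaussianMixture(n_components=n_regions,
                          covariance_type="full").fit(X)  # EM fitting
    labels = gmm.predict(X)        # hard assignment per pixel
    return labels.reshape(h, w)
```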
{"title":"Color- and texture-based image segmentation using EM and its application to content-based image retrieval","authors":"Serge J. Belongie, C. Carson, H. Greenspan, Jitendra Malik","doi":"10.1109/ICCV.1998.710790","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710790","url":null,"abstract":"Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called \"blobworld\" representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection. We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124653029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multidimensional morphable models
Michael J. Jones, T. Poggio
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710791
We describe a flexible model for representing images of objects of a certain class, known a priori, such as faces, and introduce a new algorithm for matching it to a novel image and thereby performing image analysis. We call this model a multidimensional morphable model, or simply a morphable model. The morphable model is learned from example images (called prototypes) of objects of a class. In this paper we introduce an effective stochastic gradient descent algorithm that automatically matches a model to a novel image by finding the parameters that minimize the error between the image generated by the model and the novel image. Two examples demonstrate the robustness and the broad range of applicability of the matching algorithm and the underlying morphable model. Our approach can provide novel solutions to several vision tasks, including the computation of image correspondence, object verification, image synthesis and image compression.
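As a heavily simplified illustration of the matching idea (texture-only, no correspondence or warp field, hypothetical parameter values), one can fit the linear coefficients of a prototype combination by stochastic gradient descent on random pixel subsets:

```python
# Simplified sketch, not the paper's full algorithm: the actual morphable
# model also optimizes shape (correspondence) fields; here we fit only the
# mixing coefficients of a linear texture combination.
import numpy as np

def match_model(prototypes, novel, lr=1e-3, iters=2000, batch=256, seed=0):
    """prototypes: (K, N) flattened example images; novel: (N,) image.
    Returns the (K,) mixing coefficients."""
    rng = np.random.default_rng(seed)
    K, N = prototypes.shape
    c = np.full(K, 1.0 / K)                  # initial mixing coefficients
    for _ in range(iters):
        idx = rng.integers(0, N, size=batch) # random pixel subset (stochastic)
        residual = prototypes[:, idx].T @ c - novel[idx]   # (batch,)
        grad = prototypes[:, idx] @ residual / batch       # (K,)
        c -= lr * grad                       # gradient step on squared error
    return c
```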
{"title":"Multidimensional morphable models","authors":"Michael J. Jones, T. Poggio","doi":"10.1109/ICCV.1998.710791","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710791","url":null,"abstract":"We describe a flexible model for representing images of objects of a certain class, known a priori, such as faces, and introduce a new algorithm for matching it to a novel image and thereby performing image analysis. We call this model a multidimensional morphable model or just a, morphable model. The morphable model is learned from example images (called prototypes) of objects of a class. In this paper we introduce an effective stochastic gradient descent algorithm that automaticaIly matches a model to a novel image by finding the parameters that minimize the error between the image generated by the model and the novel image. Two examples demonstrate the robustness and the broad range of applicability of the matching algorithm and the underlying morphable model. Our approach can provide novel solutions to several vision tasks, including the computation of image correspondence, object verification, image synthesis and image compression.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125147543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bilinear voting
G. Sapiro
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710716
A geometric-vision approach to solving bilinear problems in general, and the color constancy and illuminant estimation problem in particular, is presented in this paper. We present a general framework, based on ideas from the generalized (probabilistic) Hough transform, for estimating the unknown variables in the bilinear form. In the case of illuminant and reflectance estimation in natural images, each image pixel "votes" for possible illuminants (or reflectances), and the estimation is based on cumulative votes. In the general case, the voting is over the parameters of the bilinear model. The framework naturally accommodates physical constraints. For the case of illuminant estimation, we briefly show the relation of this work to previous algorithms for color constancy, and present examples.
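A generic sketch of the voting scheme is given below; the discretization of the parameter spaces and the residual tolerance are my assumptions, and the brute-force search over candidate pairs is for clarity only.

```python
# Generic sketch of Hough-style voting for a bilinear model y = B(a, b):
# each observation votes for every candidate 'a' that admits some 'b'
# explaining it within tolerance.  O(N * A * Bn); real use needs pruning.
import numpy as np

def bilinear_vote(observations, candidate_as, candidate_bs, B, tol=0.05):
    """observations: (N, d); candidate_as: (A, p); candidate_bs: (Bn, q);
    B: callable (a, b) -> (d,) model prediction.  Returns votes: (A,)."""
    votes = np.zeros(len(candidate_as))
    for y in observations:
        for i, a in enumerate(candidate_as):
            # does any b explain y under candidate parameter a?
            residuals = [np.linalg.norm(B(a, b) - y) for b in candidate_bs]
            if min(residuals) < tol:
                votes[i] += 1
    return votes  # argmax gives the cumulative-vote estimate of 'a'
```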
{"title":"Bilinear voting","authors":"G. Sapiro","doi":"10.1109/ICCV.1998.710716","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710716","url":null,"abstract":"A geometric-vision approach to solve bilinear problems in general, and the color constancy and illuminant estimation problem in particular, is presented in this paper. We show a general framework, based on ideas from the generalized (probabilistic) Hough transform, to estimate the unknown variables in the bilinear form. In the case of illuminant and reflectance estimation in natural images, each image pixel \"votes\" for possible illuminants (or reflectance), and the estimation is based on cumulative votes. In the general case, the voting is for the parameters of the bilinear model. The framework is natural for the introduction of physical constraints. For the case of illuminant estimation, we briefly show the relation of this work with previous algorithms for color constancy, and present examples.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128822729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A unified factorization algorithm for points, line segments and planes with uncertainty models
Daniel Morris, T. Kanade
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710793
In this paper we present a unified factorization algorithm for recovering structure and motion from image sequences using point features, line segments and planes. This new formulation is based on a directional uncertainty model for features. Points and line segments are both described by the same probabilistic models and so can be recovered in the same way. Prior information on the coplanarity of features is shown to fit naturally into the new factorization formulation and provides additional constraints for the shape recovery. This formulation leads to a weighted least squares motion and shape recovery problem which is solved by an efficient quasi-linear algorithm. The statistical uncertainty model also enables us to recover uncertainty estimates for the reconstructed three-dimensional feature locations.
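For orientation, here is the classic unweighted, point-only factorization core that the paper generalizes; the paper's formulation adds directional uncertainty weights and accommodates line segments and planes, which this sketch omits.

```python
# Sketch of the rank-theorem core of factorization (Tomasi-Kanade style),
# not the paper's weighted quasi-linear algorithm.
import numpy as np

def factorize_points(W):
    """W: (2F, P) measurement matrix of centered image coordinates
    (x rows then y rows per frame, mean-subtracted per row)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # under affine projection the centered measurement matrix has rank 3
    M = U[:, :3] * np.sqrt(s[:3])            # (2F, 3) camera motion
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # (3, P) scene shape
    return M, S
```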
{"title":"A unified factorization algorithm for points, line segments and planes with uncertainty models","authors":"Daniel Morris, T. Kanade","doi":"10.1109/ICCV.1998.710793","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710793","url":null,"abstract":"In this paper we present a unified factorization algorithm for recovering structure and motion from image sequences by using point features, line segments and planes. This new formulation is based on directional uncertainty model for features. Points and line segments are both described by the same probabilistic models and so can be recovered in the same way. Prior information on the coplanarity of features is shown to fit naturally into the new factorization formulation and provides additional constraints for the shape recovery. This formulation leads to a weighted least squares motion and shape recovery problem which is solved by an efficient quasi-linear algorithm. The statistical uncertainty model also enables us to recover uncertainty estimates for the reconstructed three dimensional feature locations.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122567841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion segmentation and tracking using normalized cuts
Jianbo Shi, Jitendra Malik
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710861
We propose a motion segmentation algorithm that aims to break a scene into its most prominent moving groups. A weighted graph is constructed on the image sequence by connecting pixels that are in the spatiotemporal neighborhood of each other. At each pixel, we define motion profile vectors which capture the probability distribution of the image velocity. The distance between motion profiles is used to assign a weight to the graph edges. Using normalized cuts we find the most salient partitions of the spatiotemporal graph formed by the image sequence. For segmenting long image sequences, we have developed a recursive update procedure that incorporates knowledge of the segmentation in previous frames for efficiently finding the group correspondence in the new frame.
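The partitioning step reduces to a generalized eigenproblem. Below is a minimal dense-solver sketch (real spatiotemporal graphs need sparse eigensolvers, and the median split is just one common discretization choice):

```python
# Minimal sketch of the normalized-cut bipartition: solve the generalized
# eigenproblem (D - W) y = lambda * D y and threshold the second-smallest
# eigenvector.  W is the affinity matrix built from motion-profile distances.
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """W: (n, n) symmetric nonnegative affinity matrix.
    Returns a boolean group label per node."""
    d = W.sum(axis=1)
    D = np.diag(d)
    vals, vecs = eigh(D - W, D)    # generalized symmetric eigenproblem
    y = vecs[:, 1]                 # second-smallest eigenvector
    return y > np.median(y)        # split at the median
```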
{"title":"Motion segmentation and tracking using normalized cuts","authors":"Jianbo Shi, Jitendra Malik","doi":"10.1109/ICCV.1998.710861","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710861","url":null,"abstract":"We propose a motion segmentation algorithm that aims to break a scene into its most prominent moving groups. A weighted graph is constructed on the image sequence by connecting pixels that are in the spatiotemporal neighborhood of each other. At each pixel, we define motion profile vectors which capture the probability distribution of the image velocity. The distance between motion profiles is used to assign a weight on the graph edges. Using normalised cuts we find the most salient partitions of the spatiotemporal graph formed by the image sequence. For segmenting long image sequences, we have developed a recursive update procedure that incorporates knowledge of segmentation in previous frames for efficiently finding the group correspondence in the new frame.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"15 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120906188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting changes in aerial views of man-made structures
A. Huertas, R. Nevatia
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710703
Many applications require detecting structural changes in a scene over a period of time. Comparing intensity values of successive images is not effective, as such changes do not necessarily reflect actual changes at a site but may be caused by changes in viewpoint, illumination and season. We take the approach of comparing a 3-D model of the site, prepared from previous images, with new images to infer significant changes. This task is difficult because the images and the models have very different levels of abstraction. Our approach consists of several steps: registering a site model to a new image; validating the model to confirm the presence of model objects in the image; detecting structural changes to resolve matching problems and indicate possibly changed structures; and finally updating the models to reflect the changes. Our system is able to detect missing (or mis-modeled) buildings, changes in model dimensions, and new buildings under some conditions.
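As an illustration of the model-validation step (a simplification of mine, not the system's actual scoring), one can project the site model's edges into the new image and score each object by the fraction of its projected boundary that finds nearby image-edge support; a low score flags a possible change.

```python
# Illustrative sketch of edge-support scoring for model validation.  The
# search radius and the scoring rule are assumptions.
import numpy as np

def validation_score(projected_edge_pixels, edge_map, radius=2):
    """projected_edge_pixels: (N, 2) integer (row, col) coords of the model's
    projected wireframe; edge_map: (H, W) boolean image edge map."""
    h, w = edge_map.shape
    supported = 0
    for r, c in projected_edge_pixels:
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
        if edge_map[r0:r1, c0:c1].any():   # any edge evidence nearby?
            supported += 1
    return supported / len(projected_edge_pixels)  # low => possible change
```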
{"title":"Detecting changes in aerial views of man-made structures","authors":"A. Huertas, R. Nevatia","doi":"10.1109/ICCV.1998.710703","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710703","url":null,"abstract":"Many applications require detecting structural changes in a scene over a period of time. Comparing intensity values of successive images is not effective as such changes don't necessarily reflect actual changes at a site but might be caused by changes in the view point, illumination and seasons. We take the approach of comparing a 3-D model of the site, prepared from previous images, with new images to infer significant changes. This task is difficult as the images and the models have very different levels of abstract representations. Our approach consists of several steps: registering a site model to a new image, model validation to confirm the presence of model objects in the image; structural change detection seeks to resolve matching problems and indicate possibly changed structures; and finally updating models to reflect the changes. Our system is able to detect missing (or mis-modeled) buildings, changes in model dimensions, and new buildings under some conditions.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"25 1-2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120930000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}