Condensing image databases when retrieval is based on non-metric distances
D. Jacobs, D. Weinshall, Yoram Gdalyahu
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710778
One of the key problems in appearance-based vision is understanding how to use a set of labeled images to classify new images. Classification systems that can model human performance, or that use robust image matching methods, often make use of similarity judgments that are non-metric; but when the triangle inequality is not obeyed, most existing pattern recognition techniques are not applicable. We note that exemplar-based (or nearest-neighbor) methods can be applied naturally when using a wide class of non-metric similarity functions. The key issue, however, is to find methods for choosing good representatives of a class that accurately characterize it. We note that existing condensing techniques for finding class representatives are ill-suited to deal with non-metric data spaces. We then focus on developing techniques for solving this problem, emphasizing two points: First, we show that the distance between two images is not a good measure of how well one image can represent another in non-metric spaces. Instead, we use the vector correlation between the distances from each image to other previously seen images. Second, we show that in non-metric spaces, boundary points are less significant for capturing the structure of a class than they are in Euclidean spaces. We suggest that atypical points may be more important in describing classes. We demonstrate the importance of these ideas to learning that generalizes from experience by improving performance using both synthetic and real images.
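To make the distance-profile idea concrete, here is a minimal sketch (hypothetical helper names; assumes a precomputed pairwise matrix D of non-metric distances among previously seen images): an image's fitness to represent another is scored by correlating their rows of D, not by reading off D[i, j]; the greedy condensing loop is an illustration, not the paper's exact procedure.

```python
import numpy as np

def representativeness(D, i, j):
    """Score how well image i represents image j: the correlation between
    their distance profiles to all *other* previously seen images (rows of
    the pairwise distance matrix D), rather than the raw distance D[i, j]."""
    mask = np.ones(len(D), dtype=bool)
    mask[[i, j]] = False
    return np.corrcoef(D[i, mask], D[j, mask])[0, 1]

def condense(D, labels, per_class=5):
    """Greedy condensing sketch: for each class, keep the examples whose
    distance profiles correlate best, on average, with the rest of the class."""
    reps = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        scores = [np.mean([representativeness(D, i, j) for j in idx if j != i])
                  for i in idx]
        reps[c] = idx[np.argsort(scores)[::-1][:per_class]]
    return reps
```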
{"title":"Condensing image databases when retrieval is based on non-metric distances","authors":"D. Jacobs, D. Weinshall, Yoram Gdalyahu","doi":"10.1109/ICCV.1998.710778","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710778","url":null,"abstract":"One of the key problems in appearance-based vision is understanding how to use a set of labeled images to classify new images. Classification systems that can model human performance, or that use robust image matching methods, often make use of similarity judgments that are non-metric but when the triangle inequality is not obeyed, most existing pattern recognition techniques are not applicable. We note that exemplar-based (or nearest-neighbor) methods can be applied naturally when using a wide class of non-metric similarity functions. The key issue, however, is to find methods for choosing good representatives of a class that accurately characterize it. We note that existing condensing techniques for finding class representatives are ill-suited to deal with non-metric dataspaces. We then focus on developing techniques for solving this problem, emphasizing two points: First, we show that the distance between two images is not a good measure of how well one image can represent another in non-metric spaces. Instead, we use the vector correlation between the distances from each image to other previously seen images. Second, we show that in non-metric spaces, boundary points are less significant for capturing the structure of a class than they are in Euclidean spaces. We suggest that atypical points may be more important in describing classes. We demonstrate the importance of these ideas to learning that generalizes from experience by improving performance using both synthetic and real images.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122352821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Learning dynamical models using expectation-maximisation
B. North, A. Blake
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710747
Tracking with deformable contours in a filtering framework requires a dynamical model for prediction. For any given application, tracking is improved by having an accurate model, learned from training data. We develop a method for learning dynamical models from training sequences, explicitly taking account of the fact that training data are noisy measurements and not true states. By introducing an 'augmented-state smoothing filter' we show how the technique of Expectation-Maximisation can be applied to this problem, and show that the resulting algorithm produces more robust and accurate tracking.
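The paper's augmented-state smoothing filter is not reproduced here, but the following sketch shows the standard EM recipe for a linear-Gaussian state-space model (Shumway-Stoffer style) that it builds on: an E-step that smooths the noisy measurements, and an M-step that refits the dynamics from smoothed sufficient statistics. The identity observation matrix and fixed initial prior are simplifying assumptions.

```python
import numpy as np

def em_lds(Y, A, Q, R, n_iter=20):
    """EM for the linear-Gaussian model  x_t = A x_{t-1} + w,  y_t = x_t + v.
    Y: (T, d) noisy measurements.  The dynamics (A, Q) are learned from the
    smoothed state estimates rather than from the raw measurements."""
    T, d = Y.shape
    I = np.eye(d)
    for _ in range(n_iter):
        # --- E-step: forward Kalman filter ---
        mu_p = np.zeros((T, d)); V_p = np.zeros((T, d, d))   # predicted
        mu_f = np.zeros((T, d)); V_f = np.zeros((T, d, d))   # filtered
        mu, V = np.zeros(d), I                               # fixed initial prior
        for t in range(T):
            mu_p[t] = A @ mu if t > 0 else mu
            V_p[t] = A @ V @ A.T + Q if t > 0 else V
            K = V_p[t] @ np.linalg.inv(V_p[t] + R)           # gain (C = I)
            mu_f[t] = mu_p[t] + K @ (Y[t] - mu_p[t])
            V_f[t] = (I - K) @ V_p[t]
            mu, V = mu_f[t], V_f[t]
        # --- E-step: backward RTS smoother ---
        mu_s = mu_f.copy(); V_s = V_f.copy()
        E_cross = np.zeros((T - 1, d, d))                    # E[x_{t+1} x_t^T]
        for t in range(T - 2, -1, -1):
            J = V_f[t] @ A.T @ np.linalg.inv(V_p[t + 1])
            mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - mu_p[t + 1])
            V_s[t] = V_f[t] + J @ (V_s[t + 1] - V_p[t + 1]) @ J.T
            E_cross[t] = V_s[t + 1] @ J.T + np.outer(mu_s[t + 1], mu_s[t])
        # --- M-step: closed-form updates from smoothed statistics ---
        Exx = V_s + np.einsum('ti,tj->tij', mu_s, mu_s)      # E[x_t x_t^T]
        S10, S00, S11 = E_cross.sum(0), Exx[:-1].sum(0), Exx[1:].sum(0)
        A = S10 @ np.linalg.inv(S00)
        Q = (S11 - A @ S10.T - S10 @ A.T + A @ S00 @ A.T) / (T - 1)
        resid = Y - mu_s
        R = (resid.T @ resid + V_s.sum(0)) / T
    return A, Q, R
```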
{"title":"Learning dynamical models using expectation-maximisation","authors":"B. North, A. Blake","doi":"10.1109/ICCV.1998.710747","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710747","url":null,"abstract":"Tracking with deformable contours in a filtering framework requires a dynamical model for prediction. For any given application, tracking is improved by having an accurate model, learned from training data. We develop a method for learning dynamical models from training sequences, explicitly taking account of the fact that training data are noisy measurements and not true states. By introducing an 'augmented-state smoothing filter' we show how the technique of Expectation-Maximisation can be applied to this problem, and show that the resulting algorithm produces more robust and accurate tracking.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130527076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A cubist approach to object recognition
R. Nelson, A. Salgian
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710781
We describe an appearance-based object recognition system using a keyed, multi-level context representation reminiscent of certain aspects of cubist art. Specifically, we utilize distinctive intermediate-level features, in this case automatically extracted 2-D boundary fragments, as keys, which are then verified within a local context, and assembled within a loose global context to evoke an overall percept. This system demonstrates good recognition of a variety of 3-D shapes, ranging from sports cars and fighter planes to snakes and lizards, with full orthographic invariance. We report the results of large-scale tests, involving over 2000 separate test images, that evaluate performance with increasing number of items in the database, in the presence of clutter, background change, and occlusion, and also the results of some generic classification experiments where the system is tested on objects never previously seen or modeled. To our knowledge, the results we report are the best in the literature for full-sphere tests of general shapes with occlusion and clutter resistance.
{"title":"A cubist approach to object recognition","authors":"R. Nelson, A. Salgian","doi":"10.1109/ICCV.1998.710781","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710781","url":null,"abstract":"We describe an appearance-based object recognition system using a keyed, multi-level contest representation reminiscent of certain aspects of cubist art. Specifically, we utilize distinctive intermediate-level features in this case automatically extracted 2-D boundary fragments, as keys, which are then verified within a local contest, and assembled within a loose global contest to evoke an overall percept. This system demonstrates good recognition of a variety of 3-D shapes, ranging from sports cars and fighter planes to snakes and lizards with full orthographic invariance. We report the results of large-scale tests, involving over 2000 separate test images, that evaluate performance with increasing number of items in the database, in the presence of clutter, background change, and occlusion, and also the results of some generic classification experiments where the system is tested on objects never previously seen or modeled. To our knowledge, the results we report are the best in the literature for full-sphere tests of general shapes with occlusion and clutter resistance.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123325833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Building qualitative event models automatically from visual input
Jonathan H. Fernyhough, A. Cohn, David C. Hogg
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710742
We describe an implemented technique for generating event models automatically based on qualitative reasoning and a statistical analysis of video input. Using an existing tracking program which generates labelled contours for objects in every frame, the view from a fixed camera is partitioned into semantically relevant regions based on the paths followed by moving objects. The paths are indexed with temporal information so objects moving along the same path at different speeds can be distinguished. Using a notion of proximity based on the speed of the moving objects and qualitative spatial reasoning techniques, event models describing the behaviour of pairs of objects can be built, again using statistical methods. The system has been tested on a traffic domain and learns various event models expressed in the qualitative calculus which represent human observable events. The system can then be used to recognise subsequent selected event occurrences or unusual behaviours.
{"title":"Building qualitative event models automatically from visual input","authors":"Jonathan H. Fernyhough, A. Cohn, David C. Hogg","doi":"10.1109/ICCV.1998.710742","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710742","url":null,"abstract":"We describe an implemented technique for generating event models automatically based on qualitative reasoning and a statistical analysis of video input. Using an existing tracking program which generates labelled contours for objects in every frame, the view from a fixed camera is partitioned into semantically relevant regions based on the paths followed by moving objects. The paths are indexed with temporal information so objects moving along the same path at different speeds can be distinguished. Using a notion of proximity based on the speed of the moving objects and qualitative spatial reasoning techniques, event models describing the behaviour of pairs of objects can be built, again using statistical methods. The system has been tested on a traffic domain and learns various event models expressed in the qualitative calculus which represent human observable events. The system can then be used to recognise subsequent selected event occurrences or unusual behaviours.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120961432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Which shape from motion?
C. Fermüller, Y. Aloimonos
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710792
In a practical situation, the rigid transformation relating different views is recovered with errors. In such a case, the recovered depth of the scene contains errors, and consequently a distorted version of visual space is computed. What then are meaningful shape representations that can be computed from the images? The result presented in this paper states that if the rigid transformation between different views is estimated in a way that gives rise to a minimum number of negative depth values, then at the center of the image affine shape can be correctly computed. This result is obtained by exploiting properties of the distortion function. The distortion model turns out to be a very powerful tool in the analysis and design of 3D motion and shape estimation algorithms, and as a byproduct of our analysis we present a computational explanation of psychophysical results demonstrating human visual space distortion from motion information.
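A sketch of the selection criterion (assuming calibrated correspondences given as homogeneous rays, and a hypothetical candidate set of rigid motions): depths are solved per correspondence by least squares, and the motion producing the fewest negative depths is preferred.

```python
import numpy as np

def negative_depth_count(P1, P2, R, t):
    """Count correspondences that receive a negative depth under the candidate
    rigid motion (R, t).  P1, P2: (n, 3) calibrated homogeneous image rays.
    Solving  z1 * (R @ p1) - z2 * p2 = -t  in the least-squares sense recovers
    the two depths for each correspondence."""
    neg = 0
    for p1, p2 in zip(P1, P2):
        A = np.column_stack([R @ p1, -p2])
        z, *_ = np.linalg.lstsq(A, -t, rcond=None)
        neg += int(z[0] <= 0 or z[1] <= 0)
    return neg

# the criterion discussed above: among candidate motions, prefer the one
# that yields the minimum number of negative depth values
# R_best, t_best = min(candidates, key=lambda Rt: negative_depth_count(P1, P2, *Rt))
```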
{"title":"Which shape from motion?","authors":"C. Fermüller, Y. Aloimonos","doi":"10.1109/ICCV.1998.710792","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710792","url":null,"abstract":"In a practical situation, the rigid transformation relating different views is recovered with errors. In such a case, the recovered depth of the scene contains errors, and consequently a distorted version of visual space is computed. What then are meaningful shape representations that can be computed from the images? The result presented in this paper states that if the rigid transformation between different views is estimated in a way that gives rise to a minimum number of negative depth values, then at the center of the image affine shape can be correctly computed. This result is obtained by exploiting properties of the distortion function. The distortion model turns out to be a very powerful tool in the analysis and design of 3D motion and shape estimation algorithms, and as a byproduct of our analysis we present a computational explanation of psychophysical results demonstrating human visual space distortion from motion information.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121162400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

3D point distribution models of the cortical sulci
A. Caunce, C. Taylor
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710750
In this paper we present the first steps in the development of a statistical shape model, specifically a point distribution model (PDM), of the cortical surface of the brain. This will ultimately be used to locate, label, and describe the cortex, for visualisation, diagnosis, and quantification. In order to produce the model it was necessary to find and label the sulcal fissures on a series of MR images. Due to the complexity of the surface, an automated method was developed to facilitate development of a full surface model. Automating the marking process introduced the problem of identifying correspondences between examples, the knowledge of which is essential to the development of a PDM. Various methods were investigated to solve this problem including simple point matching and more complex curve matching. Each is outlined and discussed. The models obtained so far provide interesting insights into the shape and cortical pattern variations over a group of normal subjects.
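The PDM itself is built by principal component analysis of corresponded landmark sets; a minimal sketch (assuming the correspondence and alignment steps discussed above have already produced a matrix of aligned shape vectors):

```python
import numpy as np

def build_pdm(shapes, var_kept=0.98):
    """Build a point distribution model from corresponded, aligned landmarks.
    shapes: (m, 3*n_points) matrix, one concatenated landmark vector per
    subject.  Returns the mean shape, the retained modes of variation, and
    their variances."""
    mean = shapes.mean(axis=0)
    X = shapes - mean
    C = X.T @ X / (len(shapes) - 1)          # sample covariance
    w, V = np.linalg.eigh(C)
    order = np.argsort(w)[::-1]              # sort modes by variance
    w, V = w[order], V[:, order]
    k = int(np.searchsorted(np.cumsum(w) / w.sum(), var_kept)) + 1
    return mean, V[:, :k], w[:k]

# a plausible new shape instance: x = mean + P @ b, with |b_i| <= 3*sqrt(w_i)
```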
{"title":"3D point distribution models of the cortical sulci","authors":"A. Caunce, C. Taylor","doi":"10.1109/ICCV.1998.710750","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710750","url":null,"abstract":"In this paper we present the first steps in the development of a statistical shape model, specifically a point distribution model (PDM), of the cortical surface of the brain. This will ultimately be used to locate, label, and describe the cortex, for visualisation, diagnosis, and quantification. In order to produce the model it was necessary to find and label the sulcal fissures on a series of MR images. Due to the complexity of the surface, an automated method was developed to facilitate development of a full surface model. Automating the marking process introduced the problem of identifying correspondences between examples, the knowledge of which is essential to the development of a PDM. Various methods were investigated to solve this problem including simple point matching and more complex curve matching. Each is outlined and discussed. The models obtained so far provide interesting insights into the shape and cortical pattern variations over a group of normal subjects.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130125273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Comparing and evaluating interest points
C. Schmid, R. Mohr, C. Bauckhage
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710723
Many computer vision tasks rely on feature extraction. Interest points are such features. This paper shows that interest points are geometrically stable under different transformations and have high information content (distinctiveness). These two properties make interest points very successful in the context of image matching. To measure these two properties quantitatively, we introduce two evaluation criteria: repeatability rate and information content. The quality of the interest points depends on the detector used. In this paper several detectors are compared according to the criteria specified above. We determine which detector gives the best results and show that it satisfies the criteria well.
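A sketch of the repeatability criterion under the usual planar-scene assumption (hypothetical signature; points detected in image 1 are mapped into image 2 by a known homography H and matched to detections within eps pixels). The information-content criterion would additionally measure the entropy of local descriptors at the detected points.

```python
import numpy as np

def repeatability(pts1, pts2, H, img2_shape, eps=1.5):
    """Repeatability rate: the fraction of points detected in image 1 that
    are re-detected in image 2 within eps pixels, after mapping through the
    known homography H.  pts1, pts2: (n, 2) arrays of (x, y) detections."""
    ones = np.ones((len(pts1), 1))
    proj = np.hstack([pts1, ones]) @ H.T
    proj = proj[:, :2] / proj[:, 2:]                 # dehomogenize
    h, w = img2_shape
    inside = (proj[:, 0] >= 0) & (proj[:, 0] < w) & \
             (proj[:, 1] >= 0) & (proj[:, 1] < h)    # common field of view
    proj = proj[inside]
    d = np.linalg.norm(proj[:, None, :] - pts2[None, :, :], axis=2)
    return (d.min(axis=1) <= eps).sum() / max(len(proj), 1)
```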
{"title":"Comparing and evaluating interest points","authors":"C. Schmid, R. Mohr, C. Bauckhage","doi":"10.1109/ICCV.1998.710723","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710723","url":null,"abstract":"Many computer vision tasks rely on feature extraction. Interest points are such features. This paper shows that interest points are geometrically stable under different transformations and have high information content (distinctiveness). These two properties make interest points very successful in the contest of image matching. To measure these two properties quantitatively, we introduce two evaluation criteria: repeatability rate and information content. The quality of the interest points depends on the detector used. In this paper several detectors are compared according to the criteria specified above. We determine which detector gives the best results and show that it satisfies the criteria well.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"912 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113995568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Bilateral filtering for gray and color images
Carlo Tomasi, R. Manduchi
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710815
Bilateral filtering smooths images while preserving edges, by means of a nonlinear combination of nearby image values. The method is noniterative, local, and simple. It combines gray levels or colors based on both their geometric closeness and their photometric similarity, and prefers near values to distant values in both domain and range. In contrast with filters that operate on the three bands of a color image separately, a bilateral filter can enforce the perceptual metric underlying the CIE-Lab color space, and smooth colors and preserve edges in a way that is tuned to human perception. Also, in contrast with standard filtering, bilateral filtering produces no phantom colors along edges in color images, and reduces phantom colors where they appear in the original image.
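A direct (unoptimized) grayscale implementation of the filter described above: the product of a domain (closeness) kernel and a range (similarity) kernel weights each neighbourhood average. For color, the range term would use vector distance in CIE-Lab, as the abstract notes; intensities are assumed in [0, 1].

```python
import numpy as np

def bilateral_filter(img, sigma_d=3.0, sigma_r=0.1, radius=6):
    """Naive O(H*W*r^2) grayscale bilateral filter.  Each output pixel is a
    normalized average of its neighbourhood, weighted by both spatial
    closeness and photometric similarity, so edges are preserved."""
    img = img.astype(np.float64)
    H, W = img.shape
    out = np.empty_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    closeness = np.exp(-(xs**2 + ys**2) / (2 * sigma_d**2))   # domain kernel
    pad = np.pad(img, radius, mode='reflect')
    for y in range(H):
        for x in range(W):
            patch = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            similarity = np.exp(-(patch - img[y, x])**2 / (2 * sigma_r**2))  # range kernel
            w = closeness * similarity
            out[y, x] = (w * patch).sum() / w.sum()
    return out
```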
{"title":"Bilateral filtering for gray and color images","authors":"Carlo Tomasi, R. Manduchi","doi":"10.1109/ICCV.1998.710815","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710815","url":null,"abstract":"Bilateral filtering smooths images while preserving edges, by means of a nonlinear combination of nearby image values. The method is noniterative, local, and simple. It combines gray levels or colors based on both their geometric closeness and their photometric similarity, and prefers near values to distant values in both domain and range. In contrast with filters that operate on the three bands of a color image separately, a bilateral filter can enforce the perceptual metric underlying the CIE-Lab color space, and smooth colors and preserve edges in a way that is tuned to human perception. Also, in contrast with standard filtering, bilateral filtering produces no phantom colors along edges in color images, and reduces phantom colors where they appear in the original image.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132647141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

3D photography on your desk
J. Bouguet, P. Perona
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710699
A simple and inexpensive approach for extracting the three-dimensional shape of objects is presented. It is based on 'weak structured lighting'; it differs from other conventional structured lighting approaches in that it requires very little hardware besides the camera: a desk-lamp, a pencil and a checker-board. The camera faces the object, which is illuminated by the desk-lamp. The user moves a pencil in front of the light source casting a moving shadow on the object. The 3D shape of the object is extracted from the spatial and temporal location of the observed shadow. Experimental results are presented on three different scenes demonstrating that the error in reconstructing the surface is less than 1%.
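The temporal shadow-edge localization that such a system relies on can be sketched as follows (an assumed reconstruction of this step, not the authors' exact code): each pixel is thresholded at the mid-level between its lit and shadowed intensities, and the crossing time is interpolated to sub-frame accuracy. Triangulating each pixel's ray against the calibrated shadow plane for that time then yields depth.

```python
import numpy as np

def shadow_times(frames):
    """Per-pixel shadow crossing time.  frames: (T, H, W) sequence from the
    fixed camera.  Each pixel is thresholded at the mid-level between its
    darkest (shadowed) and brightest (lit) observed intensity; the lit-to-
    shadowed crossing is interpolated for sub-frame accuracy."""
    I = frames.astype(np.float64)
    dI = I - 0.5 * (I.min(0) + I.max(0))             # sign flips at the edge
    t = np.zeros(I.shape[1:])
    for k in range(1, I.shape[0]):
        crossing = (dI[k - 1] > 0) & (dI[k] <= 0) & (t == 0)  # first crossing
        frac = dI[k - 1] / (dI[k - 1] - dI[k] + 1e-12)
        t = np.where(crossing, k - 1 + frac, t)
    return t
```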
{"title":"3D photography on your desk","authors":"J. Bouguet, P. Perona","doi":"10.1109/ICCV.1998.710699","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710699","url":null,"abstract":"A simple and inexpensive approach for extracting the three-dimensional shape of objects is presented. It is based on 'weak structured lighting'; it differs from other conventional structured lighting approaches in that it requires very little hardware besides the camera: a desk-lamp, a pencil and a checker-board. The camera faces the object, which is illuminated by the desk-lamp. The user moves a pencil in front of the light source casting a moving shadow on the object. The 3D shape of the object is extracted from the spatial and temporal location of the observed shadow. Experimental results are presented on three different scenes demonstrating that the error in reconstructing the surface is less than 1%.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133813720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Relational histograms for shape indexing
B. Huet, E. Hancock
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710773
This paper is concerned with the retrieval of images from large databases based on their shape similarity to a query image. Our approach is based on two dimensional histograms that encode both the local and global geometric properties of the shapes. The pairwise attributes are the directed segment relative angle and directed relative position. The novelty of the proposed approach is to simultaneously use the relational and structural constraints, derived from an adjacency graph, to gate histogram contributions. We investigate the retrieval capabilities of the method for various queries. We also investigate the robustness of the method to segmentation errors. We conclude that a relational histogram of pairwise segment attributes presents a very efficient way of indexing into large databases. The optimal configuration is obtained when the local features are constructed from six neighbouring segment pairs. Moreover, a sensitivity analysis reveals that segmentation errors do not affect the retrieval performance.
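A sketch of how such a gated histogram might be accumulated (the binning, the normalization, and the exact encoding of "directed relative position" are assumptions here, labeled in the comments):

```python
import numpy as np

def relational_histogram(segments, adjacency, n_ang=16, n_pos=16, d_max=4.0):
    """Accumulate a 2-D histogram of pairwise segment attributes, gated by the
    adjacency graph.  segments: list of (midpoint, direction, length), with
    midpoint an (x, y) array and direction in radians; adjacency: iterable of
    directed index pairs (i, j) allowed to contribute."""
    H = np.zeros((n_ang, n_pos))
    for i, j in adjacency:
        m1, a1, l1 = segments[i]
        m2, a2, l2 = segments[j]
        rel_ang = (a2 - a1) % (2 * np.pi)                  # directed relative angle
        rel_pos = np.linalg.norm(m2 - m1) / (l1 + 1e-12)   # assumed: length-normalized offset
        ia = int(rel_ang / (2 * np.pi) * n_ang) % n_ang
        ip = min(int(rel_pos / d_max * n_pos), n_pos - 1)  # assumed clip at d_max lengths
        H[ia, ip] += 1
    return H / max(H.sum(), 1)                             # normalize for comparison
```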
{"title":"Relational histograms for shape indexing","authors":"B. Huet, E. Hancock","doi":"10.1109/ICCV.1998.710773","DOIUrl":"https://doi.org/10.1109/ICCV.1998.710773","url":null,"abstract":"This paper is concerned with the retrieval of images from large databases based on their shape similarity to a query image. Our approach is based on two dimensional histograms that encode both the local and global geometric properties of the shapes. The pairwise attributes are the directed segment relative angle and directed relative position. The novelty of the proposed approach is to simultaneously use the relational and structural constraints, derived from an adjacency graph, to gate histogram contributions. We investigate the retrieval capabilities of the method for various queries. We also investigate the robustness of the method to segmentation errors. We conclude that a relational histogram of pairwise segment attributes presents a very efficient way of indexing into large databases. The optimal configuration is obtained when the local features are constructed from six neighbouring segments pairs. Moreover, a sensitivity analysis reveals that segmentation errors do not affect the retrieval performances.","PeriodicalId":270671,"journal":{"name":"Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115926652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}