Image statistics and anisotropic diffusion
H. Scharr, Michael J. Black, H. Haussecker
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238435
Many sensing techniques and image processing applications are characterized by noisy, or corrupted, image data. Anisotropic diffusion is a popular, and theoretically well understood, technique for denoising such images. Diffusion approaches, however, require the selection of an "edge-stopping" function, the definition of which is typically ad hoc. We exploit and extend recent work on the statistics of natural images to define principled edge-stopping functions for different types of imagery. We consider a variety of anisotropic diffusion schemes and note that they compute spatial derivatives at fixed scales, from which we estimate the appropriate algorithm-specific image statistics. Going beyond traditional work on image statistics, we also model the statistics of the eigenvalues of the local structure tensor. Novel edge-stopping functions are derived from these image statistics, giving a principled way of formulating anisotropic diffusion problems in which all edge-stopping parameters are learned from training data.
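A minimal sketch of the idea (not the authors' code): a Perona-Malik-style diffusion whose edge-stopping function comes from a heavy-tailed (Lorentzian) derivative prior, with the scale parameter estimated from the derivative statistics of training images rather than chosen by hand. The MAD-based scale estimate and all function names are illustrative assumptions.

```python
import numpy as np

def estimate_sigma(train_images):
    """Robust scale of image derivatives (MAD-based), standing in for fitting the
    derivative histogram of a training set."""
    grads = []
    for im in train_images:
        grads.append(np.diff(im, axis=0).ravel())
        grads.append(np.diff(im, axis=1).ravel())
    g = np.concatenate(grads)
    return 1.4826 * np.median(np.abs(g - np.median(g))) + 1e-8

def lorentzian_g(x, sigma):
    """Edge-stopping function derived from a Lorentzian (heavy-tailed) derivative prior."""
    return 1.0 / (1.0 + (x / sigma) ** 2 / 2.0)

def diffuse(image, sigma, n_iter=50, lam=0.2):
    """Perona-Malik-style diffusion with the learned edge-stopping scale sigma."""
    u = image.astype(float).copy()
    for _ in range(n_iter):
        # differences to the four neighbours (no flux across the border)
        dn = np.roll(u, -1, axis=0) - u; dn[-1, :] = 0
        ds = np.roll(u, 1, axis=0) - u;  ds[0, :] = 0
        de = np.roll(u, -1, axis=1) - u; de[:, -1] = 0
        dw = np.roll(u, 1, axis=1) - u;  dw[:, 0] = 0
        u += lam * (lorentzian_g(dn, sigma) * dn + lorentzian_g(ds, sigma) * ds
                    + lorentzian_g(de, sigma) * de + lorentzian_g(dw, sigma) * dw)
    return u
```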
{"title":"Image statistics and anisotropic diffusion","authors":"H. Scharr, Michael J. Black, H. Haussecker","doi":"10.1109/ICCV.2003.1238435","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238435","url":null,"abstract":"Many sensing techniques and image processing applications are characterized by noisy, or corrupted, image data. Anisotropic diffusion is a popular, and theoretically well understood, technique for denoising such images. Diffusion approaches however require the selection of an \"edge stopping\" function, the definition of which is typically ad hoc. We exploit and extend recent work on the statistics of natural images to define principled edge stopping functions for different types of imagery. We consider a variety of anisotropic diffusion schemes and note that they compute spatial derivatives at fixed scales from which we estimate the appropriate algorithm-specific image statistics. Going beyond traditional work on image statistics, we also model the statistics of the eigenvalues of the local structure tensor. Novel edge-stopping functions are derived from these image statistics giving a principled way of formulating anisotropic diffusion problems in which all edge-stopping parameters are learned from training data.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"405 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123523002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discriminative random fields: a discriminative framework for contextual interaction in classification
Sanjiv Kumar, M. Hebert
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238478
In this work we present discriminative random fields (DRFs), a discriminative framework for the classification of image regions that incorporates neighborhood interactions in the labels as well as in the observed data. Discriminative random fields offer several advantages over the conventional Markov random field (MRF) framework. First, DRFs allow the strong assumption of conditional independence of the observed data, generally made in the MRF framework for tractability, to be relaxed. This assumption is too restrictive for a large number of applications in vision. Second, DRFs derive their classification power from probabilistic discriminative models instead of the generative models used in the MRF framework. Finally, all the parameters in the DRF model are estimated simultaneously from the training data, unlike in the MRF framework, where likelihood parameters are usually learned separately from the field parameters. We illustrate the advantages of DRFs over the MRF framework in an application to man-made structure detection in natural images taken from the Corel database.
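A toy sketch of the two kinds of potential described above, under illustrative assumptions (binary labels in {-1, +1}, a logistic association potential on per-site features, a data-dependent pairwise interaction potential, and ICM as a crude stand-in for MAP inference); the parameter names w and v and the feature containers are hypothetical, not the authors' formulation or learning procedure.

```python
import numpy as np

def log_association(y, f, w):
    """log sigmoid(y * w^T f): local discriminative evidence for label y at one site."""
    return -np.logaddexp(0.0, -y * (w @ f))

def interaction(y_i, y_j, mu_ij, v):
    """Data-dependent smoothing: favours equal labels more strongly when the pairwise
    feature mu_ij indicates that the two neighbouring sites look alike."""
    return y_i * y_j * (v @ mu_ij)

def icm_labels(site_feats, pair_feats, w, v, n_sweeps=5):
    """site_feats: (H, W, d) array; pair_feats: dict keyed by ((i, j), (k, l)) for every
    ordered pair of 4-neighbours, holding the pairwise feature vector."""
    H, W, _ = site_feats.shape
    y = np.ones((H, W), dtype=int)
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                best_lab, best_score = y[i, j], -np.inf
                for lab in (-1, 1):
                    s = log_association(lab, site_feats[i, j], w)
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        k, l = i + di, j + dj
                        if 0 <= k < H and 0 <= l < W:
                            s += interaction(lab, y[k, l], pair_feats[(i, j), (k, l)], v)
                    if s > best_score:
                        best_lab, best_score = lab, s
                y[i, j] = best_lab
    return y
```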
{"title":"Discriminative random fields: a discriminative framework for contextual interaction in classification","authors":"Sanjiv Kumar, M. Hebert","doi":"10.1109/ICCV.2003.1238478","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238478","url":null,"abstract":"In this work we present discriminative random fields (DRFs), a discriminative framework for the classification of image regions by incorporating neighborhood interactions in the labels as well as the observed data. The discriminative random fields offer several advantages over the conventional Markov random field (MRF) framework. First, the DRFs allow to relax the strong assumption of conditional independence of the observed data generally used in the MRF framework for tractability. This assumption is too restrictive for a large number of applications in vision. Second, the DRFs derive their classification power by exploiting the probabilistic discriminative models instead of the generative models used in the MRF framework. Finally, all the parameters in the DRF model are estimated simultaneously from the training data unlike the MRF framework where likelihood parameters are usually learned separately from the field parameters. We illustrate the advantages of the DRFs over the MRF framework in an application of man-made structure detection in natural images taken from the Corel database.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125307833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dense matching of multiple wide-baseline views
C. Strecha, T. Tuytelaars, L. Gool
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238627
This paper describes a PDE-based method for dense depth extraction from multiple wide-baseline images. Emphasis lies on using only a small number of images. The integration of these multiple wide-baseline views is guided by the relative confidence that the system has in the matching to different views. This weighting is fine-grained in that it is determined for every pixel at every iteration. Reliable information spreads fast at the expense of less reliable data, both in terms of spatial communication within a view and in terms of information exchange between the views. Changes in intensity between images can be handled in a similarly fine-grained fashion.
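A small sketch of just the per-pixel, per-view confidence weighting described above (illustrative only; the warping, anisotropic regularisation and PDE coupling of the actual method are omitted, and sigma is a hypothetical noise scale): each view suggests a depth correction at every pixel, views whose current residual is large are down-weighted, and the weights are recomputed at each iteration.

```python
import numpy as np

def fuse_depth_updates(residuals, depth_updates, sigma=10.0):
    """residuals, depth_updates: (K, H, W) arrays holding, for each of K views, the
    current photometric residual at every pixel and the depth correction that view
    would suggest. Returns a single confidence-weighted correction per pixel."""
    conf = np.exp(-(residuals / sigma) ** 2)           # per-pixel reliability of each view
    conf /= conf.sum(axis=0, keepdims=True) + 1e-12    # normalise over the views
    return (conf * depth_updates).sum(axis=0)
```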
{"title":"Dense matching of multiple wide-baseline views","authors":"C. Strecha, T. Tuytelaars, L. Gool","doi":"10.1109/ICCV.2003.1238627","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238627","url":null,"abstract":"This paper describes a PDE-based method for dense depth extraction from multiple wide-baseline images. Emphasis lies on the usage of only a small amount of images. The integration of these multiple wide-baseline views is guided by the relative confidence that the system has in the matching to different views. This weighting is fine-grained in that it is determined for every pixel at every iteration. Reliable information spreads fast at the expense of less reliable data, both in terms of spatial communications within a view and in terms of information exchange between the views. Changes in intensity between images can be handled in a similar fine grained fashion.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128247170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A class of photometric invariants: separating material from shape and illumination
S. Narasimhan, Visvanathan Ramesh, S. Nayar
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238652
We derive a new class of photometric invariants that can be used for a variety of vision tasks including lighting-invariant material segmentation, change detection and tracking, as well as material-invariant shape recognition. The key idea is the formulation of a scene radiance model for the class of "separable" BRDFs, which can be decomposed into material-related terms and terms related to object shape and lighting. All the proposed invariants are simple rational functions of the appearance parameters (say, material, or shape and lighting). The invariants in this class differ from one another in the number and type of image measurements they require. Most of the invariants in this class need changes in illumination or object position between image acquisitions. The invariants can handle large changes in lighting, which pose problems for most existing vision algorithms. We demonstrate the power of these invariants using scenes with complex shapes, materials, textures, shadows and specularities.
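A toy numerical illustration of the cancellation that makes such rational invariants possible, under the assumed separable model E = M (material) * G (shape and lighting); this is only the algebraic idea behind the invariants, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.uniform(0.2, 1.0, size=5)      # material terms at 5 scene points
G1 = rng.uniform(0.2, 1.0, size=5)     # shape/lighting terms under lighting 1
G2 = rng.uniform(0.2, 1.0, size=5)     # shape/lighting terms under lighting 2

E1, E2 = M * G1, M * G2                # observed radiance in the two images

# Pixelwise ratio across the two images cancels the material term: a material
# invariant that depends only on shape and lighting.
assert np.allclose(E1 / E2, G1 / G2)

# Ratio of two points sharing the same shape/lighting term within one image cancels
# G: a shape/lighting invariant that depends only on material.
G_shared = np.full(5, G1[0])
E_shared = M * G_shared
assert np.allclose(E_shared / E_shared[0], M / M[0])
```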
{"title":"A class of photometric invariants: separating material from shape and illumination","authors":"S. Narasimhan, Visvanathan Ramesh, S. Nayar","doi":"10.1109/ICCV.2003.1238652","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238652","url":null,"abstract":"We derive a new class of photometric invariants that can be used for a variety of vision tasks including lighting invariant material segmentation, change detection and tracking, as well as material invariant shape recognition. The key idea is the formulation of a scene radiance model for the class of \"separable\" BRDFs, that can be decomposed into material related terms and object shape and lighting related terms. All the proposed invariants are simple rational functions of the appearance parameters (say, material or shape and lighting). The invariants in this class differ from one another in the number and type of image measurements they require. Most of the invariants in this class need changes in illumination or object position between image acquisitions. The invariants can handle large changes in lighting which pose problems for most existing vision algorithms. We demonstrate the power of these invariants using scenes with complex shapes, materials, textures, shadows and specularities.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122309552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Highlight removal by illumination-constrained inpainting
P. Tan, Stephen Lin, Long Quan, H. Shum
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238333
We present a single-image highlight removal method that incorporates illumination-based constraints into image inpainting. Unlike occluded image regions filled by traditional inpainting, highlight pixels contain some useful information for guiding the inpainting process. Constraints provided by observed pixel colors, highlight color analysis and illumination color uniformity are employed in our method to improve estimation of the underlying diffuse color. The inclusion of these illumination constraints allows for better recovery of shading and textures by inpainting. Experimental results are given to demonstrate the performance of our method.
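A hedged sketch of the illumination constraint on its own (not the full inpainting algorithm): under a dichromatic model I = D + alpha * s with a known illumination colour s, every highlight pixel's diffuse colour lies on a line, so once the surrounding region supplies a diffuse chromaticity d (e.g. propagated by inpainting), the specular strength alpha has a per-pixel closed form. The uniform-chromaticity assumption and the function name are illustrative.

```python
import numpy as np

def recover_diffuse(I, s, d):
    """I: (N, 3) highlight pixel colours; s: (3,) illumination colour; d: (3,) diffuse
    chromaticity supplied by the surrounding non-highlight region. Solves, per pixel,
    for alpha such that I - alpha * s is as parallel to d as possible (least squares),
    and returns the recovered diffuse colours."""
    s = s / np.linalg.norm(s)
    d = d / np.linalg.norm(d)
    P = np.eye(3) - np.outer(d, d)          # projector onto the complement of d
    Ps = P @ s
    alpha = (I @ Ps) / (Ps @ Ps)            # closed-form per-pixel specular strength
    alpha = np.clip(alpha, 0.0, None)       # the specular component cannot be negative
    return I - alpha[:, None] * s
```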
{"title":"Highlight removal by illumination-constrained inpainting","authors":"P. Tan, Stephen Lin, Long Quan, H. Shum","doi":"10.1109/ICCV.2003.1238333","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238333","url":null,"abstract":"We present a single-image highlight removal method that incorporates illumination-based constraints into image inpainting. Unlike occluded image regions filled by traditional inpainting, highlight pixels contain some useful information for guiding the inpainting process. Constraints provided by observed pixel colors, highlight color analysis and illumination color uniformity are employed in our method to improve estimation of the underlying diffuse color. The inclusion of these illumination constraints allows for better recovery of shading and textures by inpainting. Experimental results are given to demonstrate the performance of our method.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115032696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video Google: a text retrieval approach to object matching in videos
Josef Sivic, Andrew Zisserman
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238663
We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval lies in the implementation, where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieval is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated by matching in two full-length feature films.
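A minimal sketch of the text-retrieval analogy (visual-word quantization, tf-idf weighting, an inverted file and scalar-product ranking), assuming a vocabulary of cluster centres already built, e.g. by k-means on region descriptors; the stability filtering via tracking is omitted and all names are illustrative, not the authors' implementation.

```python
import numpy as np
from collections import defaultdict

def quantize(descs, vocab):
    """Assign each region descriptor to its nearest visual word (vocab: (K, d))."""
    d2 = ((descs[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def build_index(frames_descs, vocab):
    """frames_descs: list of (n_i, d) descriptor arrays, one per key frame.
    Returns tf-idf frame vectors and an inverted file word -> frames containing it."""
    K, N = len(vocab), len(frames_descs)
    tf = np.zeros((N, K))
    for i, descs in enumerate(frames_descs):
        words, counts = np.unique(quantize(descs, vocab), return_counts=True)
        tf[i, words] = counts / counts.sum()
    idf = np.log(N / np.maximum((tf > 0).sum(axis=0), 1))
    tfidf = tf * idf
    tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True) + 1e-12
    inverted = defaultdict(set)
    for i in range(N):
        for w in np.nonzero(tf[i])[0]:
            inverted[w].add(i)
    return tfidf, idf, inverted

def query(query_descs, vocab, tfidf, idf, inverted):
    """Rank frames by normalised scalar product, touching only frames sharing a word."""
    q = np.zeros(len(vocab))
    words, counts = np.unique(quantize(query_descs, vocab), return_counts=True)
    q[words] = counts / counts.sum()
    q *= idf
    q /= np.linalg.norm(q) + 1e-12
    candidates = set().union(*(inverted[w] for w in words if w in inverted))
    scores = {i: float(tfidf[i] @ q) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)
```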
{"title":"Video Google: a text retrieval approach to object matching in videos","authors":"Josef Sivic, Andrew Zisserman","doi":"10.1109/ICCV.2003.1238663","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238663","url":null,"abstract":"We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set of viewpoint invariant region descriptors so that recognition can proceed successfully despite changes in viewpoint, illumination and partial occlusion. The temporal continuity of the video within a shot is used to track the regions in order to reject unstable regions and reduce the effects of noise in the descriptors. The analogy with text retrieval is in the implementation where matches on descriptors are pre-computed (using vector quantization), and inverted file systems and document rankings are used. The result is that retrieved is immediate, returning a ranked list of key frames/shots in the manner of Google. The method is illustrated for matching in two full length feature films.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129514931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Globally convergent autocalibration
A. Benedetti, Alessandro Busti, M. Farenzena, Andrea Fusiello
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238657
Existing autocalibration techniques use numerical optimization algorithms that are prone to the problem of local minima. To address this problem, we have developed a method in which an interval branch-and-bound method is employed for numerical minimization. Thanks to the properties of interval analysis, this method is guaranteed to converge to the global solution with mathematical certainty and arbitrary accuracy, and the only input information it requires from the user is a set of point correspondences and a search box. The cost function is based on the Huang-Faugeras constraint of the fundamental matrix. A recently proposed interval extension based on Bernstein polynomial forms has been investigated to speed up the search for the solution. Finally, some experimental results on synthetic images are presented.
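A sketch of the cost being minimised, assuming for illustration a single unknown focal length and a known fundamental matrix F (function names and the toy search are assumptions): the Huang-Faugeras constraint requires the two non-zero singular values of the essential matrix to be equal, so their normalised gap is the residual. The toy branch-and-bound below bounds each box by sampling, which only stands in for the rigorous interval/Bernstein extensions the paper relies on.

```python
import numpy as np

def huang_faugeras_cost(f, F):
    """Residual of the Huang-Faugeras constraint for a focal length guess f and a
    fundamental matrix F: with K = diag(f, f, 1), the essential matrix E = K^T F K
    must have two equal non-zero singular values."""
    K = np.diag([f, f, 1.0])
    s = np.linalg.svd(K.T @ F @ K, compute_uv=False)
    return (s[0] - s[1]) / (s[0] + s[1] + 1e-12)

def toy_branch_and_bound(F, lo, hi, tol=1e-3, n_samples=20):
    """Global search over the focal-length box [lo, hi]. Here each box is bounded by
    sampling, so pruning is not rigorous; the paper replaces this with certified
    interval bounds, which is what guarantees global convergence."""
    boxes = [(lo, hi)]
    best_f, best_c = None, np.inf
    while boxes:
        a, b = boxes.pop()
        fs = np.linspace(a, b, n_samples)
        cs = np.array([huang_faugeras_cost(f, F) for f in fs])
        if cs.min() < best_c:
            best_c, best_f = cs.min(), fs[cs.argmin()]
        if (b - a) > tol and cs.min() <= best_c:   # split boxes that still look promising
            m = 0.5 * (a + b)
            boxes += [(a, m), (m, b)]
    return best_f, best_c
```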
{"title":"Globally convergent autocalibration","authors":"A. Benedetti, Alessandro Busti, M. Farenzena, Andrea Fusiello","doi":"10.1109/ICCV.2003.1238657","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238657","url":null,"abstract":"Existing autocalibration techniques use numerical optimization algorithms that are prone to the problem of local minima. To address this problem, we have developed a method where an interval branch-and-bound method is employed for numerical minimization. Thanks to the properties of interval analysis this method is guaranteed to converge to the global solution with mathematical certainty and arbitrary accuracy, and the only input information it requires from the user is a set of point correspondences and a search box. The cost function is based on the Huang-Faugeras constraint of the fundamental matrix. A recently proposed interval extension based on Bernstein polynomial forms has been investigated to speed up the search for the solution. Finally, some experimental results on synthetic images are presented.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134513615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SVM-based nonparametric discriminant analysis, an application to face detection
R. Fransens, J. D. Prins, L. Gool
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238639
Detecting the dominant normal directions to the decision surface is an established technique for feature selection in high-dimensional classification problems. Several approaches have been proposed to render this strategy more amenable to practice, but they still show a number of important shortcomings from a pragmatic point of view. This paper introduces a novel such approach, which combines the normal-directions idea with support vector machine classifiers. The two make a natural and powerful match, as support vectors lie near, and fully describe, the decision surface. The approach can be incorporated elegantly into the training of high-performance classifiers from extensive datasets. The potential is corroborated by experiments, both on synthetic and real data, the latter in a face detection experiment. In this experiment we demonstrate how our approach can lead to a significant reduction of CPU time, with negligible loss of classification performance.
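A hedged sketch of one way to combine the normal-directions idea with an SVM (an interpretation of the abstract, not the authors' exact estimator): pair each support vector with its nearest support vector of the opposite class, treat the normalised differences as local normals to the decision surface, and keep the dominant eigenvectors of their scatter as the feature-selection subspace. The scikit-learn classifier, the nearest-opposite-SV pairing and the function name are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def svm_nda_directions(X, y, n_dirs=2, **svm_kwargs):
    """X: (N, d) features, y: binary labels. Returns a (d, n_dirs) projection basis."""
    clf = SVC(kernel="rbf", **svm_kwargs).fit(X, y)
    sv, sv_y = clf.support_vectors_, y[clf.support_]
    labels = np.unique(y)
    pos, neg = sv[sv_y == labels[1]], sv[sv_y == labels[0]]
    # local normals: each positive SV paired with its nearest negative SV
    normals = []
    for p in pos:
        q = neg[np.linalg.norm(neg - p, axis=1).argmin()]
        n = p - q
        normals.append(n / (np.linalg.norm(n) + 1e-12))
    normals = np.array(normals)
    S = normals.T @ normals                               # scatter of normal directions
    eigval, eigvec = np.linalg.eigh(S)
    return eigvec[:, np.argsort(eigval)[::-1][:n_dirs]]   # dominant directions

# usage: W = svm_nda_directions(X_train, y_train, n_dirs=5); X_proj = X_train @ W
```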
{"title":"SVM-based nonparametric discriminant analysis, an application to face detection","authors":"R. Fransens, J. D. Prins, L. Gool","doi":"10.1109/ICCV.2003.1238639","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238639","url":null,"abstract":"Detecting the dominant normal directions to the decision surface is an established technique for feature selection in high dimensional classification problems. Several approaches have been proposed to render this strategy more amenable to practice, but they still show a number of important shortcomings from a pragmatic point of view. This paper introduces a novel such approach, which combines the normal directions idea with support vector machine classifiers. The two make a natural and powerful match, as SVs are located nearby, and fully describe the decision surfaces. The approach can be included elegantly into the training of performant classifiers from extensive datasets. The potential is corroborated by experiments, both on synthetic and real data, the latter on a face detection experiment. In this experiment we demonstrate how our approach can lead to a significant reduction of CPU-time, with neglectable loss of classification performance.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130853459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tales of shape and radiance in multiview stereo
Stefano Soatto, A. Yezzi, Hailin Jin
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238454
To what extent can three-dimensional shape and radiance be inferred from a collection of images? Can the two be estimated separately while retaining optimality? How should the optimality criterion be computed? When is it necessary to employ an explicit model of the reflectance properties of a scene? In this paper we introduce a separation principle for shape and radiance estimation that applies to Lambertian scenes and holds for any choice of norm. When the scene is not Lambertian, however, shape cannot be decoupled from radiance, and therefore direct image-to-image matching is not possible. We employ a rank constraint on the radiance tensor, which is commonly used in computer graphics, and construct a novel cost functional whose minimization leads to an estimate of both shape and radiance for non-Lambertian objects, which we validate experimentally.
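A small sketch of a rank-based photo-consistency score of the kind the rank constraint suggests (the matrix layout and normalisation are assumptions, not the paper's exact functional): the radiance samples that a candidate surface patch projects to in the different views are collected into a matrix, and the energy outside its best rank-R approximation measures the violation of the constraint.

```python
import numpy as np

def low_rank_residual(radiance, R=2):
    """radiance: (n_views, n_samples) matrix of observed radiances for one candidate
    surface patch (e.g. colour channels or a small window per view). Returns the
    energy outside the best rank-R approximation, normalised by the total energy;
    small values indicate a photo-consistent (possibly non-Lambertian) match."""
    s = np.linalg.svd(radiance, compute_uv=False)
    total = (s ** 2).sum() + 1e-12
    return (s[R:] ** 2).sum() / total
```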
{"title":"Tales of shape and radiance in multiview stereo","authors":"Stefano Soatto, A. Yezzi, Hailin Jin","doi":"10.1109/ICCV.2003.1238454","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238454","url":null,"abstract":"To what extent can three-dimensional shape and radiance be inferred from a collection of images? Can the two be estimated separately while retaining optimality? How should the optimality criterion be computed? When is it necessary to employ an explicit model of the reflectance properties of a scene? In this paper we introduce a separation principle for shape and radiance estimation that applies to Lambertian scenes and holds for any choice of norm. When the scene is not Lambertian, however, shape cannot be decoupled from radiance, and therefore matching image-to-image is not possible directly. We employ a rank constraint on the radiance tensor, which is commonly used in computer graphics, and construct a novel cost functional whose minimization leads to an estimate of both shape and radiance for nonLambertian objects, which we validate experimentally.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134124041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast intensity-based 2D-3D image registration of clinical data using light fields
Daniel B. Russakoff, T. Rohlfing, C. Maurer
Pub Date: 2003-10-13 | DOI: 10.1109/ICCV.2003.1238376
Registration of a preoperative CT (3D) image to one or more X-ray projection (2D) images, a special case of the pose estimation problem, has been attempted in a variety of ways with varying degrees of success. Recently, there has been a great deal of interest in intensity-based methods. One of the drawbacks to such methods is the need to create digitally reconstructed radiographs (DRRs) at each step of the optimization process. DRRs are typically generated by ray casting, an operation that requires O(n^3) time, where we assume that n is approximately the size (in voxels) of one side of the DRR as well as one side of the CT volume. We address this issue by extending light field rendering techniques from the computer graphics community to generate DRRs instead of conventional rendered images. Using light fields allows most of the computation to be performed in a preprocessing step; after this precomputation, very accurate DRRs can be generated in O(n^2) time. Another important issue for 2D-3D registration algorithms is validation. Previously reported 2D-3D registration algorithms were validated using synthetic data or phantoms but not clinical data. We present an intensity-based 2D-3D registration system that generates DRRs using light fields; we validate its performance using clinical data with a known gold standard transformation.
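A sketch of the O(n^3) ray-casting step that light-field rendering amortises (the simple parallel-projection geometry, the scipy interpolation and all names are illustrative assumptions): the paper precomputes such line integrals into a light field, so that rendering a DRR for a new pose reduces to O(n^2) lookups and interpolation instead of re-integrating the volume.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def drr_parallel(volume, Rmat, n=128, n_steps=128):
    """Parallel-projection DRR: integrate the CT volume along rotated rays.
    This is O(n^3) work per image; a precomputed light field of ray integrals
    replaces it with O(n^2) lookups at render time."""
    c = (np.array(volume.shape) - 1) / 2.0
    # image-plane grid (u, v) and sampling positions t along each ray
    u, v, t = np.meshgrid(np.linspace(-c[0], c[0], n),
                          np.linspace(-c[1], c[1], n),
                          np.linspace(-c[2], c[2], n_steps), indexing="ij")
    rays = np.stack([u, v, t], axis=0).reshape(3, -1)        # ray sample points
    pts = (Rmat @ rays) + c[:, None]                         # rotate into the volume frame
    samples = map_coordinates(volume, pts, order=1, mode="constant", cval=0.0)
    return samples.reshape(n, n, n_steps).sum(axis=2)        # line integrals per pixel
```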
{"title":"Fast intensity-based 2D-3D image registration of clinical data using light","authors":"Daniel B. Russakoff, T. Rohlfing, C. Maurer","doi":"10.1109/ICCV.2003.1238376","DOIUrl":"https://doi.org/10.1109/ICCV.2003.1238376","url":null,"abstract":"Registration of a preoperative CT (3D) image to one or more X-ray projection (2D) images, a special case of the pose estimation problem, has been attempted in a variety of ways with varying degrees of success. Recently, there has been a great deal of interest in intensity-based methods. One of the drawbacks to such methods is the need to create digitally reconstructed radiographs (DRRs) at each step of the optimization process. DRRs are typically generated by ray casting, an operation that requires O(n/sup 3/) time, where we assume that n is approximately the size (in voxels) of one side of the DRR as well as one side of the CT volume. We address this issue by extending light field rendering techniques from the computer graphics community to generate DRRs instead of conventional rendered images. Using light fields allows most of the computation to be performed in a preprocessing step; after this precomputation, very accurate DRRs can be generated in O(n/sup 2/) time. Another important issue for 2D-3D registration algorithms is validation. Previously reported 2D-3D registration algorithms were validated using synthetic data or phantoms but not clinical data. We present an intensity-based 2D-3D registration system that generates DRRs using light fields; we validate its performance using clinical data with a known gold standard transformation.","PeriodicalId":131580,"journal":{"name":"Proceedings Ninth IEEE International Conference on Computer Vision","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133470708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}