Stereo Vision in Structured Environments by Consistent Semi-Global Matching
H. Hirschmüller. CVPR 2006. doi:10.1109/CVPR.2006.294

This paper considers the use of stereo vision in structured environments. Sharp discontinuities and large untextured areas must be anticipated, but complex or natural object shapes and fine structures should be handled as well. Additionally, radiometric differences between the input images often occur in practice. Finally, computation time matters for handling large or many images within acceptable time. The Semi-Global Matching method is chosen because it already fulfills many of these requirements. The remaining problems in structured environments are carefully analyzed and two novel extensions are suggested. Firstly, intensity-consistent disparity selection is proposed for handling untextured areas. Secondly, discontinuity-preserving interpolation is suggested for filling holes in the disparity images that are caused by some filters. It is shown that the performance of the new method on test images with ground truth is comparable to the currently best stereo methods, while its complexity and runtime are much lower.

CSIFT: A SIFT Descriptor with Color Invariant Characteristics
Alaa E. Abdel-Hakim and A. Farag. CVPR 2006. doi:10.1109/CVPR.2006.95

SIFT has been proven to be the most robust local invariant feature descriptor. However, SIFT is designed mainly for grayscale images, while color provides valuable information in object description and matching tasks, and many objects can be misclassified if their color content is ignored. This paper addresses this problem and proposes a novel colored local invariant feature descriptor. Instead of using the gray space to represent the input image, the proposed approach builds the SIFT descriptors in a color invariant space. The resulting Colored SIFT (CSIFT) is more robust than conventional SIFT with respect to color and photometric variations. The evaluation results support the potential of the proposed approach.

Affine Invariance Revisited
Evgeni Begelfor and M. Werman. CVPR 2006. doi:10.1109/CVPR.2006.50

This paper proposes a Riemannian geometric framework for computing averages and distributions of point configurations in which configurations that differ only by an affine transformation are treated as identical. The algorithms are fast and proven to be robust both theoretically and empirically. The utility of the framework is demonstrated in a number of affine-invariant clustering algorithms on image point data.

Scalable Recognition with a Vocabulary Tree
D. Nistér and Henrik Stewénius. CVPR 2006. doi:10.1109/CVPR.2006.264

A recognition scheme that scales efficiently to a large number of objects is presented. Its efficiency and quality are demonstrated in a live demonstration that recognizes CD covers from a database of 40,000 images of popular music CDs. The scheme builds upon popular techniques for indexing descriptors extracted from local regions, and is robust to background clutter and occlusion. The local region descriptors are hierarchically quantized in a vocabulary tree. The vocabulary tree allows a larger and more discriminative vocabulary to be used efficiently, which we show experimentally leads to a dramatic improvement in retrieval quality. The most significant property of the scheme is that the tree directly defines the quantization; the quantization and the indexing are therefore fully integrated, essentially being one and the same. The recognition quality is evaluated through retrieval on a database with ground truth, showing the power of the vocabulary tree approach on databases of up to 1 million images.

Region-Tree Based Stereo Using Dynamic Programming Optimization
C. Lei, Jason M. Selzer, and Herbert Yang. CVPR 2006. doi:10.1109/CVPR.2006.251

In this paper, we present a novel stereo algorithm that combines the strengths of region-based stereo and tree-based dynamic programming. Instead of treating the image as a set of individual scanlines or as a pixel tree, we use a new region-tree structure, built as a minimum spanning tree on the adjacency graph of an over-segmented image, for global dynamic programming optimization. Because of the tree structure, the resulting disparity maps are free of the streaking artifacts common in scanline-based algorithms. Performance evaluation on the Middlebury benchmark datasets shows that our algorithm is comparable in accuracy and efficiency to top-ranking algorithms.

Learning Temporal Sequence Model from Partially Labeled Data
Yifan Shi, A. Bobick, and Irfan Essa. CVPR 2006. doi:10.1109/CVPR.2006.174

Graphical models are often used to represent and recognize activities. Purely unsupervised methods (such as HMMs) can be trained automatically but yield models whose internal structure (the nodes) is difficult to interpret semantically. Manually constructed networks typically have nodes corresponding to sub-events, but programming and training them is tedious and requires extensive domain expertise. In this paper, we propose a semi-supervised approach in which a manually structured Propagation Network (a form of dynamic Bayesian network) is initialized from a small amount of fully annotated data and then refined by an EM-based learning method in an unsupervised fashion. During node refinement (the M-step), a boosting-based algorithm is employed to train the evidence detectors of individual nodes. Experiments on several tasks and data types (vision and inertial measurements) demonstrate the ability to learn from as little as one fully annotated example accompanied by a small number of positive but non-annotated training examples. The system is applied to both recognition and anomaly detection tasks.

A Geodesic Active Contour Framework for Finding Glass
Kenton McHenry and J. Ponce. CVPR 2006. doi:10.1109/CVPR.2006.28

This paper addresses the problem of finding objects made of glass (or other transparent materials) in images. Since the appearance of glass objects depends for the most part on what lies behind them, we propose to use binary criteria ("are these two regions made of the same material?") rather than unary ones ("is this glass?") to guide the segmentation process. Concretely, we combine two complementary measures of affinity between regions made of the same material and discrepancy between regions made of different ones into a single objective function, and use the geodesic active contour framework to minimize this function over pixel labels. The proposed approach has been implemented, and qualitative and quantitative experimental results are presented.

Specular Flow and the Recovery of Surface Structure
S. Roth and Michael J. Black. CVPR 2006. doi:10.1109/CVPR.2006.290

In scenes containing specular objects, the image motion observed by a moving camera may be an intermixed combination of optical flow resulting from diffuse reflectance (diffuse flow) and specular reflection (specular flow). Here, with few assumptions, we formalize the notion of specular flow, show how it relates to the 3D structure of the world, and develop an algorithm for estimating scene structure from 2D image motion. Unlike previous work on isolated specular highlights, we use two image frames and estimate the semi-dense flow arising from the specular reflections of textured scenes. We parametrically model the image motion of a quadratic surface patch viewed from a moving camera. The flow is modeled as a probabilistic mixture of diffuse and specular components, and the 3D shape is recovered using an Expectation-Maximization algorithm. Rather than treating specular reflections as noise to be removed or ignored, we show that the specular flow provides additional constraints on scene geometry that improve estimation of 3D structure when compared with reconstruction from diffuse flow alone. We demonstrate this for a set of synthetic and real sequences of mixed specular-diffuse objects.

Combined Depth and Outlier Estimation in Multi-View Stereo
C. Strecha, R. Fransens, and L. Van Gool. CVPR 2006. doi:10.1109/CVPR.2006.78

In this paper, we present a generative model based approach to solve the multi-view stereo problem. The input images are considered to be generated by one of two processes: (i) an inlier process, which generates the pixels that are visible from the reference camera and obey the constant-brightness assumption, and (ii) an outlier process, which generates all other pixels. Depth and visibility are jointly modelled as a hidden Markov Random Field, and the spatial correlations of both are explicitly accounted for. Inference is made tractable by an EM algorithm, which alternates between estimation of visibility and depth and optimisation of model parameters. We describe and compare two implementations of the E-step of the algorithm, corresponding to the Mean Field and Bethe approximations of the free energy. The approach is validated by experiments on challenging real-world scenes, two of which are contaminated by independently moving objects.

Kernel Uncorrelated and Orthogonal Discriminant Analysis: A Unified Approach
T. Xiong, Jieping Ye, and V. Cherkassky. CVPR 2006. doi:10.1109/CVPR.2006.161

Several kernel algorithms have recently been proposed for nonlinear discriminant analysis. However, these methods mainly address the singularity problem in the high-dimensional feature space; less attention has been paid to the properties of the resulting discriminant vectors and feature vectors in the reduced-dimensional space. In this paper, we present a new formulation for kernel discriminant analysis that includes, as special cases, kernel uncorrelated discriminant analysis (KUDA) and kernel orthogonal discriminant analysis (KODA). The feature vectors of KUDA are uncorrelated, while the discriminant vectors of KODA are orthogonal to each other in the feature space. We present theoretical derivations of the proposed KUDA and KODA algorithms. The experimental results show that both KUDA and KODA are very competitive with other nonlinear discriminant algorithms in terms of classification accuracy.