A system that could automatically analyze facial actions in real time would have applications in a number of different fields. However, developing such a system is challenging because of the richness, ambiguity, and dynamic nature of facial actions. Although a number of research groups have attempted to recognize action units (AUs) by improving either facial feature extraction techniques or AU classification techniques, these methods often recognize AUs individually and statically, thereby ignoring the semantic relationships among AUs and the dynamics of AUs. Hence, these approaches cannot always recognize AUs reliably, robustly, and consistently. In this paper, we propose a novel approach to AU classification that systematically accounts for relationships among AUs and their temporal evolution. Specifically, we use a dynamic Bayesian network (DBN) to model the relationships among different AUs. The DBN provides a coherent and unified hierarchical probabilistic framework to represent probabilistic relationships among different AUs and to account for the temporal changes in facial action development. In our system, robust computer vision techniques are used to obtain AU measurements, which are then entered as evidence into the DBN to infer the various AUs. The experiments show that integrating AU relationships and AU dynamics with AU image measurements yields significant improvements in AU recognition.
{"title":"Inferring Facial Action Units with Causal Relations","authors":"Yan Tong, Wenhui Liao, Q. Ji","doi":"10.1109/CVPR.2006.154","DOIUrl":"https://doi.org/10.1109/CVPR.2006.154","url":null,"abstract":"A system that could automatically analyze the facial actions in real time have applications in a number of different fields. However, developing such a system is always a challenging task due to the richness, ambiguity, and dynamic nature of facial actions. Although a number of research groups attempt to recognize action units (AUs) by either improving facial feature extraction techniques, or the AU classification techniques, these methods often recognize AUs individually and statically, therefore ignoring the semantic relationships among AUs and the dynamics of AUs. Hence, these approaches cannot always recognize AUs reliably, robustly, and consistently. In this paper, we propose a novel approach for AUs classification, that systematically accounts for relationships among AUs and their temporal evolution. Specifically, we use a dynamic Bayesian network (DBN) to model the relationships among different AUs. The DBN provides a coherent and unified hierarchical probabilistic framework to represent probabilistic relationships among different AUs and account for the temporal changes in facial action development. Under our system, robust computer vision techniques are used to get AU measurements. And such AU measurements are then applied as evidence into the DBN for inferencing various AUs. The experiments show the integration of AU relationships and AU dynamics with AU image measurements yields significant improvements in AU recognition.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124528414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a new model of object classes which incorporates appearance and shape information jointly. Modeling object appearance by distributions of visual words has recently proven successful. Here appearance-based models are augmented by capturing the spatial arrangement of visual words. Compact spatial modeling without loss of discrimination is achieved through the introduction of adaptive vector quantized correlograms, which we call correlatons. Efficiency is further improved by means of integral images. The robustness of our new models to geometric transformations, severe occlusions and missing information is also demonstrated. The discrimination accuracy of the proposed models is assessed on existing databases with large numbers of object classes viewed under general conditions, and the models are shown to outperform appearance-only models.
{"title":"Discriminative Object Class Models of Appearance and Shape by Correlatons","authors":"S. Savarese, J. Winn, A. Criminisi","doi":"10.1109/CVPR.2006.102","DOIUrl":"https://doi.org/10.1109/CVPR.2006.102","url":null,"abstract":"This paper presents a new model of object classes which incorporates appearance and shape information jointly. Modeling objects appearance by distributions of visual words has recently proven successful. Here appearancebased models are augmented by capturing the spatial arrangement of visual words. Compact spatial modeling without loss of discrimination is achieved through the introduction of adaptive vector quantized correlograms, which we call correlatons. Efficiency is further improved by means of integral images. The robustness of our new models to geometric transformations, severe occlusions and missing information is also demonstrated. The accuracy of discrimination of the proposed models is assessed with respect to existing databases with large numbers of object classes viewed under general conditions, and shown to outperform appearance-only models.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124813956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
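The correlaton abstract above mentions integral images as an efficiency device. The following is a minimal sketch of a generic integral image (summed-area table), not the authors' implementation; the function names integral_image and box_sum are hypothetical and chosen for illustration. It shows why the sum over any axis-aligned rectangle costs four lookups once the table is built.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    img is a list of equal-length rows of numbers.
    ii[y][x] holds the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above this row plus the running sum of the current row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of the original image over rows [top, bottom) and columns [left, right)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    table = integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(box_sum(table, 1, 1, 3, 3))
```

In a visual-word setting one table per word (an assumption made here for illustration) would let local word counts be read off in constant time per window.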
Many sources of information relevant to computer vision and machine learning tasks are often underused. One example is the similarity between the elements from a novel source, such as a speaker, writer, or printed font. By comparing instances emitted by a source, we help ensure that similar instances are given the same label. Previous approaches have clustered instances prior to recognition. We propose a probabilistic framework that unifies similarity with prior identity and contextual information. By fusing information sources in a single model, we eliminate unrecoverable errors that result from processing the information in separate stages and improve overall accuracy. The framework also naturally integrates dissimilarity information, which has previously been ignored. We demonstrate with an application in printed character recognition from images of signs in natural scenes.
{"title":"Improving Recognition of Novel Input with Similarity","authors":"Jerod J. Weinman, E. Learned-Miller","doi":"10.1109/CVPR.2006.151","DOIUrl":"https://doi.org/10.1109/CVPR.2006.151","url":null,"abstract":"Many sources of information relevant to computer vision and machine learning tasks are often underused. One example is the similarity between the elements from a novel source, such as a speaker, writer, or printed font. By comparing instances emitted by a source, we help ensure that similar instances are given the same label. Previous approaches have clustered instances prior to recognition. We propose a probabilistic framework that unifies similarity with prior identity and contextual information. By fusing information sources in a single model, we eliminate unrecoverable errors that result from processing the information in separate stages and improve overall accuracy. The framework also naturally integrates dissimilarity information, which has previously been ignored. We demonstrate with an application in printed character recognition from images of signs in natural scenes.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125298014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of dense optical flow computation is addressed from a variational viewpoint. A new geometric framework is introduced. It unifies previous art and yields new efficient methods. Along with the framework a new alignment criterion suggests itself. It is shown that the alignment between the gradients of the optical flow components and between the latter and the intensity gradients is an important measure of the flow’s quality. Adding this criterion as a requirement in the optimization process improves the resulting flow. This is demonstrated in synthetic and real sequences.
{"title":"A General Framework and New Alignment Criterion for Dense Optical Flow","authors":"Rami Ben-Ari, N. Sochen","doi":"10.1109/CVPR.2006.25","DOIUrl":"https://doi.org/10.1109/CVPR.2006.25","url":null,"abstract":"The problem of dense optical flow computation is addressed from a variational viewpoint. A new geometric framework is introduced. It unifies previous art and yields new efficient methods. Along with the framework a new alignment criterion suggests itself. It is shown that the alignment between the gradients of the optical flow components and between the latter and the intensity gradients is an important measure of the flow’s quality. Adding this criterion as a requirement in the optimization process improves the resulting flow. This is demonstrated in synthetic and real sequences.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132161712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active Appearance Models (AAMs) have been widely used to represent the appearance and shape variations of human faces. Fitting an AAM to images recovers the face pose as well as its deformable shape and varying appearance. Successful fitting requires that the AAM be sufficiently generic to cover all possible facial appearances and shapes in the images. Such a generic AAM is often difficult to obtain in practice, especially when the image quality is low or when occlusion occurs. To achieve robust AAM fitting under such circumstances, this paper proposes to incorporate the disparity data obtained from a stereo camera into the image fitting process. We develop an iterative multi-level algorithm that combines efficient AAM fitting to 2D images and robust 3D shape alignment to disparity data. Experiments on tracking faces in low-resolution images captured from meeting scenarios show that the proposed method achieves better performance than the original 2D AAM fitting algorithm. We also demonstrate an application of the proposed method to a facial expression recognition task.
{"title":"Robust AAM Fitting by Fusion of Images and Disparity Data","authors":"Joerg Liebelt, Jing Xiao, Jie Yang","doi":"10.1109/CVPR.2006.255","DOIUrl":"https://doi.org/10.1109/CVPR.2006.255","url":null,"abstract":"Active Appearance Models (AAMs) have been popularly used to represent the appearance and shape variations of human faces. Fitting an AAM to images recovers the face pose as well as its deformable shape and varying appearance. Successful fitting requires that the AAM is sufficiently generic such that it covers all possible facial appearances and shapes in the images. Such a generic AAM is often difficult to be obtained in practice, especially when the image quality is low or when occlusion occurs. To achieve robust AAM fitting under such circumstances, this paper proposes to incorporate the disparity data obtained from a stereo camera with the image fitting process. We develop an iterative multi-level algorithm that combines efficient AAM fitting to 2D images and robust 3D shape alignment to disparity data. Experiments on tracking faces in low-resolution images captured from meeting scenarios show that the proposed method achieves better performance than the original 2D AAM fitting algorithm. We also demonstrate an application of the proposed method to a facial expression recognition task.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114940186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a family of spectral partitioning methods. Edge separators of a graph are produced by iteratively reweighting the edges until the graph disconnects into the prescribed number of components. At each iteration a small number of eigenvectors with small eigenvalues are computed and used to determine the reweighting. In this way spectral rounding directly produces discrete solutions, whereas current spectral algorithms must map the continuous eigenvectors to discrete solutions by employing a heuristic geometric separator (e.g. k-means). We show that spectral rounding compares favorably to current spectral approximations on the Normalized Cut criterion (NCut). Results are given for natural image segmentation, medical image segmentation, and clustering. A practical version is shown to converge.
{"title":"Graph Partitioning by Spectral Rounding: Applications in Image Segmentation and Clustering","authors":"David Tolliver, G. Miller","doi":"10.1109/CVPR.2006.129","DOIUrl":"https://doi.org/10.1109/CVPR.2006.129","url":null,"abstract":"We introduce a family of spectral partitioning methods. Edge separators of a graph are produced by iteratively reweighting the edges until the graph disconnects into the prescribed number of components. At each iteration a small number of eigenvectors with small eigenvalue are computed and used to determine the reweighting. In this way spectral rounding directly produces discrete solutions where as current spectral algorithms must map the continuous eigenvectors to discrete solutions by employing a heuristic geometric separator (e.g. k-means). We show that spectral rounding compares favorably to current spectral approximations on the Normalized Cut criterion (NCut). Results are given for natural image segmentation, medical image segmentation, and clustering. A practical version is shown to converge.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116277051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
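The spectral rounding abstract above evaluates partitions with the Normalized Cut criterion. The sketch below computes the standard k-way NCut value, the sum over components of cut(A, complement of A) divided by vol(A), for a given labelling. It illustrates the criterion only, not the reweighting algorithm of the paper; the function name normalized_cut and the dense list-of-lists weight matrix are assumptions made for illustration.

```python
def normalized_cut(weights, labels):
    """k-way Normalized Cut of a labelling on a weighted undirected graph.

    weights: symmetric n x n list of lists of non-negative edge weights.
    labels:  list of n component ids, one per node.
    Every component must have non-zero volume (total degree of its nodes).
    """
    n = len(weights)
    total = 0.0
    for comp in set(labels):
        cut = 0.0  # weight of edges leaving the component
        vol = 0.0  # total degree of the nodes in the component
        for i in range(n):
            if labels[i] != comp:
                continue
            for j in range(n):
                vol += weights[i][j]
                if labels[j] != comp:
                    cut += weights[i][j]
        total += cut / vol
    return total


if __name__ == "__main__":
    # Two unit-weight triangles joined by one weak edge between nodes 2 and 3.
    w = [[0.0] * 6 for _ in range(6)]
    for a, b, v in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                    (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
        w[a][b] = w[b][a] = v
    print(normalized_cut(w, [0, 0, 0, 1, 1, 1]))
    print(normalized_cut(w, [0, 0, 1, 1, 1, 1]))
```

Cutting only the weak edge gives a much smaller value than splitting a triangle, which is the behaviour the criterion is meant to reward.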
Erik B. Sudderth, A. Torralba, W. Freeman, A. Willsky
We develop an integrated, probabilistic model for the appearance and three-dimensional geometry of cluttered scenes. Object categories are modeled via distributions over the 3D location and appearance of visual features. Uncertainty in the number of object instances depicted in a particular image is then handled via a transformed Dirichlet process. In contrast with image-based approaches to object recognition, we model scale variations as the perspective projection of objects in different 3D poses. To calibrate the underlying geometry, we incorporate binocular stereo images into the training process. A robust likelihood model accounts for outliers in matched stereo features, allowing effective learning of 3D object structure from partial 2D segmentations. Applied to a dataset of office scenes, our model detects objects at multiple scales via a coarse reconstruction of the corresponding 3D geometry.
{"title":"Depth from Familiar Objects: A Hierarchical Model for 3D Scenes","authors":"Erik B. Sudderth, A. Torralba, W. Freeman, A. Willsky","doi":"10.1109/CVPR.2006.97","DOIUrl":"https://doi.org/10.1109/CVPR.2006.97","url":null,"abstract":"We develop an integrated, probabilistic model for the appearance and three-dimensional geometry of cluttered scenes. Object categories are modeled via distributions over the 3D location and appearance of visual features. Uncertainty in the number of object instances depicted in a particular image is then achieved via a transformed Dirichlet process. In contrast with image-based approaches to object recognition, we model scale variations as the perspective projection of objects in different 3D poses. To calibrate the underlying geometry, we incorporate binocular stereo images into the training process. A robust likelihood model accounts for outliers in matched stereo features, allowing effective learning of 3D object structure from partial 2D segmentations. Applied to a dataset of office scenes, our model detects objects at multiple scales via a coarse reconstruction of the corresponding 3D geometry.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115047964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A combination of techniques that is becoming increasingly popular is the construction of part-based object representations using the outputs of interest-point detectors. Our contributions in this paper are twofold: first, we propose a primal-sketch-based set of image tokens that are used for object representation and detection. Second, top-down information is introduced based on an efficient method for the evaluation of the likelihood of hypothesized part locations. This allows us to use graphical model techniques to complement bottom-up detection, by proposing and finding the parts of the object that were missed by the front-end feature detection stage. Detection results for four object categories validate the merits of this joint top-down and bottom-up approach.
{"title":"Bottom-Up & Top-down Object Detection using Primal Sketch Features and Graphical Models","authors":"Iasonas Kokkinos, P. Maragos, A. Yuille","doi":"10.1109/CVPR.2006.74","DOIUrl":"https://doi.org/10.1109/CVPR.2006.74","url":null,"abstract":"A combination of techniques that is becoming increasingly popular is the construction of part-based object representations using the outputs of interest-point detectors. Our contributions in this paper are twofold: first, we propose a primal-sketch-based set of image tokens that are used for object representation and detection. Second, top-down information is introduced based on an efficient method for the evaluation of the likelihood of hypothesized part locations. This allows us to use graphical model techniques to complement bottom-up detection, by proposing and finding the parts of the object that were missed by the front-end feature detection stage. Detection results for four object categories validate the merits of this joint top-down and bottom-up approach.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123283831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in single-view reconstruction (SVR) have been in modelling power (curved 2.5D surfaces) and automation (automatic photo pop-up). We extend SVR along both of these directions. We increase modelling power in several ways: (i) we represent general 3D surfaces, rather than 2.5D Monge patches; (ii) we describe a closed-form method to reconstruct a smooth surface from its image apparent contour, including multilocal singularities ("kidney-bean" self-occlusions); (iii) we show how to incorporate user-specified data such as surface normals and interpolation and approximation constraints; (iv) we show how this algorithm can be adapted to deal with surfaces of arbitrary genus. We also show how the modelling process can be automated for simple object shapes and views, using a priori object class information. We demonstrate these advances on natural images drawn from a number of object classes.
{"title":"Single View Reconstruction of Curved Surfaces","authors":"Mukta Prasad, A. Fitzgibbon","doi":"10.1109/CVPR.2006.281","DOIUrl":"https://doi.org/10.1109/CVPR.2006.281","url":null,"abstract":"Recent advances in single-view reconstruction (SVR) have been in modelling power (curved 2.5D surfaces) and automation (automatic photo pop-up). We extend SVR along both of these directions. We increase modelling power in several ways: (i) We represent general 3D surfaces, rather than 2.5D Monge patches; (ii) We describe a closed-form method to reconstruct a smooth surface from its image apparent contour, including multilocal singularities (\"kidney-bean\" self-occlusions); (iii) We show how to incorporate user-specified data such as surface normals, interpolation and approximation constraints; (iv) We show how this algorithm can be adapted to deal with surfaces of arbitrary genus. We also show how the modelling process can be automated for simple object shapes and views, using a-priori object class information. We demonstrate these advances on natural images drawn from a number of object classes.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124089754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaussian mean-shift (GMS) is a clustering algorithm that has been shown to produce good image segmentations (where each pixel is represented as a feature vector with spatial and range components). GMS operates by defining a Gaussian kernel density estimate for the data and clustering together points that converge to the same mode under a fixed-point iterative scheme. However, the algorithm is slow, since its complexity is O(kN^2), where N is the number of pixels and k the average number of iterations per pixel. We study four acceleration strategies for GMS based on the spatial structure of images and on the fact that GMS is an expectation-maximisation (EM) algorithm: spatial discretisation, spatial neighbourhood, sparse EM and EM-Newton algorithm. We show that the spatial discretisation strategy can accelerate GMS by one to two orders of magnitude while achieving essentially the same segmentation; and that the other strategies attain speedups of less than an order of magnitude.
{"title":"Acceleration Strategies for Gaussian Mean-Shift Image Segmentation","authors":"M. A. Carreira-Perpiñán","doi":"10.1109/CVPR.2006.44","DOIUrl":"https://doi.org/10.1109/CVPR.2006.44","url":null,"abstract":"Gaussian mean-shift (GMS) is a clustering algorithm that has been shown to produce good image segmentations (where each pixel is represented as a feature vector with spatial and range components). GMS operates by defining a Gaussian kernel density estimate for the data and clustering together points that converge to the same mode under a fixed-point iterative scheme. However, the algorithm is slow, since its complexity is O(kN2), where N is the number of pixels and k the average number of iterations per pixel. We study four acceleration strategies for GMS based on the spatial structure of images and on the fact that GMS is an expectation-maximisation (EM) algorithm: spatial discretisation, spatial neighbourhood, sparse EM and EM-Newton algorithm. We show that the spatial discretisation strategy can accelerate GMS by one to two orders of magnitude while achieving essentially the same segmentation; and that the other strategies attain speedups of less than an order of magnitude.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123623649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
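The mean-shift abstract above describes the unaccelerated algorithm: a fixed-point iteration run from every point, with points grouped by the mode they reach. The sketch below is a plain, unaccelerated version of that baseline under stated assumptions: an isotropic Gaussian kernel with a single bandwidth, and a simple distance threshold (merge_dist, a hypothetical parameter) for deciding that two modes coincide. It does not implement any of the four acceleration strategies. Each iteration for one point visits all N points, which is where the O(kN^2) cost comes from.

```python
import math


def gaussian_mean_shift(points, bandwidth, tol=1e-6, max_iter=500):
    """Run the Gaussian mean-shift fixed-point iteration from every point.

    points: list of equal-length tuples of floats.
    Returns one converged position (approximate mode) per input point.
    """
    modes = []
    for start in points:
        x = list(start)
        for _ in range(max_iter):
            # Gaussian kernel weight of every data point relative to x.
            weights = [
                math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, p)) / bandwidth ** 2)
                for p in points
            ]
            total = sum(weights)
            # The update is the kernel-weighted mean of all data points.
            new_x = [
                sum(wt * p[d] for wt, p in zip(weights, points)) / total
                for d in range(len(x))
            ]
            shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
            x = new_x
            if shift < tol:
                break
        modes.append(tuple(x))
    return modes


def label_modes(modes, merge_dist):
    """Give the same label to points whose modes lie within merge_dist."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < merge_dist:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


if __name__ == "__main__":
    data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    found = gaussian_mean_shift(data, bandwidth=0.5)
    print(label_modes(found, merge_dist=0.1))
```

For image segmentation each tuple would hold a pixel's spatial and range components, typically scaled by separate bandwidths; the single bandwidth here is a simplification.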