A system that could automatically analyze facial actions in real time would have applications in a number of different fields. However, developing such a system is a challenging task due to the richness, ambiguity, and dynamic nature of facial actions. Although a number of research groups have attempted to recognize action units (AUs) by improving either facial feature extraction techniques or AU classification techniques, these methods often recognize AUs individually and statically, ignoring the semantic relationships among AUs and their dynamics. Hence, these approaches cannot always recognize AUs reliably, robustly, and consistently. In this paper, we propose a novel approach to AU classification that systematically accounts for the relationships among AUs and their temporal evolution. Specifically, we use a dynamic Bayesian network (DBN) to model the relationships among different AUs. The DBN provides a coherent and unified hierarchical probabilistic framework that represents the probabilistic relationships among different AUs and accounts for temporal changes in facial action development. In our system, robust computer vision techniques are used to obtain AU measurements, which are then applied as evidence to the DBN for inferring the various AUs. Experiments show that integrating AU relationships and AU dynamics with AU image measurements yields significant improvements in AU recognition.
{"title":"Inferring Facial Action Units with Causal Relations","authors":"Yan Tong, Wenhui Liao, Q. Ji","doi":"10.1109/CVPR.2006.154","DOIUrl":"https://doi.org/10.1109/CVPR.2006.154","url":null,"abstract":"A system that could automatically analyze the facial actions in real time have applications in a number of different fields. However, developing such a system is always a challenging task due to the richness, ambiguity, and dynamic nature of facial actions. Although a number of research groups attempt to recognize action units (AUs) by either improving facial feature extraction techniques, or the AU classification techniques, these methods often recognize AUs individually and statically, therefore ignoring the semantic relationships among AUs and the dynamics of AUs. Hence, these approaches cannot always recognize AUs reliably, robustly, and consistently. In this paper, we propose a novel approach for AUs classification, that systematically accounts for relationships among AUs and their temporal evolution. Specifically, we use a dynamic Bayesian network (DBN) to model the relationships among different AUs. The DBN provides a coherent and unified hierarchical probabilistic framework to represent probabilistic relationships among different AUs and account for the temporal changes in facial action development. Under our system, robust computer vision techniques are used to get AU measurements. And such AU measurements are then applied as evidence into the DBN for inferencing various AUs. The experiments show the integration of AU relationships and AU dynamics with AU image measurements yields significant improvements in AU recognition.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124528414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A face recognition system based on a single classifier with restricted information cannot guarantee generality and superior performance in real situations. To address this problem, we propose hybrid Fourier features extracted from different frequency bands, together with multiple face models. The hybrid Fourier feature comprises three different Fourier domains: the merged real and imaginary components, the Fourier spectrum, and the phase angle. When deriving Fourier features from the three domains, we define three different frequency bandwidths so that additional complementary features can be obtained; each resulting feature is then classified individually by Linear Discriminant Analysis. This approach makes it possible to analyze a face image from various viewpoints when recognizing identities. Moreover, we propose multiple face models based on different eye positions within the same image size, which further improves the performance of the proposed system. We evaluated the system using the Face Recognition Grand Challenge (FRGC) experimental protocols, among the largest data sets available. Experimental results on the FRGC version 2.0 data sets show that the proposed method achieves better verification rates than the FRGC baseline on 2D frontal face images under varying conditions such as illumination changes, expression changes, and elapsed time.
{"title":"Multiple Face Model of Hybrid Fourier Feature for Large Face Image Set","authors":"Wonjun Hwang, Gyu-tae Park, Jongha Lee, S. Kee","doi":"10.1109/CVPR.2006.201","DOIUrl":"https://doi.org/10.1109/CVPR.2006.201","url":null,"abstract":"The face recognition system based on the only single classifier considering the restricted information can not guarantee the generality and superiority of performances in a real situation. To challenge such problems, we propose the hybrid Fourier features extracted from different frequency bands and multiple face models. The hybrid Fourier feature comprises three different Fourier domains; merged real and imaginary components, Fourier spectrum and phase angle. When deriving Fourier features from three Fourier domains, we define three different frequency bandwidths, so that additional complementary features can be obtained. After this, they are individually classified by Linear Discriminant Analysis. This approach makes possible analyzing a face image from the various viewpoints to recognize identities. Moreover, we propose multiple face models based on different eye positions with a same image size, and it contributes to increasing the performance of the proposed system. We evaluated this proposed system using the Face Recognition Grand Challenge (FRGC) experimental protocols known as the largest data sets available. Experimental results on FRGC version 2.0 data sets has proven that the proposed method shows better verification rates than the baseline of FRGC on 2D frontal face images under various situations such as illumination changes, expression changes, and time elapses.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130891878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a tunable representation for tracking that simultaneously encodes appearance and geometry in a manner that enables the use of mean-shift iterations for tracking. The classic formulation of the tracking problem using mean-shift iterations encodes spatial information very loosely (i.e. using radially symmetric kernels). A problem with such a formulation is that it becomes easy for the tracker to get confused with other objects having the same feature distribution but different spatial configurations of features. Subsequent approaches have addressed this issue but not to the degree of generality required for tracking specific classes of objects and motions (e.g. humans walking). In this paper, we formulate the tracking problem in a manner that encodes the spatial configuration of features along with their density and yet retains robustness to spatial deformations and feature density variations. The encoding of spatial configuration is done using a set of kernels whose parameters can be optimized off-line for a given class of objects and motions. The formulation enables the use of mean-shift iterations and runs in real time. We demonstrate better tracking results on synthetic and real image sequences as compared to the original mean-shift tracker.
{"title":"Tunable Kernels for Tracking","authors":"Vasu Parameswaran, Visvanathan Ramesh, Imad Zoghlami","doi":"10.1109/CVPR.2006.317","DOIUrl":"https://doi.org/10.1109/CVPR.2006.317","url":null,"abstract":"We present a tunable representation for tracking that simultaneously encodes appearance and geometry in a manner that enables the use of mean-shift iterations for tracking. The classic formulation of the tracking problem using mean-shift iterations encodes spatial information very loosely (i.e. using radially symmetric kernels). A problem with such a formulation is that it becomes easy for the tracker to get confused with other objects having the same feature distribution but different spatial configurations of features. Subsequent approaches have addressed this issue but not to the degree of generality required for tracking specific classes of objects and motions (e.g. humans walking). In this paper, we formulate the tracking problem in a manner that encodes the spatial configuration of features along with their density and yet retains robustness to spatial deformations and feature density variations. The encoding of spatial configuration is done using a set of kernels whose parameters can be optimized for a given class of objects and motions, off-line. The formulation enables the use of meanshift iterations and runs in real-time. We demonstrate better tracking results on synthetic and real image sequences as compared to the original mean-shift tracker.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128866321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shrinkage is a well-known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that sparsity of the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with nonunitary, and even redundant representations. In this paper we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a novel iterative shrinkage algorithm that can be considered as an effective pursuit method. We demonstrate this algorithm, both on synthetic data, and for the image denoising problem, where we learn the image prior parameters directly from the given image. The results in both cases are superior to several popular alternatives.
{"title":"Image Denoising with Shrinkage and Redundant Representations","authors":"Michael Elad, Boaz Matalon, M. Zibulevsky","doi":"10.1109/CVPR.2006.143","DOIUrl":"https://doi.org/10.1109/CVPR.2006.143","url":null,"abstract":"Shrinkage is a well known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity on the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with nonunitary, and even redundant representations. In this paper we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a novel iterative shrinkage algorithm that can be considered as an effective pursuit method. We demonstrate this algorithm, both on synthetic data, and for the image denoising problem, where we learn the image prior parameters directly from the given image. The results in both cases are superior to several popular alternatives.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128898530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-target tracking requires locating the targets and labeling their identities. The latter is a challenge when many targets, with indistinct appearances, frequently occlude one another, as in football and surveillance tracking. We present an approach to solving this labeling problem. When isolated, a target can be tracked and its identity maintained; when targets interact, this is not always the case. This paper assumes a track graph exists, denoting when targets are isolated and describing how they interact. Measures of similarity between isolated tracks are defined. The goal is to associate the identities of the isolated tracks by exploiting the graph constraints and similarity measures. We formulate this as a Bayesian network inference problem, allowing us to use standard message propagation to find the most probable set of paths in an efficient way. The high complexity inevitable in large problems is gracefully reduced by removing dependency links between tracks. We apply the method to a 10 min sequence of an international football game and compare results to ground truth.
{"title":"Multi-Target Tracking - Linking Identities using Bayesian Network Inference","authors":"Peter Nillius, Josephine Sullivan, S. Carlsson","doi":"10.1109/CVPR.2006.198","DOIUrl":"https://doi.org/10.1109/CVPR.2006.198","url":null,"abstract":"Multi-target tracking requires locating the targets and labeling their identities. The latter is a challenge when many targets, with indistinct appearances, frequently occlude one another, as in football and surveillance tracking. We present an approach to solving this labeling problem. When isolated, a target can be tracked and its identity maintained. While, if targets interact this is not always the case. This paper assumes a track graph exists, denoting when targets are isolated and describing how they interact. Measures of similarity between isolated tracks are defined. The goal is to associate the identities of the isolated tracks, by exploiting the graph constraints and similarity measures. We formulate this as a Bayesian network inference problem, allowing us to use standard message propagation to find the most probable set of paths in an efficient way. The high complexity inevitable in large problems is gracefully reduced by removing dependency links between tracks. We apply the method to a 10 min sequence of an international football game and compare results to ground truth.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126920485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in single-view reconstruction (SVR) have been in modelling power (curved 2.5D surfaces) and automation (automatic photo pop-up). We extend SVR along both of these directions. We increase modelling power in several ways: (i) We represent general 3D surfaces, rather than 2.5D Monge patches; (ii) We describe a closed-form method to reconstruct a smooth surface from its image apparent contour, including multilocal singularities ("kidney-bean" self-occlusions); (iii) We show how to incorporate user-specified data such as surface normals, interpolation and approximation constraints; (iv) We show how this algorithm can be adapted to deal with surfaces of arbitrary genus. We also show how the modelling process can be automated for simple object shapes and views, using a priori object class information. We demonstrate these advances on natural images drawn from a number of object classes.
{"title":"Single View Reconstruction of Curved Surfaces","authors":"Mukta Prasad, A. Fitzgibbon","doi":"10.1109/CVPR.2006.281","DOIUrl":"https://doi.org/10.1109/CVPR.2006.281","url":null,"abstract":"Recent advances in single-view reconstruction (SVR) have been in modelling power (curved 2.5D surfaces) and automation (automatic photo pop-up). We extend SVR along both of these directions. We increase modelling power in several ways: (i) We represent general 3D surfaces, rather than 2.5D Monge patches; (ii) We describe a closed-form method to reconstruct a smooth surface from its image apparent contour, including multilocal singularities (\"kidney-bean\" self-occlusions); (iii) We show how to incorporate user-specified data such as surface normals, interpolation and approximation constraints; (iv) We show how this algorithm can be adapted to deal with surfaces of arbitrary genus. We also show how the modelling process can be automated for simple object shapes and views, using a-priori object class information. We demonstrate these advances on natural images drawn from a number of object classes.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124089754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a generative model and learning procedure for unsupervised video clustering into scenes. The work addresses two important problems: realistic modeling of the sources of variability in the video, and fast transformation-invariant frame clustering. We suggest a solution to the problem of computationally intensive learning in this model by combining recursive model estimation, fast inference, and on-line learning, thereby achieving real-time frame clustering performance. Novel aspects of this method include an algorithm for clustering Gaussian mixtures and a fast computation of the KL divergence between two mixtures of Gaussians. The efficiency and performance of the clustering and KL approximation methods are demonstrated. We also present a novel video browsing tool based on the visualization of the variables in the generative model.
{"title":"Recursive estimation of generative models of video","authors":"Nemanja Petrović, A. Ivanovic, N. Jojic","doi":"10.1109/CVPR.2006.248","DOIUrl":"https://doi.org/10.1109/CVPR.2006.248","url":null,"abstract":"In this paper we present a generative model and learning procedure for unsupervised video clustering into scenes. The work addresses two important problems: realistic modeling of the sources of variability in the video and fast transformation invariant frame clustering. We suggest a solution to the problem of computationally intensive learning in this model by combining the recursive model estimation, fast inference, and on-line learning. Thus, we achieve real time frame clustering performance. Novel aspects of this method include an algorithm for the clustering of Gaussian mixtures, and the fast computation of the KL divergence between two mixtures of Gaussians. The efficiency and the performance of clustering and KL approximation methods are demonstrated. We also present novel video browsing tool based on the visualization of the variables in the generative model.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124121065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A wide range of computer vision applications such as distance field computation, shape from shading, and shape representation require an accurate solution of a particular Hamilton-Jacobi (HJ) equation, known as the Eikonal equation. Although the fast marching method (FMM) is the most stable and consistent method among existing techniques for solving this equation, it suffers from large numerical error along diagonal directions, and its computational complexity is not optimal. In this paper, we propose an improved version of the FMM that is both highly accurate and computationally efficient for Cartesian domains. The new method, called multi-stencils fast marching (MSFM), computes the solution at each grid point by solving the Eikonal equation along several stencils and then picks the solution that satisfies the fast marching causality relationship. The stencils are centered at each grid point x and cover all of its nearest neighbors: in 2D, two stencils cover the 8-neighbors of x, while in 3D, six stencils cover its 26-neighbors. For stencils that are not aligned with the natural coordinate system, the Eikonal equation is derived using directional derivatives and then solved using a higher-order finite difference scheme.
{"title":"Accurate Tracking of Monotonically Advancing Fronts","authors":"M. Hassouna, A. Farag","doi":"10.1109/CVPR.2006.46","DOIUrl":"https://doi.org/10.1109/CVPR.2006.46","url":null,"abstract":"A wide range of computer vision applications such as distance field computation, shape from shading, and shape representation require an accurate solution of a particular Hamilton-Jacobi (HJ) equation, known as the Eikonal equation. Although the fast marching method (FMM) is the most stable and consistent method among existing techniques for solving such equation, it suffers from large numerical error along diagonal directions as well as its computational complexity is not optimal. In this paper, we propose an improved version of the FMMthat is both highly accurate and computationally efficient for Cartesian domains. The new method is called the multi-stencils fast marching (MSFM), which computes the solution at each grid point by solving the Eikonal equation along several stencils and then picks the solution that satisfies the fast marching causality relationship. The stencils are centered at each grid point x and cover its entire nearest neighbors. In 2D space, 2 stencils cover the 8-neighbors of x, while in 3D space, 6 stencils cover its 26-neighbors. For those stencils that are not aligned with the natural coordinate system, the Eikonal equation is derived using directional derivatives and then solved using a higher order finite difference scheme.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126214536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The wide availability of GPS sensors is changing the landscape of structure-from-motion applications for localization. In this paper, we study the problem of estimating camera orientations from multiple views, given the positions of the viewpoints in a world coordinate system and a set of point correspondences across the views. Given three or more views, this problem has a finite number of solutions for three or more point correspondences. Given six or more views, the problem has a finite number of solutions for just two or more points. In the three-view case, we show the necessary and sufficient conditions for the three essential matrices to be consistent with a set of known baselines. We also introduce a method to recover the absolute orientations of three views in world coordinates from their essential matrices. To refine these estimates we perform a least-squares minimization on the product group SO(3) × SO(3) × SO(3). We report experiments on synthetic data and on data from the ICCV2005 Computer Vision Contest.
{"title":"Structure from Motion with Known Camera Positions","authors":"R. Carceroni, Ankita Kumar, Kostas Daniilidis","doi":"10.1109/CVPR.2006.296","DOIUrl":"https://doi.org/10.1109/CVPR.2006.296","url":null,"abstract":"The wide availability of GPS sensors is changing the landscape in the applications of structure from motion techniques for localization. In this paper, we study the problem of estimating camera orientations from multiple views, given the positions of the viewpoints in a world coordinate system and a set of point correspondences across the views. Given three or more views, the above problem has a finite number of solutions for three or more point correspondences. Given six or more views, the problem has a finite number of solutions for just two or more points. In the three-view case, we show the necessary and sufficient conditions for the three essential matrices to be consistent with a set of known baselines. We also introduce a method to recover the absolute orientations of three views in world coordinates from their essential matrices. To refine these estimates we perform a least-squares minimization on the group cross product SO(3) × SO(3) × SO(3). We report experiments on synthetic data and on data from the ICCV2005 Computer Vision Contest.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127917884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult because the decision for an edge often cannot be made purely from low-level cues such as gradient; instead, we need to engage all levels of information (low, middle, and high) in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection, which we refer to as Boosted Edge Learning, or BEL for short. The decision for an edge point is made independently at each location in the image, with a very large aperture providing significant context for each decision. In the learning stage, the algorithm selects and combines a large number of features across different scales in order to learn a discriminative model, using an extended version of the Probabilistic Boosting Tree classification algorithm. The learning-based framework is highly adaptive and there are no parameters to tune. We show applications for edge detection in a number of specific image domains as well as on natural images. We test on various datasets, including the Berkeley dataset, and the results obtained are very good.
{"title":"Supervised Learning of Edges and Object Boundaries","authors":"Piotr Dollár, Z. Tu, Serge J. Belongie","doi":"10.1109/CVPR.2006.298","DOIUrl":"https://doi.org/10.1109/CVPR.2006.298","url":null,"abstract":"Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult since often the decision for an edge cannot be made purely based on low level cues such as gradient, instead we need to engage all levels of information, low, middle, and high, in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection which we refer to as Boosted Edge Learning or BEL for short. A decision of an edge point is made independently at each location in the image; a very large aperture is used providing significant context for each decision. In the learning stage, the algorithm selects and combines a large number of features across different scales in order to learn a discriminative model using an extended version of the Probabilistic Boosting Tree classification algorithm. The learning based framework is highly adaptive and there are no parameters to tune. We show applications for edge detection in a number of specific image domains as well as on natural images. We test on various datasets including the Berkeley dataset and the results obtained are very good.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}