In this paper we present a generative model and learning procedure for unsupervised video clustering into scenes. The work addresses two important problems: realistic modeling of the sources of variability in the video and fast transformation-invariant frame clustering. We suggest a solution to the problem of computationally intensive learning in this model by combining recursive model estimation, fast inference, and on-line learning, thus achieving real-time frame clustering performance. Novel aspects of this method include an algorithm for the clustering of Gaussian mixtures and the fast computation of the KL divergence between two mixtures of Gaussians. The efficiency and performance of the clustering and KL approximation methods are demonstrated. We also present a novel video browsing tool based on the visualization of the variables in the generative model.
{"title":"Recursive estimation of generative models of video","authors":"Nemanja Petrović, A. Ivanovic, N. Jojic","doi":"10.1109/CVPR.2006.248","DOIUrl":"https://doi.org/10.1109/CVPR.2006.248","url":null,"abstract":"In this paper we present a generative model and learning procedure for unsupervised video clustering into scenes. The work addresses two important problems: realistic modeling of the sources of variability in the video and fast transformation invariant frame clustering. We suggest a solution to the problem of computationally intensive learning in this model by combining the recursive model estimation, fast inference, and on-line learning. Thus, we achieve real time frame clustering performance. Novel aspects of this method include an algorithm for the clustering of Gaussian mixtures, and the fast computation of the KL divergence between two mixtures of Gaussians. The efficiency and the performance of clustering and KL approximation methods are demonstrated. We also present novel video browsing tool based on the visualization of the variables in the generative model.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124121065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The wide availability of GPS sensors is changing the landscape in the applications of structure from motion techniques for localization. In this paper, we study the problem of estimating camera orientations from multiple views, given the positions of the viewpoints in a world coordinate system and a set of point correspondences across the views. Given three or more views, the above problem has a finite number of solutions for three or more point correspondences. Given six or more views, the problem has a finite number of solutions for just two or more points. In the three-view case, we show the necessary and sufficient conditions for the three essential matrices to be consistent with a set of known baselines. We also introduce a method to recover the absolute orientations of three views in world coordinates from their essential matrices. To refine these estimates we perform a least-squares minimization on the product group SO(3) × SO(3) × SO(3). We report experiments on synthetic data and on data from the ICCV 2005 Computer Vision Contest.
{"title":"Structure from Motion with Known Camera Positions","authors":"R. Carceroni, Ankita Kumar, Kostas Daniilidis","doi":"10.1109/CVPR.2006.296","DOIUrl":"https://doi.org/10.1109/CVPR.2006.296","url":null,"abstract":"The wide availability of GPS sensors is changing the landscape in the applications of structure from motion techniques for localization. In this paper, we study the problem of estimating camera orientations from multiple views, given the positions of the viewpoints in a world coordinate system and a set of point correspondences across the views. Given three or more views, the above problem has a finite number of solutions for three or more point correspondences. Given six or more views, the problem has a finite number of solutions for just two or more points. In the three-view case, we show the necessary and sufficient conditions for the three essential matrices to be consistent with a set of known baselines. We also introduce a method to recover the absolute orientations of three views in world coordinates from their essential matrices. To refine these estimates we perform a least-squares minimization on the group cross product SO(3) × SO(3) × SO(3). We report experiments on synthetic data and on data from the ICCV2005 Computer Vision Contest.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127917884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fan Zhang, Y. Yoo, Yongmin Kim, Lichen Zhang, L. M. Koh
A new noise reduction and edge enhancement method, the Laplacian pyramid-based nonlinear diffusion and shock filter (LPNDSF), is proposed for medical ultrasound imaging. In the proposed LPNDSF, a coupled nonlinear diffusion and shock filter process is applied in the Laplacian pyramid domain of an image to remove speckle and enhance edges simultaneously. The performance of the proposed method was evaluated on a phantom and a real ultrasound image. In the phantom study, we obtained an average gain of 0.55 and 1.11 in contrast-to-noise ratio compared to speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD), respectively. The proposed LPNDSF also showed clearer boundaries on both the phantom and the real ultrasound image. These preliminary results indicate that the proposed LPNDSF can effectively reduce speckle noise while enhancing image edges and retaining subtle features.
{"title":"Multiscale Nonlinear Diffusion and Shock Filter for Ultrasound Image Enhancement","authors":"Fan Zhang, Y. Yoo, Yongmin Kim, Lichen Zhang, L. M. Koh","doi":"10.1109/CVPR.2006.203","DOIUrl":"https://doi.org/10.1109/CVPR.2006.203","url":null,"abstract":"A new noise reduction and edge enhancement method, i.e., Laplacian pyramid-based nonlinear diffusion and shock filter (LPNDSF), is proposed for medical ultrasound imaging. In the proposed LPNDSF, a coupled nonlinear diffusion and shock filter process is applied in Laplacian pyramid domain of an image, to remove speckle and enhance edges simultaneously. The performance of the proposed method was evaluated on a phantom and a real ultrasound image. In the phantom study, we obtained an average gain of 0.55 and 1.11 in contrast-to-noise ratio compared to the speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD), respectively. Also, the proposed LPNDSF showed clearer boundaries on the phantom and the real ultrasound image. These preliminary results indicate that the proposed LPNDSF can effectively reduce speckle noise while enhancing image edges for retaining subtle features.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128414186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shrinkage is a well-known and appealing denoising technique. Shrinkage is known to be optimal for Gaussian white noise, provided that sparsity of the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with non-unitary, and even redundant, representations. In this paper we shed some light on this behavior. We show that simple shrinkage can be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. This observation leads to a novel iterative shrinkage algorithm that can be considered an effective pursuit method. We demonstrate this algorithm both on synthetic data and on the image denoising problem, where we learn the image prior parameters directly from the given image. The results in both cases are superior to several popular alternatives.
{"title":"Image Denoising with Shrinkage and Redundant Representations","authors":"Michael Elad, Boaz Matalon, M. Zibulevsky","doi":"10.1109/CVPR.2006.143","DOIUrl":"https://doi.org/10.1109/CVPR.2006.143","url":null,"abstract":"Shrinkage is a well known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity on the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with nonunitary, and even redundant representations. In this paper we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a novel iterative shrinkage algorithm that can be considered as an effective pursuit method. We demonstrate this algorithm, both on synthetic data, and for the image denoising problem, where we learn the image prior parameters directly from the given image. The results in both cases are superior to several popular alternatives.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128898530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A face recognition system based on a single classifier that considers only restricted information cannot guarantee general, superior performance in real situations. To address this problem, we propose hybrid Fourier features extracted from different frequency bands, together with multiple face models. The hybrid Fourier feature draws on three different Fourier domains: the merged real and imaginary components, the Fourier spectrum, and the phase angle. When deriving Fourier features from the three domains, we define three different frequency bandwidths so that additional complementary features can be obtained. The features are then individually classified by Linear Discriminant Analysis. This approach makes it possible to analyze a face image from various viewpoints in order to recognize identities. Moreover, we propose multiple face models based on different eye positions within the same image size, which further increases the performance of the proposed system. We evaluated the proposed system using the Face Recognition Grand Challenge (FRGC) experimental protocols, known as the largest data sets available. Experimental results on the FRGC version 2.0 data sets show that the proposed method achieves better verification rates than the FRGC baseline on 2D frontal face images under various conditions, such as illumination changes, expression changes, and time lapse.
{"title":"Multiple Face Model of Hybrid Fourier Feature for Large Face Image Set","authors":"Wonjun Hwang, Gyu-tae Park, Jongha Lee, S. Kee","doi":"10.1109/CVPR.2006.201","DOIUrl":"https://doi.org/10.1109/CVPR.2006.201","url":null,"abstract":"The face recognition system based on the only single classifier considering the restricted information can not guarantee the generality and superiority of performances in a real situation. To challenge such problems, we propose the hybrid Fourier features extracted from different frequency bands and multiple face models. The hybrid Fourier feature comprises three different Fourier domains; merged real and imaginary components, Fourier spectrum and phase angle. When deriving Fourier features from three Fourier domains, we define three different frequency bandwidths, so that additional complementary features can be obtained. After this, they are individually classified by Linear Discriminant Analysis. This approach makes possible analyzing a face image from the various viewpoints to recognize identities. Moreover, we propose multiple face models based on different eye positions with a same image size, and it contributes to increasing the performance of the proposed system. We evaluated this proposed system using the Face Recognition Grand Challenge (FRGC) experimental protocols known as the largest data sets available. 
Experimental results on FRGC version 2.0 data sets has proven that the proposed method shows better verification rates than the baseline of FRGC on 2D frontal face images under various situations such as illumination changes, expression changes, and time elapses.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130891878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
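The three Fourier domains named in the abstract (real/imaginary components, spectrum, phase angle) can all be read off a standard 2-D FFT. The band selection below is an illustrative choice for the sketch, not the paper's actual bandwidth definitions:

```python
import numpy as np

def fourier_features(img, band):
    """Concatenate real parts, imaginary parts, magnitudes, and phase
    angles of the 2-D DFT over a (2*band x 2*band) low-frequency block
    around the centre of the shifted spectrum (illustrative band choice)."""
    F = np.fft.fftshift(np.fft.fft2(img))
    cy, cx = np.array(F.shape) // 2
    B = F[cy - band:cy + band, cx - band:cx + band].ravel()
    return np.concatenate([B.real, B.imag, np.abs(B), np.angle(B)])
```

In the paper's pipeline, feature vectors like this (computed over three bandwidths and multiple face crops) would each be fed to their own LDA classifier.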
J. E. Solem, N. C. Overgaard, Markus Persson, A. Heyden
In this paper we consider region-based variational segmentation of two- and three-dimensional images by the minimization of functionals whose fidelity term is the quotient of two integrals. Users often refrain from quotient functionals, even when they seem to be the most natural choice, probably because the corresponding gradient descent PDEs are nonlocal and hence require the computation of global properties. Here it is shown how this problem may be overcome by employing the structure of the Euler-Lagrange equation of the fidelity term to construct a good initialization for the gradient descent PDE, which will then converge rapidly to the desired (local) minimum. The initializer is found by making a one-dimensional search among the level sets of a function related to the fidelity term, picking the level set which minimizes the segmentation functional. This partial extremal initialization is tested on a medical segmentation problem with velocity and intensity data from MR images. In this particular application, the partial extremal initialization speeds up the segmentation by two orders of magnitude compared to straightforward gradient descent.
{"title":"Fast Variational Segmentation using Partial Extremal Initialization","authors":"J. E. Solem, N. C. Overgaard, Markus Persson, A. Heyden","doi":"10.1109/CVPR.2006.120","DOIUrl":"https://doi.org/10.1109/CVPR.2006.120","url":null,"abstract":"In this paper we consider region-based variational segmentation of two- and three-dimensional images by the minimization of functionals whose fidelity term is the quotient of two integrals. Users often refrain from quotient functionals, even when they seem to be the most natural choice, probably because the corresponding gradient descent PDEs are nonlocal and hence require the computation of global properties. Here it is shown how this problem may be overcome by employing the structure of the Euler-Lagrange equation of the fidelity term to construct a good initialization for the gradient descent PDE, which will then converge rapidly to the desired (local) minimum. The initializer is found by making a one-dimensional search among the level sets of a function related to the fidelity term, picking the level set which minimizes the segmentation functional. This partial extremal initialization is tested on a medical segmentation problem with velocity- and intensity data from MR images. 
In this particular application, the partial extremal initialization speeds up the segmentation by two orders of magnitude compared to straight forward gradient descent.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131021735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
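The one-dimensional search among level sets can be sketched generically: sample thresholds of the function, evaluate the segmentation energy on each candidate region, and keep the minimizer. The energy callback here is a placeholder where the paper's quotient fidelity term would go:

```python
import numpy as np

def best_level_set(phi, energy, n_levels=64):
    """1-D search over the level sets {phi > t}: evaluate the segmentation
    energy on each candidate region and return the minimising threshold."""
    levels = np.linspace(phi.min(), phi.max(), n_levels + 2)[1:-1]
    best_t, best_e = None, np.inf
    for t in levels:
        region = phi > t
        if region.any() and not region.all():   # skip degenerate regions
            e = energy(region)
            if e < best_e:
                best_t, best_e = t, e
    return best_t, best_e

# Toy usage: a bright square on a dark background, with a simple
# "maximise mean intensity inside" energy (illustrative, not the paper's).
img = np.zeros((32, 32))
img[8:16, 8:16] = 1.0
t, e = best_level_set(img, lambda r: -img[r].mean())
```

The point of the paper is that this cheap search lands close enough to a minimum that the subsequent nonlocal gradient descent converges in very few steps.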
In this paper we propose diffusion distance, a new dissimilarity measure between histogram-based descriptors. We define the difference between two histograms to be a temperature field. We then study the relationship between histogram similarity and a diffusion process, showing how diffusion handles deformation as well as quantization effects. As a result, the diffusion distance is derived as the sum of dissimilarities over scales. Being a cross-bin histogram distance, the diffusion distance is robust to deformation, lighting change, and noise in histogram-based local descriptors. In addition, it enjoys linear computational complexity, which significantly improves on previously proposed cross-bin distances of quadratic or higher complexity. We tested the proposed approach on both shape recognition and interest point matching tasks using several multi-dimensional histogram-based descriptors including shape context, SIFT, and spin images. In all experiments, the diffusion distance performs excellently in both accuracy and efficiency in comparison with other state-of-the-art distance measures. In particular, it performs as accurately as the Earth Mover’s Distance with much greater efficiency.
{"title":"Diffusion Distance for Histogram Comparison","authors":"Haibin Ling, K. Okada","doi":"10.1109/CVPR.2006.99","DOIUrl":"https://doi.org/10.1109/CVPR.2006.99","url":null,"abstract":"In this paper we propose diffusion distance, a new dissimilarity measure between histogram-based descriptors. We define the difference between two histograms to be a temperature field. We then study the relationship between histogram similarity and a diffusion process, showing how diffusion handles deformation as well as quantization effects. As a result, the diffusion distance is derived as the sum of dissimilarities over scales. Being a cross-bin histogram distance, the diffusion distance is robust to deformation, lighting change and noise in histogram-based local descriptors. In addition, it enjoys linear computational complexity which significantly improves previously proposed cross-bin distances with quadratic complexity or higher. We tested the proposed approach on both shape recognition and interest point matching tasks using several multi-dimensional histogram-based descriptors including shape context, SIFT, and spin images. In all experiments, the diffusion distance performs excellently in both accuracy and efficiency in comparison with other state-of-the-art distance measures. 
In particular, it performs as accurately as the Earth Mover’s Distance with much greater efficiency.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129278327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
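The "sum of dissimilarities over scales" idea is easy to sketch for 1-D histograms: diffuse the difference histogram over a Gaussian pyramid, downsampling at each level, and accumulate L1 norms. The smoothing kernel and pyramid depth below are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def diffusion_distance(h1, h2, kernel=(0.25, 0.5, 0.25), max_levels=8):
    """Cross-bin distance: L1 norm of the difference histogram summed
    over a smooth-and-downsample pyramid (one diffusion step per scale)."""
    d = np.asarray(h1, float) - np.asarray(h2, float)
    k = np.asarray(kernel)
    total = np.abs(d).sum()
    for _ in range(max_levels):
        if d.size < 3:
            break
        d = np.convolve(d, k, mode="same")[::2]   # diffuse, then subsample
        total += np.abs(d).sum()
    return total
```

Because nearby mass cancels under diffusion, a histogram shifted by one bin scores much closer than one shifted by many bins, which is the cross-bin behaviour a plain bin-to-bin L1 distance lacks.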
This paper describes a general methodology for automated recognition of complex human activities. The methodology uses a context-free grammar (CFG) based representation scheme to represent composite actions and interactions. The CFG-based representation enables us to formally define complex human activities based on simple actions or movements. Human activities are classified into three categories: atomic action, composite action, and interaction. Our system is not only able to represent complex human activities formally, but also able to recognize represented actions and interactions with high accuracy. Image sequences are processed to extract poses and gestures. Based on gestures, the system detects actions and interactions occurring in a sequence of image frames. Our results show that the system is able to represent composite actions and interactions naturally. The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push. The experiments show that the system can recognize sequences of represented composite actions and interactions with a high recognition rate.
{"title":"Recognition of Composite Human Activities through Context-Free Grammar Based Representation","authors":"M. Ryoo, J. Aggarwal","doi":"10.1109/CVPR.2006.242","DOIUrl":"https://doi.org/10.1109/CVPR.2006.242","url":null,"abstract":"This paper describes a general methodology for automated recognition of complex human activities. The methodology uses a context-free grammar (CFG) based representation scheme to represent composite actions and interactions. The CFG-based representation enables us to formally define complex human activities based on simple actions or movements. Human activities are classified into three categories: atomic action, composite action, and interaction. Our system is not only able to represent complex human activities formally, but also able to recognize represented actions and interactions with high accuracy. Image sequences are processed to extract poses and gestures. Based on gestures, the system detects actions and interactions occurring in a sequence of image frames. Our results show that the system is able to represent composite actions and interactions naturally. The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push. 
The experiments show that the system can recognize sequences of represented composite actions and interactions with a high recognition rate.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124417063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
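The CFG idea can be mimicked with a toy grammar in which composite actions are productions over atomic actions and recognition is recursive derivation matching. The grammar and action names below are invented for illustration; they are not the paper's actual productions:

```python
# Composite actions map to lists of alternative productions; any symbol
# not in the grammar is treated as an atomic (terminal) action.
GRAMMAR = {
    "shake-hands": [["approach", "stretch-arm", "hold", "withdraw-arm"]],
    "greet":       [["approach", "shake-hands"], ["approach", "wave"]],
}

def matches(symbol, seq):
    """Return the set of prefix lengths of seq that symbol can derive."""
    if symbol not in GRAMMAR:                      # terminal: match one token
        return {1} if seq and seq[0] == symbol else set()
    ends = set()
    for production in GRAMMAR[symbol]:
        starts = {0}                               # positions reached so far
        for sub in production:
            starts = {s + n for s in starts for n in matches(sub, seq[s:])}
        ends |= starts
    return ends

def recognize(symbol, seq):
    """True if the whole observed atomic-action sequence derives symbol."""
    return len(seq) in matches(symbol, seq)
```

In the full system the terminals would come from the pose and gesture detectors rather than being given symbolically.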
We present a tunable representation for tracking that simultaneously encodes appearance and geometry in a manner that enables the use of mean-shift iterations for tracking. The classic formulation of the tracking problem using mean-shift iterations encodes spatial information very loosely (i.e. using radially symmetric kernels). A problem with such a formulation is that it becomes easy for the tracker to get confused with other objects having the same feature distribution but different spatial configurations of features. Subsequent approaches have addressed this issue but not to the degree of generality required for tracking specific classes of objects and motions (e.g. humans walking). In this paper, we formulate the tracking problem in a manner that encodes the spatial configuration of features along with their density and yet retains robustness to spatial deformations and feature density variations. The encoding of spatial configuration is done using a set of kernels whose parameters can be optimized for a given class of objects and motions, off-line. The formulation enables the use of mean-shift iterations and runs in real-time. We demonstrate better tracking results on synthetic and real image sequences as compared to the original mean-shift tracker.
{"title":"Tunable Kernels for Tracking","authors":"Vasu Parameswaran, Visvanathan Ramesh, Imad Zoghlami","doi":"10.1109/CVPR.2006.317","DOIUrl":"https://doi.org/10.1109/CVPR.2006.317","url":null,"abstract":"We present a tunable representation for tracking that simultaneously encodes appearance and geometry in a manner that enables the use of mean-shift iterations for tracking. The classic formulation of the tracking problem using mean-shift iterations encodes spatial information very loosely (i.e. using radially symmetric kernels). A problem with such a formulation is that it becomes easy for the tracker to get confused with other objects having the same feature distribution but different spatial configurations of features. Subsequent approaches have addressed this issue but not to the degree of generality required for tracking specific classes of objects and motions (e.g. humans walking). In this paper, we formulate the tracking problem in a manner that encodes the spatial configuration of features along with their density and yet retains robustness to spatial deformations and feature density variations. The encoding of spatial configuration is done using a set of kernels whose parameters can be optimized for a given class of objects and motions, off-line. The formulation enables the use of meanshift iterations and runs in real-time. 
We demonstrate better tracking results on synthetic and real image sequences as compared to the original mean-shift tracker.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128866321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
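At the heart of the tracker is the classic mean-shift iteration: repeatedly move the estimate toward the kernel-weighted mean of the samples until it settles on a density mode. A plain Gaussian-kernel version (without the paper's tunable spatial kernels) looks like:

```python
import numpy as np

def mean_shift(points, x0, bandwidth=1.0, n_iter=50):
    """Gaussian-kernel mean-shift mode seeking from a starting point x0."""
    x = np.asarray(x0, float)
    pts = np.asarray(points, float)
    for _ in range(n_iter):
        w = np.exp(-np.sum((pts - x) ** 2, axis=1) / (2 * bandwidth ** 2))
        x_new = (w[:, None] * pts).sum(0) / w.sum()
        if np.linalg.norm(x_new - x) < 1e-8:     # converged to a mode
            break
        x = x_new
    return x
```

The radially symmetric Gaussian here is exactly the "loose" spatial encoding the paper criticises; its contribution is to replace it with kernels tuned off-line to an object and motion class.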
Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult because the decision for an edge often cannot be made purely on low-level cues such as gradient; instead, we need to engage all levels of information (low, middle, and high) in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection, which we refer to as Boosted Edge Learning, or BEL for short. A decision of an edge point is made independently at each location in the image; a very large aperture is used, providing significant context for each decision. In the learning stage, the algorithm selects and combines a large number of features across different scales in order to learn a discriminative model, using an extended version of the Probabilistic Boosting Tree classification algorithm. The learning-based framework is highly adaptive and there are no parameters to tune. We show applications for edge detection in a number of specific image domains as well as on natural images. We test on various datasets, including the Berkeley dataset, and the results obtained are very good.
{"title":"Supervised Learning of Edges and Object Boundaries","authors":"Piotr Dollár, Z. Tu, Serge J. Belongie","doi":"10.1109/CVPR.2006.298","DOIUrl":"https://doi.org/10.1109/CVPR.2006.298","url":null,"abstract":"Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult since often the decision for an edge cannot be made purely based on low level cues such as gradient, instead we need to engage all levels of information, low, middle, and high, in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection which we refer to as Boosted Edge Learning or BEL for short. A decision of an edge point is made independently at each location in the image; a very large aperture is used providing significant context for each decision. In the learning stage, the algorithm selects and combines a large number of features across different scales in order to learn a discriminative model using an extended version of the Probabilistic Boosting Tree classification algorithm. The learning based framework is highly adaptive and there are no parameters to tune. We show applications for edge detection in a number of specific image domains as well as on natural images. 
We test on various datasets including the Berkeley dataset and the results obtained are very good.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
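BEL's learning stage boosts weak learners over a large pool of context features. The boosting idea itself can be illustrated with plain AdaBoost over threshold stumps (a deliberately minimal stand-in; the paper uses a probabilistic boosting tree over far richer multi-scale features):

```python
import numpy as np

def train_stumps(X, y, n_rounds=10):
    """AdaBoost with threshold stumps on individual features.
    X: (n, d) features; y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                       # sample weights
    model = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                        # exhaustive stump search
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] > t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = max(err, 1e-12)                     # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)            # re-weight misclassified points
        w /= w.sum()
        model.append((alpha, j, t, s))
    return model

def predict(model, X):
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for a, j, t, s in model)
    return np.where(score > 0, 1, -1)
```

In BEL the role of the per-pixel feature vector X would be played by responses computed over the large aperture around each candidate edge location.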