Automatic estimation of left ventricular dysfunction from echocardiogram videos
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204054
D. Beymer, T. Syeda-Mahmood, A. Amir, Fei Wang, Scott Adelman
Echocardiography is often used to diagnose cardiac diseases related to regional and valvular motion abnormalities. Due to the low resolution of the imaging modality, the choice of viewpoint and mode, and the experience of the sonographers, there is large variance in the estimation of important diagnostic measurements such as ejection fraction. In this paper, we develop an automatic algorithm to estimate diagnostic measurements from raw echocardiogram video sequences. Specifically, we locate and track the left ventricular region over a heart cycle using active shape models. We also present efficient ventricular localization in video sequences by automatically detecting and propagating echocardiographer annotations. Results on a large database of cardiac echo videos demonstrate the utility of our method for predicting left ventricular dysfunction.
{"title":"Automatic estimation of left ventricular dysfunction from echocardiogram videos","authors":"D. Beymer, T. Syeda-Mahmood, A. Amir, Fei Wang, Scott Adelman","doi":"10.1109/CVPRW.2009.5204054","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204054","url":null,"abstract":"Echocardiography is often used to diagnose cardiac diseases related to regional and valvular motion abnormalities. Due to the low resolution of the imaging modality, the choice of viewpoint and mode, and the experience of the sonographers, there is a large variance in the estimation of important diagnostic measurements such as ejection fraction. In this paper, we develop an automatic algorithm to estimate diagnostic measurements from raw echocardiogram video sequences. Specifically, we locate and track the left ventricular region over a heart cycle using active shape models. We also present efficient ventricular localization in video sequences by automatically detecting and propagating echocardiographer annotations. Results on a large database of cardiac echo videos demonstrate the use of our method for the prediction of left ventricular dysfunction.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123469416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A case for the average-half-face in 2D and 3D for face recognition
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204304
Josh Harguess, J. Aggarwal
We observe that the human face is inherently symmetric, and we would like to exploit this symmetry in face recognition. The average-half-face has previously been shown to do just that for a set of 3D faces when using eigenfaces for recognition. We build upon that work and compare the use of the average-half-face to the use of the original full face with six different algorithms applied to two- and three-dimensional (2D and 3D) databases. The average-half-face is constructed from the full frontal face image in two steps: first, the face image is centered and divided in half; then the two halves are averaged together (reversing the columns of one of the halves). The resulting average-half-face is then used as the input to face recognition algorithms. Previous work has shown that the accuracy of 3D face recognition using eigenfaces with the average-half-face is significantly better than using the full face. We compare the results of the average-half-face and the full face using six face recognition methods: eigenfaces, multilinear principal component analysis (MPCA), MPCA with linear discriminant analysis (MPCALDA), Fisherfaces (LDA), independent component analysis (ICA), and support vector machines (SVM). We utilize two well-known 2D face databases as well as a 3D face database for the comparison. Our results show that in most cases it is superior to employ the average-half-face for frontal face recognition. This finding may yield substantial savings in storage and computation time.
{"title":"A case for the average-half-face in 2D and 3D for face recognition","authors":"Josh Harguess, J. Aggarwal","doi":"10.1109/CVPRW.2009.5204304","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204304","url":null,"abstract":"We observe that the human face is inherently symmetric and we would like to exploit this symmetry in face recognition. The average-half-face has been previously shown to do just that for a set of 3D faces when using eigenfaces for recognition. We build upon that work and present a comparison of the use of the average-half-face to the use of the original full face with 6 different algorithms applied to two- and three-dimensional (2D and 3D) databases. The average-half-face is constructed from the full frontal face image in two steps; first the face image is centered and divided in half and then the two halves are averaged together (reversing the columns of one of the halves). The resulting average-half-face is then used as the input for face recognition algorithms. Previous work has shown that the accuracy of 3D face recognition using eigenfaces with the average-half-face is significantly better than using the full face. We compare the results using the average-half-face and the full face using six face recognition methods; eigenfaces, multi-linear principal components analysis (MPCA), MPCA with linear discriminant analysis (MPCALDA), Fisherfaces (LDA), independent component analysis (ICA), and support vector machines (SVM). We utilize two well-known 2D face database as well as a 3D face database for the comparison. Our results show that in most cases it is superior to employ the average-half-face for frontal face recognition. The consequences of this discovery may result in substantial savings in storage and computation time.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122087964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Remote audio/video acquisition for human signature detection
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204294
Yufu Qu, Tao Wang, Zhigang Zhu
To address the challenges of noncooperative, long-distance human signature detection, we present a novel multimodal remote audio/video acquisition system. The system consists mainly of a laser Doppler vibrometer (LDV) and a pan-tilt-zoom (PTZ) camera. The LDV is a unique remote hearing sensor based on the principle of laser interferometry. However, it needs an appropriate surface that modulates with the speech of a human subject and reflects the laser beam back to the LDV receiver. Manually steering the laser beam onto such a target is very difficult at distances of more than 20 meters. Therefore, the PTZ camera is used to capture video of the human subject, track the subject as he/she moves, and analyze the image to find a good reflection surface for LDV measurements in real time. Experiments show that the integration of these two sensory components is well suited to multimodal human signature detection at long distances.
{"title":"Remote audio/video acquisition for human signature detection","authors":"Yufu Qu, Tao Wang, Zhigang Zhu","doi":"10.1109/CVPRW.2009.5204294","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204294","url":null,"abstract":"To address the challenges of noncooperative, large-distance human signature detection, we present a novel multimodal remote audio/video acquisition system. The system mainly consists of a laser Doppler virbometer (LDV) and a pan-tilt-zoom (PTZ) camera. The LDV is a unique remote hearing sensor that uses the principle of laser interferometry. However, it needs an appropriate surface to modulate the speech of a human subject and reflect the laser beam to the LDV receiver. The manual operation to turn the laser beam onto a target is very difficult at a distance of more than 20 meters. Therefore, the PTZ camera is used to capture the video of the human subject, track the subject when he/she moves, and analyze the image to get a good reflection surface for LDV measurements in real-time. Experiments show that the integration of those two sensory components is ideal for multimodal human signature detection at a large distance.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122266699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving face recognition with a quality-based probabilistic framework
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204299
N. Ozay, Yan Tong, F. Wheeler, Xiaoming Liu
This paper addresses the problem of developing facial image quality metrics that are predictive of the performance of existing biometric matching algorithms, and of incorporating the quality estimates into the recognition decision process to improve overall performance. The first task we consider is separating probe and gallery qualities, since the match score depends on both. Given a set of training images of the same individual, we compute the match scores between all possible probe/gallery image pairs. We then define a symmetric normalized match score for each pair, model it as the average of the probe and gallery qualities corrupted by additive noise, and estimate the quality values that minimize the noise. To utilize quality in the decision process, we employ a Bayesian network to model the relationships among qualities, predefined quality-related image features, and recognition. The recognition decision is made by probabilistic inference over this model. We show through various face verification experiments that incorporating quality into the decision process can improve performance significantly.
{"title":"Improving face recognition with a quality-based probabilistic framework","authors":"N. Ozay, Yan Tong, F. Wheeler, Xiaoming Liu","doi":"10.1109/CVPRW.2009.5204299","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204299","url":null,"abstract":"This paper addresses the problem of developing facial image quality metrics that are predictive of the performance of existing biometric matching algorithms and incorporating the quality estimates into the recognition decision process to improve overall performance. The first task we consider is the separation of probe/gallery qualities since the match score depends on both. Given a set of training images of the same individual, we find the match scores between all possible probe/gallery image pairs. Then, we define symmetric normalized match score for any pair, model it as the average of the qualities of probe/gallery corrupted by additive noise, and estimate the quality values such that the noise is minimized. To utilize quality in the decision process, we employ a Bayesian network to model the relationships among qualities, predefined quality related image features and recognition. The recognition decision is made by probabilistic inference via this model. We illustrate with various face verification experiments that incorporating quality into the decision process can improve the performance significantly.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130965886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In between 3D Active Appearance Models and 3D Morphable Models
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204300
J. Heo, M. Savvides
In this paper we propose a novel method for generating 3D morphable models (3DMMs) from 2D images. We develop algorithms for 3D face reconstruction from a sparse set of points acquired from 2D images. To establish precise correspondence between images, we combine active shape models (ASMs) and active appearance models (AAMs) into CASAAMs in an intelligent way, showing improved pixel-level accuracy and generalization to unseen faces. The CASAAMs are applied to images of different views of the same person to extract facial shapes across pose. These 2D shapes are combined to reconstruct a sparse 3D model. The point density of the model is increased by Loop subdivision, which generates new vertices as weighted sums of existing vertices. The depth of the dense 3D model is then modified with an average 3D depth map to preserve facial structure more realistically. Finally, all 249 3D models with expression changes are combined to generate a 3DMM for a compact representation. The first session of the Multi-PIE database, consisting of 249 persons with expression and illumination changes, is used for the modeling. Unlike typical 3DMMs, our model can generate 3D human faces more realistically and efficiently (2-3 seconds on a P4 machine) under diverse illumination conditions.
{"title":"In between 3D Active Appearance Models and 3D Morphable Models","authors":"J. Heo, M. Savvides","doi":"10.1109/CVPRW.2009.5204300","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204300","url":null,"abstract":"In this paper we propose a novel method of generating 3D morphable models (3DMMs) from 2D images. We develop algorithms of 3D face reconstruction from a sparse set of points acquired from 2D images. In order to establish correspondence between images precisely, we combined active shape models (ASMs) and active appearance models (AAMs)(CASAAMs) in an intelligent way, showing improved performance on pixel-level accuracy and generalization to unseen faces. The CASAAMs are applied to the images of different views of the same person to extract facial shapes across pose. These 2D shapes are combined for reconstructing a sparse 3D model. The point density of the model is increased by the loop subdivision method, which generates new vertices by a weighted sum of the existing vertices. Then, the depth of the dense 3D model is modified with an average 3D depth-map in order to preserve facial structure more realistically. Finally, all 249 3D models with expression changes are combined to generate a 3DMM for a compact representation. The first session of the multi-PIE database, consisting of 249 persons with expression and illumination changes, is used for the modeling. Unlike typical 3DMMs, our model can generate 3D human faces more realistically and efficiently (2-3 seconds on P4 machine) under diverse illumination conditions.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"191 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133720200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HANOLISTIC: A Hierarchical Automatic Image Annotation System Using Holistic Approach
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204236
Özge Öztimur Karadag, F. Yarman-Vural
Automatic image annotation is the process of assigning keywords to digital images based on their content; in one sense, it is a mapping from visual content information to semantic context information. In this study, we propose a novel approach to the automatic image annotation problem in which annotation is formulated as a multivariate mapping from a set of independent descriptor spaces, representing a whole image, to a set of words, representing class labels. For this purpose, a two-layer hierarchical annotation architecture, named HANOLISTIC (hierarchical image annotation system using a holistic approach), is defined. The first layer, called level 0, consists of annotators, each fed by a set of distinct descriptors extracted from the whole image. This lets each annotator represent the image by a different visual property of a descriptor. Since we use the whole image, the problematic segmentation step is avoided. Each annotator is trained by a supervised learning paradigm in which each word is treated as a class label. Note that this differs slightly from classical training approaches, where each datum has a unique label: since each image has one or more annotating words, we assume an image can belong to more than one class. The outputs of the level-0 annotators indicate, for each word in the vocabulary, its degree of membership to the image. These membership values from each annotator are then aggregated at the second layer to obtain a meta-level annotator. Finally, a set of words from the vocabulary is selected based on the ranking of the meta-level output. The hierarchical annotation system proposed in this study outperforms state-of-the-art annotation systems based on segmental and holistic approaches.
{"title":"HANOLISTIC: A Hierarchical Automatic Image Annotation System Using Holistic Approach","authors":"Özge Öztimur Karadag, F. Yarman-Vural","doi":"10.1109/CVPRW.2009.5204236","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204236","url":null,"abstract":"Automatic image annotation is the process of assigning keywords to digital images depending on the content information. In one sense, it is a mapping from the visual content information to the semantic context information. In this study, we propose a novel approach for automatic image annotation problem, where the annotation is formulated as a multivariate mapping from a set of independent descriptor spaces, representing a whole image, to a set of words, representing class labels. For this purpose, a hierarchical annotation architecture, named as HANOLISTIC (hierarchical image annotation system using holistic approach), is defined with two layers. The first layer, called level 0 consists of annotators each of which is fed by a set of distinct descriptors, extracted from the whole image. This enables us to represent the image at each annotator by a different visual property of a descriptor. Since, we use the whole image, the problematic segmentation process is avoided. Training of each annotator is accomplished by a supervised learning paradigm, where each word is considered as a class label. Note that, this approach is slightly different then the classical training approaches, where each data has a unique label. In the proposed system, since each image has one or more annotating words, we assume that an image belongs to more than one class. The output of the level 0 annotators indicate the membership values of the words in the vocabulary, to belong an image. These membership values from each annotator is, then, aggregated at the second layer to obtain meta level annotator. Finally, a set of words from the vocabulary is selected based on the ranking of the output of meta level. The hierarchical annotation system proposed in this study outperforms state of the art annotation systems based on segmental and holistic approaches.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129572288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bicycle chain shape models
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204053
S. Sommer, Aditya Tatu, Cheng Chen, D. Jurgensen, Marleen de Bruijne, M. Loog, M. Nielsen, F. Lauze
In this paper we introduce landmark-based pre-shapes that allow mixing of anatomical landmarks and pseudo-landmarks, constraining consecutive pseudo-landmarks to satisfy planar equidistance relations. This naturally defines a Riemannian manifold structure on these pre-shapes, with a natural action of the group of planar rotations; orbits define the shapes. We develop a geodesic generalized Procrustes analysis procedure for a sample set on such a pre-shape space and use it to compute principal geodesic analysis (PGA). We demonstrate the approach on an elementary synthetic example as well as on a dataset of manually annotated vertebra shapes from X-ray images. We re-landmark the shapes consistently and show that PGA captures the variability of the dataset better than its linear counterpart, PCA.
{"title":"Bicycle chain shape models","authors":"S. Sommer, Aditya Tatu, Cheng Chen, D. Jurgensen, Marleen de Bruijne, M. Loog, M. Nielsen, F. Lauze","doi":"10.1109/CVPRW.2009.5204053","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204053","url":null,"abstract":"In this paper we introduce landmark-based pre-shapes which allow mixing of anatomical landmarks and pseudo-landmarks, constraining consecutive pseudo-landmarks to satisfy planar equidistance relations. This defines naturally a structure of Riemannian manifold on these preshapes, with a natural action of the group of planar rotations. Orbits define the shapes. We develop a geodesic generalized procrustes analysis procedure for a sample set on such a preshape spaces and use it to compute principal geodesic analysis. We demonstrate it on an elementary synthetic example as well on a dataset of manually annotated vertebra shapes from x-ray. We re-landmark them consistently and show that PGA captures the variability of the dataset better than its linear counterpart, PCA.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133921183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pedestrian association and localization in monocular FIR video sequence
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204132
Mayank Bansal, Shunguang Wu, J. Eledath
This paper addresses the frame-to-frame data association and state estimation problems in localizing a pedestrian relative to a moving vehicle from a monocular far-infrared (FIR) video sequence. Through a novel application of the hierarchical model-based motion estimation framework, we use image appearance information to solve the frame-to-frame data association problem and to estimate a sub-pixel-accurate height ratio for a pedestrian between two frames. To localize the pedestrian, we then propose a novel approach that uses the pedestrian height ratio estimates to guide an interacting multiple-hypothesis-mode/height filtering algorithm instead of relying on a constant pedestrian height model. Experiments on several IR sequences demonstrate that this approach achieves results comparable to those obtained with a known pedestrian height, thus avoiding the errors of a constant-height-model-based approach.
{"title":"Pedestrian association and localization in monocular FIR video sequence","authors":"Mayank Bansal, Shunguang Wu, J. Eledath","doi":"10.1109/CVPRW.2009.5204132","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204132","url":null,"abstract":"This paper addresses the frame-to-frame data association and state estimation problems in localization of a pedestrian relative to a moving vehicle from a monocular far infra-red video sequence. Using a novel application of the hierarchical model-based motion estimation framework, we are able to use the image appearance information to solve the frame-to-frame data association problem and estimate a sub-pixel accurate height ratio for a pedestrian in two frames. Then, to localize the pedestrian, we propose a novel approach of using the pedestrian height ratio estimates to guide an interacting multiple-hypothesis-mode/height filtering algorithm instead of using a constant pedestrian height model. Experiments on several IR sequences demonstrate that this approach achieves results comparable to those from a known pedestrian height thus avoiding errors from a constant height model based approach.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130572398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A bottom-up and top-down optimization framework for learning a compositional hierarchy of object classes
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204327
S. Fidler, Marko Boben, A. Leonardis
Summary form only given. Learning hierarchical representations of object structure in a bottom-up manner faces several difficult issues. First, we are dealing with a very large number of potential feature aggregations. Furthermore, the set of features the algorithm learns at each layer directly influences the expressiveness of the compositional layers that work on top of them. However, we cannot ensure the usefulness of a particular local feature for object class representation based solely on the local statistics. This can only be done when more global, object-wise information is taken into account. We build on the hierarchical compositional approach (Fidler and Leonardis, 2007) that learns a hierarchy of contour compositions of increasing complexity and specificity. Each composition models spatial relations between its constituent parts.
{"title":"A bottom-up and top-down optimization framework for learning a compositional hierarchy of object classes","authors":"S. Fidler, Marko Boben, A. Leonardis","doi":"10.1109/CVPRW.2009.5204327","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204327","url":null,"abstract":"Summary form only given. Learning hierarchical representations of object structure in a bottom-up manner faces several difficult issues. First, we are dealing with a very large number of potential feature aggregations. Furthermore, the set of features the algorithm learns at each layer directly influences the expressiveness of the compositional layers that work on top of them. However, we cannot ensure the usefulness of a particular local feature for object class representation based solely on the local statistics. This can only be done when more global, object-wise information is taken into account. We build on the hierarchical compositional approach (Fidler and Leonardis, 2007) that learns a hierarchy of contour compositions of increasing complexity and specificity. Each composition models spatial relations between its constituent parts.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124878085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distance guided selection of the best base classifier in an ensemble with application to cervigram image segmentation
Pub Date: 2009-06-20 | DOI: 10.1109/CVPRW.2009.5204048
Wei Wang, Xiaolei Huang
We empirically evaluate a distance-guided learning method embedded in a multiple classifier system (MCS) for tissue segmentation in optical images of the uterine cervix. Instead of combining multiple base classifiers as in traditional ensemble methods, we propose a Bhattacharyya-distance-based metric for measuring the similarity in decision boundary shape between a pair of statistical classifiers. By generating an ensemble of base classifiers trained independently on separate training images, we can use the distance metric to select those classifiers in the ensemble whose decision boundaries are similar to that of an unknown test image. In the extreme case, we select the single base classifier with the most similar decision boundary to perform classification and segmentation on the test image. Our approach is novel in the way the nearest neighbor is picked, and it effectively solves classification problems in which base classifiers with good overall performance are hard to construct due to large variation in the training examples. In our experiments, we applied our method and several popular ensemble methods to segmenting acetowhite regions in cervical images. The overall classification accuracy of the proposed method is significantly better than that of a single classifier learned on the entire training set, and is also superior to other ensemble methods, including majority voting, STAPLE, boosting, and bagging.
{"title":"Distance guided selection of the best base classifier in an ensemble with application to cervigram image segmentation","authors":"Wei Wang, Xiaolei Huang","doi":"10.1109/CVPRW.2009.5204048","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204048","url":null,"abstract":"We empirically evaluate a distance-guided learning method embedded in a multiple classifier system (MCS) for tissue segmentation in optical images of the uterine cervix. Instead of combining multiple base classifiers as in traditional ensemble methods, we propose a Bhattacharyya distance based metric for measuring the similarity in decision boundary shapes between a pair of statistical classifiers. By generating an ensemble of base classifiers trained independently on separate training images, we can use the distance metric to select those classifiers in the ensemble whose decision boundaries are similar to that of an unknown test image. In an extreme case, we select the base classifier with the most similar decision boundary to accomplish classification and segmentation on the test image. Our approach is novel in the way that the nearest neighbor is picked and effectively solves classification problems in which base classifiers with good overall performance are not easy to construct due to a large variation in the training examples. In our experiments, we applied our method and several popular ensemble methods to segmenting acetowhite regions in cervical images. The overall classification accuracy of the proposed method is significantly better than that of a single classifier learned using the entire training set, and is also superior to other ensemble methods including majority voting, STAPLE, Boosting and Bagging.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125062195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}