Cancelable iris biometrics and using Error Correcting Codes to reduce variability in biometric data
S. Kanade, D. Petrovska-Delacrétaz, B. Dorizzi
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206646
With the increasing use of biometrics, concerns are growing about the privacy of personal biometric data. Conventional biometric systems store biometric templates in a database, which opens the possibility of tracking personal information stored in one database by gaining access to another through cross-database matching. Moreover, biometric data are permanently associated with the user: if stolen, they are lost permanently and become unusable in that system, and possibly in all other systems based on the same biometric. To overcome this non-revocability, we propose a two-factor scheme that generates cancelable iris templates from an iris biometric and a password. We employ a user-specific shuffling key to shuffle the iris codes. Additionally, we introduce a novel way to use error correcting codes (ECC) to reduce the variability in biometric data. The shuffling scheme increases the impostor Hamming distance while leaving the genuine Hamming distance intact, and the ECC reduce the Hamming distance for genuine comparisons by a larger amount than for impostor comparisons. The result is better separation between genuine and impostor users, which improves verification performance. The shuffling key is protected by a password, making the system truly revocable, and the biometric data are stored in a protected form that preserves privacy. The proposed scheme reduces the equal error rate (EER) of the system by more than 90% (e.g., from 1.70% to 0.057% on the NIST-ICE database).
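The shuffling idea can be illustrated with a toy sketch (not the authors' implementation: the code length, noise levels, and the use of a key-seeded bit permutation are all illustrative assumptions). Shuffling both samples with the same key preserves the genuine Hamming distance exactly, while comparing codes shuffled under different keys drives the impostor distance toward 0.5:

```python
import numpy as np

def shuffle_code(iris_code, key):
    # Permute the bits of a binary iris code with a user-specific
    # shuffling key (here modeled as a key-seeded permutation).
    perm = np.random.default_rng(key).permutation(iris_code.size)
    return iris_code[perm]

def hamming(a, b):
    # Fractional Hamming distance between two binary codes.
    return float(np.mean(a != b))

rng = np.random.default_rng(0)
n = 2048
user = rng.integers(0, 2, n)                      # enrolled iris code

# Genuine sample: same iris, 10% of bits flipped by acquisition noise.
genuine = user.copy()
genuine[rng.choice(n, n // 10, replace=False)] ^= 1

# Impostor: correlated with the user (30% flips) to make the point visible.
impostor = user.copy()
impostor[rng.choice(n, 3 * n // 10, replace=False)] ^= 1

key = 12345
# Same key on both sides: genuine distance is exactly preserved.
g = hamming(shuffle_code(user, key), shuffle_code(genuine, key))
assert g == hamming(user, genuine)

# Different keys (impostor does not know the user's key): distance rises.
imp_raw = hamming(user, impostor)                 # 0.3 by construction
imp_shuf = hamming(shuffle_code(user, key), shuffle_code(impostor, 99999))
print(imp_raw, round(imp_shuf, 2))                # shuffled value is near 0.5
```

The separation between the two distributions is what lowers the EER; the ECC step (not sketched here) then narrows the genuine distribution further.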
On edge detection on surfaces
Michael Kolomenkin, I. Shimshoni, A. Tal
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206517
Edge detection in images has been a fundamental problem in computer vision from its early days. Edge detection on surfaces, on the other hand, has received much less attention. The most common edges on surfaces are ridges and valleys, used for processing range images in computer vision as well as for non-photorealistic rendering in computer graphics. We propose a new type of edge on surfaces, termed relief edges. Intuitively, the surface can be considered as an unknown smooth manifold on top of which a local height image is placed; relief edges are the edges of this local image. We show how to compute these edges from the local differential geometric surface properties by fitting a local edge model to the surface. We also show how the underlying manifold and the local images can be roughly approximated and exploited in the edge detection process. Last but not least, we demonstrate the application of relief edges to artifact illustration in archaeology.
View-invariant dynamic texture recognition using a bag of dynamical systems
Avinash Ravichandran, Rizwan Ahmed Chaudhry, R. Vidal
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206847
In this paper, we consider the problem of categorizing videos of dynamic textures under varying viewpoint. We propose to model each video with a collection of linear dynamical systems (LDSs) describing the dynamics of spatiotemporal video patches. This bag of systems (BoS) representation is analogous to the bag of features (BoF) representation, except that we use LDSs as feature descriptors. This poses several technical challenges to the BoF framework; most notably, LDSs do not live in a Euclidean space, so novel methods for clustering LDSs and computing codewords of LDSs need to be developed. Our framework makes use of nonlinear dimensionality reduction and clustering techniques combined with the Martin distance for LDSs to tackle these issues. Our experiments show that our BoS approach can be used for recognizing dynamic textures in challenging scenarios that could not be handled by existing dynamic texture recognition methods.
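The BoF-style step of the pipeline can be sketched in a minimal form. The key property the paper exploits is that codeword assignment needs only pairwise distances, not a Euclidean embedding; below, a precomputed distance matrix stands in for Martin distances between patch LDSs and medoid codewords (the distances are synthetic and the codebook size is an illustrative assumption):

```python
import numpy as np

def bos_histogram(dist_to_codewords):
    # dist_to_codewords: (n_patches, k) matrix of distances from each
    # patch descriptor to each of k codewords (e.g. Martin distances
    # to LDS medoids). Each patch votes for its nearest codeword.
    n_patches, k = dist_to_codewords.shape
    nearest = np.argmin(dist_to_codewords, axis=1)
    hist = np.bincount(nearest, minlength=k).astype(float)
    return hist / hist.sum()          # normalized bag-of-systems histogram

rng = np.random.default_rng(1)
d = rng.random((50, 4))               # hypothetical distances: 50 patches, 4 codewords
h = bos_histogram(d)
print(h)                              # one 4-bin descriptor for the whole video
```

Two videos are then compared through their histograms (e.g. with a chi-squared distance or an SVM), exactly as in standard BoF classification.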
Real-time learning of accurate patch rectification
Stefan Hinterstoißer, Oliver Kutter, Nassir Navab, P. Fua, V. Lepetit
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206794
Recent work showed that learning-based patch rectification methods are both faster and more reliable than affine region methods. Unfortunately, their performance improvements rest on a computationally expensive offline learning stage, which is not possible for applications such as SLAM. In this paper we propose an approach whose training stage is fast enough to be performed at run-time without loss of accuracy or robustness. To this end, we developed a very fast method to compute the mean appearances of the feature points over sets of small variations that span the range of possible camera viewpoints. Then, by simply matching incoming feature points against these mean appearances, we get a coarse estimate of the viewpoint that is refined afterwards. Because there is no need to compute descriptors for the input image, the method is very fast at run-time. We demonstrate our approach on tracking-by-detection for SLAM, real-time object detection, and pose estimation applications.
Removing partial blur in a single image
Shengyang Dai, Ying Wu
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206625
Removing partial blur from an image is of great practical importance. However, because existing recovery techniques usually assume a one-layer clear image model, they cannot characterize the actual generation process of partial blurs. In this paper, a two-layer image model is investigated. Based on a study of the partial blur generation process, a novel recovery technique is proposed for a single input image. Both foreground and background layers are recovered simultaneously with the help of the matting technique, powerful image prior models, and user assistance. The effectiveness of the proposed approach is demonstrated by extensive experiments on image recovery and synthesis on real data.
A unified model of specular and diffuse reflectance for rough, glossy surfaces
W. Smith, E. Hancock
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206498
In this paper we consider diffuse and specular reflectance from surfaces modeled as distributions of glossy microfacets. In contrast to previous work, we describe the relative contribution of both of these components in the same terms, namely with recourse to Fresnel theory. This results in a more highly constrained model with a reduced number of parameters, and removes the need for ad hoc, physically meaningless specular and diffuse reflectance coefficients. It ensures that conservation of energy is obeyed and that only physically plausible mixtures of the two components are allowed. In our model, both specular and diffuse reflectance are related to the roughness and refractive index of the surface. We show how physically meaningful parameters of a surface can be measured from uncalibrated imagery and that our model fits observed BRDF data more accurately than comparable existing models.
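The Fresnel quantity that ties both reflectance components to the refractive index can be written down directly. This is the textbook unpolarized Fresnel reflectance for an air-dielectric interface, not the paper's full microfacet model; the index value n = 1.5 is an illustrative choice:

```python
import math

def fresnel_reflectance(theta_i, n=1.5):
    # Unpolarized Fresnel reflectance at an air -> dielectric interface
    # (refractive index n), averaged over s and p polarizations.
    sin_t = math.sin(theta_i) / n          # Snell's law
    cos_i = math.cos(theta_i)
    cos_t = math.sqrt(1.0 - sin_t**2)
    r_s = ((cos_i - n * cos_t) / (cos_i + n * cos_t)) ** 2
    r_p = ((n * cos_i - cos_t) / (n * cos_i + cos_t)) ** 2
    return 0.5 * (r_s + r_p)

# Normal incidence gives the familiar ((n-1)/(n+1))^2 = 0.04 for n = 1.5,
# and reflectance rises sharply toward grazing angles.
print(round(fresnel_reflectance(0.0), 3))   # 0.04
print(round(fresnel_reflectance(1.5), 3))   # near-grazing, much larger
```

A model expressed in these terms has only physical parameters (n and roughness) to fit, which is why the ad hoc specular/diffuse weights become unnecessary.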
Shape priors and discrete MRFs for knowledge-based segmentation
A. Besbes, N. Komodakis, G. Langs, N. Paragios
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206649
In this paper we introduce a new approach to knowledge-based segmentation. Our method consists of a novel representation to model shape variations as well as an efficient inference procedure to fit the model to new data. The considered shape model is similarity-invariant and refers to an incomplete graph that consists of intra- and inter-cluster connections representing the interdependencies of control points. The clusters are determined according to the co-dependencies of the deformations of the control points within the training set. The connections between the components of a cluster represent the local structure, while the connections between the clusters account for the global structure. The distributions of the normalized distances between connected control points encode the prior model. During search, this model is used together with a discrete Markov random field (MRF) based segmentation in which the unknown variables are the positions of the control points in the image domain. To encode the image support, a Voronoi decomposition of the domain is considered and region-based statistics are used. The resulting model is computationally efficient, can encode complex statistical models of shape variations, and benefits from the image support of the entire spatial domain.
Spatiotemporal stereo via spatiotemporal quadric element (stequel) matching
Mikhail Sizintsev, Richard P. Wildes
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206728
Spatiotemporal stereo is concerned with the recovery of the 3D structure of a dynamic scene from a temporal sequence of multiview images. This paper presents a novel method for computing temporally coherent disparity maps from a sequence of binocular images through an integrated consideration of image spacetime structure and without explicit recovery of motion. The approach is based on matching spatiotemporal quadric elements (stequels) between views, as it is shown that this matching primitive provides a natural way to encapsulate both local spatial and temporal structure for disparity estimation. Empirical evaluation with laboratory-based imagery with ground truth, as well as more typical natural imagery, shows that the approach provides considerable benefit in comparison to alternative methods for enforcing temporal coherence in disparity estimation.
Image registration by minimization of residual complexity
A. Myronenko, Xubo B. Song
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206571
Accurate definition of the similarity measure is a key component in image registration. Most commonly used intensity-based similarity measures rely on the assumptions of independence and stationarity of the intensities from pixel to pixel. Such measures cannot capture the complex interactions among pixel intensities and often result in unsatisfactory registration performance, especially in the presence of nonstationary intensity distortions. We propose a novel similarity measure that accounts for intensity non-stationarities and complex spatially-varying intensity distortions. We derive the similarity measure by analytically solving for the intensity correction field and its adaptive regularization. The final measure can be interpreted as one that favors a registration with minimum compression complexity of the residual image between the two registered images. This measure produces accurate registration results on both artificial and real-world problems that we have tested, whereas many other state-of-the-art similarity measures have failed to do so.
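The "compression complexity of the residual" intuition can be sketched with a toy 1D signal. This is a hedged illustration, not the paper's exact functional: a structured residual has many significant transform coefficients (it is expensive to encode), while a well-registered pair leaves a residual that compresses to almost nothing. The log-penalty form and the constant `alpha` below are illustrative assumptions:

```python
import numpy as np

def residual_complexity(residual, alpha=0.05):
    # Toy compression-complexity score: transform the residual into a
    # decorrelating basis (an explicit orthonormal DCT-II matrix, built
    # here to avoid extra dependencies) and sum a log code-length
    # penalty over the coefficients.
    r = np.asarray(residual, dtype=float).ravel()
    n = r.size
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)
    c = C @ r
    return float(np.sum(np.log(1.0 + c**2 / alpha)))

rng = np.random.default_rng(2)
x = rng.standard_normal(64)              # a 1D stand-in for an image row
aligned = x - x                          # perfect registration: zero residual
misaligned = x - np.roll(x, 3)           # structured residual from misalignment
print(residual_complexity(aligned) < residual_complexity(misaligned))  # True
```

Minimizing such a score over transformation parameters prefers alignments whose residual is cheap to encode, which is robust to smooth intensity distortions that a plain sum-of-squares measure would penalize.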
Fast human detection in crowded scenes by contour integration and local shape estimation
Csaba Beleznai, H. Bischof
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206564
The complexity of human detection increases significantly with a growing density of humans populating a scene. This paper presents a Bayesian detection framework using shape and motion cues to obtain a maximum a posteriori (MAP) solution for human configurations consisting of many, possibly occluded pedestrians viewed by a stationary camera. The paper contains two novel contributions for the human detection task: (1) computationally efficient detection based on shape templates using contour integration by means of integral images built by oriented string scans; and (2) a non-parametric approach using an approximated version of the shape context descriptor, which generates informative object parts and infers the presence of humans despite occlusions. The outputs of the two detectors are used to generate a spatial configuration of hypothesized human body locations. The configuration is iteratively optimized while taking into account the depth ordering and occlusion status of the hypotheses. The method achieves fast computation times even in complex scenarios with a high density of people. Its validity is demonstrated on a substantial amount of image data using the CAVIAR and our own datasets, and evaluation results and a comparison with the state of the art are presented.
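The integral-image trick behind contribution (1) is worth making concrete. The paper builds its tables along oriented string scans; the sketch below shows the classic axis-aligned special case, where any rectangular sum costs four lookups regardless of rectangle size:

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero border row/column:
    # ii[y, x] = sum of img[:y, :x].
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] in O(1): four table lookups.
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))   # 5 + 6 + 9 + 10 = 30
```

Because template scoring reduces to a handful of such lookups per candidate position, the detector stays fast even when a dense crowd forces many hypotheses to be evaluated.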