In this paper we present a framework for vision-based robot localization using natural planar landmarks. Specifically, we demonstrate our framework with planar targets using Fern classifiers, which have been shown to be robust against illumination changes, perspective distortion, motion blur, and occlusions. We add stratified sampling in the image plane to increase the robustness of the localization scheme in cluttered environments, and on-line checking for false target detections to decrease false positives. We use all matching points to improve pose estimation, and an off-line target evaluation strategy to improve a priori map building. We report experiments on synthetic and real data demonstrating the accuracy and speed of the localization. Our framework and improvements are, however, more general: the Fern classifier could be replaced by other techniques.
{"title":"Framework for Natural Landmark-based Robot Localization","authors":"Andrés Solís Montero, H. Sekkati, J. Lang, R. Laganière, J. James","doi":"10.1109/CRV.2012.25","DOIUrl":"https://doi.org/10.1109/CRV.2012.25","url":null,"abstract":"In this paper we present a framework for vision-based robot localization using natural planar landmarks. Specifically, we demonstrate our framework with planar targets using Fern classifiers that have been shown to be robust against illumination changes, perspective distortion, motion blur, and occlusions. We add stratified sampling in the image plane to increase robustness of the localization scheme in cluttered environments and on-line checking for false detection of targets to decrease false positives. We use all matching points to improve pose estimation and an off-line target evaluation strategy to improve a priori map building. We report experiments demonstrating the accuracy and speed of localization. Our experiments entail synthetic and real data. Our framework and our improvements are however more general and the Fern classifier could be replaced by other techniques.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129018550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an algorithm to estimate the pose of a human head from a single image. It builds on the fact that only a limited set of cues is required to estimate head pose, and that most images contain far more detail than this task requires. Non-photorealistic rendering is therefore first used to eliminate irrelevant details from the picture and accentuate the facial features critical to estimating head pose. The maximum-likelihood pose range is then estimated by a classifier trained on scaled-down abstracted images. The algorithm covers a wide range of head orientations, can be used at various image resolutions, needs no personalized initialization, and is relatively insensitive to illumination. Moreover, it performs competitively with other state-of-the-art methods and is fast enough for real-time systems, making it a promising method for coarse head pose estimation.
{"title":"Coarse Head Pose Estimation using Image Abstraction","authors":"A. Puri, Hariprasad Kannan, P. Kalra","doi":"10.1109/CRV.2012.24","DOIUrl":"https://doi.org/10.1109/CRV.2012.24","url":null,"abstract":"We present an algorithm to estimate the pose of a human head from a single image. It builds on the fact that only a limited set of cues are required to estimate human head pose and that most images contain far too many details than what are required for this task. Thus, non-photorealistic rendering is first used to eliminate irrelevant details from the picture and accentuate facial features critical to estimating head pose. The maximum likelihood pose range is then estimated by training a classifier on scaled down abstracted images. This algorithm covers a wide range of head orientations, can be used at various image resolutions, does not need personalized initialization, and is also relatively insensitive to illumination. Moreover, the facts that it performs competitively when compared with other state of the art methods and that it is fast enough to be used in real time systems make it a promising method for coarse head pose estimation.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132506819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Augmented Reality (AR) is a processor-intensive application of computer vision that typically suffers from a trade-off between robust view alignment and real-time performance. Real-time AR that functions robustly in variable environments is difficult to achieve on a PC, let alone on the mobile devices where AR is most likely to be adopted as a consumer application. Despite the availability of high-quality feature matching algorithms such as SIFT and SURF and robust pose estimation algorithms such as EPnP, practical AR systems today rely on older methods such as Harris/KLT corners and template matching for performance reasons; SIFT-like algorithms are typically used only to initialize tracking. We demonstrate a practical system with real-time performance using only SURF, without the need for tracking. We achieve this with extensive use of the Graphics Processing Unit (GPU) now prevalent in PCs. As mobile devices are increasingly equipped with GPUs, we believe this architecture will lead to practical, robust AR.
{"title":"A Real Time Augmented Reality System Using GPU Acceleration","authors":"David Chi Chung Tam, M. Fiala","doi":"10.1109/CRV.2012.21","DOIUrl":"https://doi.org/10.1109/CRV.2012.21","url":null,"abstract":"Augmented Reality (AR) is an application of computer vision that is processor intensive and typically suffers from a trade-off between robust view alignment and real time performance. Real time AR that can function robustly in variable environments is a process difficult to achieve on a PC (personal computer) let alone on the mobile devices that will likely be where AR is adopted as a consumer application. Despite the availability of high quality feature matching algorithms such as SIFT, SURF and robust pose estimation algorithms such as EPNP, practical AR systems today rely on older methods such as Harris/KLT corners and template matching for performance reasons. SIFT-like algorithms are typically used only to initialize tracking by these methods. We demonstrate a practical system with real ime performance using only SURF without the need for tracking. We achieve this with extensive use of the Graphics Processing Unit (GPU) now prevalent in PC's. Due to mobile devices becoming equipped with GPU's we believe that this architecture will lead to practical robust AR.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133396600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper investigates a novel perceptually driven approach to image quality assessment using complex wavelets, called the Perceptual Structure Distortion Ratio (PSDR). The measure is grounded in modeling the human visual system as a frequency-based processing system and in the idea that perceptual structural significance can be derived from the concept of complex phase order. Built upon a robust complex phase order framework for measuring structural significance, preliminary results on test images under different types of distortion show that PSDR is a promising direction for evaluating the quality of visual media.
{"title":"Perceptual Structure Distortion Ratio: An Image Quality Metric Based on Robust Measures of Complex Phase Order","authors":"A. Wong","doi":"10.1109/CRV.2012.15","DOIUrl":"https://doi.org/10.1109/CRV.2012.15","url":null,"abstract":"This paper investigates a novel perceptual driven approach to image quality assessment using complex wavelets called PSDR (Perceptual Structure Distortion Ratio). The measure is grounded in the modeling of the human vision system as a frequency-based processing system and that perceptual structural significance can be derived based on the concept of complex phase order. Built upon a robust complex phase order framework for measuring structural significance, preliminary results using test images under different types of distortions show that PSDR can be a promising direction for evaluating the quality of visual media.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134082916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A critical step in the navigation of unmanned aerial vehicles is detection of the horizon line. This information can be used for adjusting flight parameters as well as for obstacle avoidance. In this paper, a fast and robust technique for precise detection of the horizon path is proposed. The method is based on the existence of a distinctive light field that occurs in imagery where the horizon is viewed; this light field is present in different scenes, including sea-sky, soil-sky, and forest-sky horizon lines. Our approach segments the scene and then analyzes the image segments to extract this field and thus the horizon path. Through experiments on our own dataset and on that of a previously published paper, we illustrate the accuracy of the technique for various types of terrain, from water to ground and even snow-covered ground. The benefits of our method are robust performance, accuracy, speed, and extraction of the horizon path as a curve rather than the straight line produced by many other approaches.
{"title":"Robust Horizon Detection Using Segmentation for UAV Applications","authors":"Nasim Sepehri Boroujeni, S. A. Etemad, A. Whitehead","doi":"10.1109/CRV.2012.52","DOIUrl":"https://doi.org/10.1109/CRV.2012.52","url":null,"abstract":"A critical step in navigation of unmanned aerial vehicles is the detection of the horizon line. This information can be used for adjusting flight parameters as well as obstacle avoidance. In this paper, a fast and robust technique for precise detection of the horizon path is proposed. The method is based on existence of a unique light field that occurs in imagery where the horizon is viewed. This light field exists in different scenes including sea-sky, soil-sky, and forest-sky horizon lines. Our proposed approach employs segmentation of the scene and subsequent analysis of the image segments for extraction of the mentioned field and thus the horizon path. Through various experiments carried out on our own dataset and that of another previously published paper, we illustrate the significance and accuracy of this technique for various types of terrains from water to ground, and even snow-covered ground. Finally, it is shown that robust performance and accuracy, speed, and extraction of the path as curves (as opposed to a straight line which is resulted from many other approaches) are the benefits of our method.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124827160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a new method for figure-ground image segmentation based on probabilistic learning of the object shape. Historically, segmentation is mostly defined as a data-driven, bottom-up process in which pixels are grouped into regions or objects according to objective criteria such as region homogeneity; in particular, it aims at creating a partition of the image into contiguous, homogeneous regions. In this work, we incorporate prior knowledge about the object shape and category to segment the object from the background. The segmentation process is composed of two parts. In the first part, object shape models are built from sets of object fragments. The second part first segments an image into homogeneous regions using the mean-shift algorithm; then several object hypotheses are tested and validated using the different object shape models as supporting information. As output, our algorithm identifies the object category and position as well as its optimal segmentation. Experimental results show the capacity of the approach to segment several object categories.
{"title":"A Learning Probabilistic Approach for Object Segmentation","authors":"Guillaume Larivière, M. S. Allili","doi":"10.1109/CRV.2012.19","DOIUrl":"https://doi.org/10.1109/CRV.2012.19","url":null,"abstract":"This paper proposes a new method for figure-ground image segmentation based on a probabilistic learning approach of the object shape. Historically, segmentation is mostly defined as a data-driven bottom-up process, where pixels are grouped into regions/objects according to objective criteria, such as region homogeneity, etc. In particular, it aims at creating a partition of the image into contiguous, homogenous regions. In the proposed work, we propose to incorporate prior knowledge about the object shape and category to segment the object from the background. The segmentation process is composed of two parts. In the first part, object shape models are built using sets of object fragments. The second part starts by first segmenting an image into homogenous regions using the mean-shift algorithm. Then, several object hypotheses are tested and validated using the different object shape models as supporting information. As an output, our algorithm identifies the object category, position, as well as its optimal segmentation. Experimental results show the capacity of the approach to segment several object categories.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115739005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Specular surfaces pose difficulties for machine vision. In some applications, this may be further complicated by the presence of marks from a machining process. We propose a system that directly illuminates machined specular surfaces with a programmable array of high-power light-emitting diodes. A novel approach is described in which the angle of the incident light is varied over a series of images from which a specular-reduced median image is computed. A quality factor is used to quantitatively characterize the degree to which these specular-reduced median images approximate a diffusely lit image, and this quality factor is shown to depend linearly on the number of specular images used to produce the single specular-reduced median image. Defects such as porosity and scratches are shown to be identifiable in the specular-reduced median images of machined surfaces.
{"title":"Specular-Reduced Imaging for Inspection of Machined Surfaces","authors":"K. Sills, D. Capson, G. Bone","doi":"10.1109/CRV.2012.54","DOIUrl":"https://doi.org/10.1109/CRV.2012.54","url":null,"abstract":"Specular surfaces pose difficulties for machine vision. In some applications, this may be further complicated by the presence of marks from a machining process. We propose a system that directly illuminates machined specular surfaces with a programmable array of high-power light-emitting diodes. A novel approach is described in which the angle of the incident light is varied over a series of images from which a specular-reduced median image is computed. A quality factor is used to quantitatively characterize the degree to which these specular-reduced median images approximate a diffusely lit image, and this quality factor is shown to depend linearly on the number of specular images used to produce the single specular-reduced median image. Defects such as porosity and scratches are shown to be identifiable in the specular-reduced median images of machined surfaces.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122083895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatically adjusting the seat position to the driver's height significantly increases comfort when entering a car. A camera attached to the vehicle can estimate the body height of an approaching driver, but absolute height estimation from a single camera raises several problems. The cost-sensitive cameras used in the automotive industry provide low-resolution grayscale images, which make driver extraction in real-life parking scenarios difficult. Absolute height estimation also requires the camera position relative to the road surface to be known, and this position is not available in arbitrary parking scenarios. To address this, we first propose a background-based driver-extraction method that operates on low-resolution grayscale images and is robust against shadows and illumination changes. Second, we derive a scheme for estimating the camera position relative to an unknown road surface using the head and foot points of the extracted persons. Our experimental results on real-life video sequences show that the proposed schemes are highly suitable for robust driver extraction and height estimation in the automotive industry.
{"title":"Robust Body-Height Estimation for Applications in Automotive Industry","authors":"C. Scharfenberger, J. Zelek, David A Clausi","doi":"10.1109/CRV.2012.31","DOIUrl":"https://doi.org/10.1109/CRV.2012.31","url":null,"abstract":"An automatic adjustment of the seat position according to the driver height significantly increases the level of comfort when entering a car. A camera attached to a vehicle can estimate the body heights of approaching drivers. However, absolute height estimation based on a single camera leads to several problems. Cost-sensitive cameras used in automotive industry provide low-resolution grayscale images, which make driver extraction in real-life parking scenarios difficult. Absolute height estimation also prerequisites a known camera position relative to a road surface, but this position is not available for any parking scenarios. Toward this, we first propose a background-based driver-extraction method that can operate on low-resolution grayscale images, and that is robust against shadows and illumination changes. Second, we derive a scheme for estimating the camera position relative to an unknown road surface using head and foot points of extracted persons. Our experimental results obtained from real-life video sequences show that the proposed schemes are highly suitable for robust driver extraction and height estimation in automotive industry.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117291036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the advent of affordable RGBD sensors such as the Kinect, collecting depth and appearance information from a scene has become effortless. However, neither a correct noise model for these sensors nor a principled methodology for extracting planar segmentations has yet been developed. In this work, we advance the state of the art with the following contributions: we model the Kinect sensor data by observing that the noise is inherent only in the measured disparity values; we formulate plane fitting as a linear least-squares problem, which allows us to quickly merge different segments; and we apply an advanced Markov Chain Monte Carlo (MCMC) method, generalized Swendsen-Wang sampling, to efficiently search the space of planar segmentations. We evaluate our plane fitting and surface reconstruction algorithms on simulated and real-world data.
{"title":"Planar Segmentation of RGBD Images Using Fast Linear Fitting and Markov Chain Monte Carlo","authors":"Can Erdogan, Manohar Paluri, F. Dellaert","doi":"10.1109/CRV.2012.12","DOIUrl":"https://doi.org/10.1109/CRV.2012.12","url":null,"abstract":"With the advent of affordable RGBD sensors such as the Kinect, the collection of depth and appearance information from a scene has become effortless. However, neither the correct noise model for these sensors, nor a principled methodology for extracting planar segmentations has been developed yet. In this work, we advance the state of art with the following contributions: we correctly model the Kinect sensor data by observing that the data has inherent noise only over the measured disparity values, we formulate plane fitting as a linear least-squares problem that allow us to quickly merge different segments, and we apply an advanced Markov Chain Monte Carlo (MCMC) method, generalized Swendsen-Wang sampling, to efficiently search the space of planar segmentations. We evaluate our plane fitting and surface reconstruction algorithms with simulated and real-world data.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116889898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual Vision advocates developing visually and behaviorally realistic 3D synthetic environments to serve the needs of computer vision research; it is especially well suited to studying large-scale camera networks. A virtual vision simulator capable of generating "realistic" synthetic imagery of real-life scenes, involving pedestrians and other objects, is the sine qua non of carrying out virtual vision research. Here we develop a distributed, customizable virtual vision simulator capable of simulating pedestrian traffic in a variety of 3D environments. Virtual cameras deployed in this synthetic environment generate imagery using state-of-the-art computer graphics techniques, including realistic lighting effects and shadows. The synthetic imagery is fed into a visual analysis pipeline that currently supports pedestrian detection and tracking, and the results of this analysis can be used for subsequent processing such as camera control, coordination, and handoff. The visual analysis pipeline is designed to handle real-world imagery without any modifications, so it closely mimics the performance of the visual analysis routines one might deploy on physical cameras. Our virtual vision simulator is realized as a collection of modules that communicate with each other over the network; consequently, we can deploy the simulator over a network of computers, allowing us to simulate much larger camera networks and much more complex scenes than is otherwise possible.
{"title":"A Virtual Vision Simulator for Camera Networks Research","authors":"Wiktor Starzyk, Adam Domurad, F. Qureshi","doi":"10.1109/CRV.2012.47","DOIUrl":"https://doi.org/10.1109/CRV.2012.47","url":null,"abstract":"Virtual Vision advocates developing visually and behaviorally realistic 3D synthetic environments to serve the needs of computer vision research. Virtual vision, especially, is well-suited for studying large-scale camera networks. A virtual vision simulator capable of generating \"realistic\" synthetic imagery from real-life scenes, involving pedestrians and other objects, is the sine qua non of carrying out virtual vision research. Here we develop a distributed, customizable virtual vision simulator capable of simulating pedestrian traffic in a variety of 3D environments. Virtual cameras deployed in this synthetic environment generate imagery using state-of-the-art computer graphics techniques, boasting realistic lighting effects, shadows, etc. The synthetic imagery is fed into a visual analysis pipeline that currently supports pedestrian detection and tracking. The results of this analysis can then be used for subsequent processing, such as camera control, coordination, and handoff. It is important to bear in mind that our visual analysis pipeline is designed to handle real world imagery without any modifications. Consequently, it closely mimics the performance of visual analysis routines that one might deploy on physical cameras. Our virtual vision simulator is realized as a collection of modules that communicate with each other over the network. Consequently, we can deploy our simulator over a network of computers, allowing us to simulate much larger camera networks and much more complex scenes then is otherwise possible.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125658967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}