In this paper, we present Gaussian Process Gauss-Newton (GPGN), an algorithm for non-parametric, continuous-time, nonlinear, batch state estimation. This work adapts the methods of Gaussian Process regression to the problem of batch state estimation using the Gauss-Newton method. In particular, we formulate the estimation problem with a continuous-time state model, along with the more conventional discrete-time measurements. Our derivation uses a basis function approach but, through algebraic manipulation, returns to a non-parametric form by replacing the basis functions with covariance functions (i.e., the kernel trick). The algorithm is validated through hardware experiments on the well-understood problem of 2D rover localization against a known map, used as an illustrative example, and is compared to the traditional discrete-time batch Gauss-Newton approach.
{"title":"Gaussian Process Gauss-Newton: Non-Parametric State Estimation","authors":"Chi Hay Tong, P. Furgale, T. Barfoot","doi":"10.1109/CRV.2012.35","DOIUrl":"https://doi.org/10.1109/CRV.2012.35","url":null,"abstract":"In this paper, we present Gaussian Process Gauss-Newton (GPGN), an algorithm for non-parametric, continuous-time, nonlinear, batch state estimation. This work adapts the methods of Gaussian Process regression to the problem of batch state estimation by using the Gauss-Newton method. In particular, we formulate the estimation problem with a continuous-time state model, along with the more conventional discrete-time measurements. Our derivation utilizes a basis function approach, but through algebraic manipulations, returns to a non-parametric form by replacing the basis functions with covariance functions (i.e., the kernel trick). The algorithm is validated through hardware-based experiments utilizing the well-understood problem of 2D rover localization using a known map as an illustrative example, and is compared to the traditional discrete-time batch Gauss-Newton approach.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125785136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces an approach to predict the three-dimensional shape of an object belonging to a specific class of shapes shown in an input image. We use suggestive contours, a shape-suggesting image feature developed in computer graphics in the context of non-photorealistic rendering, to reconstruct 3D shapes. We learn a functional mapping from the shape space of suggestive contours to the space of 3D shapes and use this mapping to predict 3D shapes from a single input image. We demonstrate that the method can be used to predict the shape of deformable objects and of human faces, using synthetic experiments as well as experiments based on artist-drawn sketches and photographs.
{"title":"Shape from Suggestive Contours Using 3D Priors","authors":"S. Wuhrer, Chang Shu","doi":"10.1109/CRV.2012.38","DOIUrl":"https://doi.org/10.1109/CRV.2012.38","url":null,"abstract":"This paper introduces an approach to predict the three-dimensional shape of an object belonging to a specific class of shapes shown in an input image. We use suggestive contour, a shape-suggesting image feature developed in computer graphics in the context of non-photorealistic rendering, to reconstruct 3D shapes. We learn a functional mapping from the shape space of suggestive contours to the space of 3D shapes and use this mapping to predict 3D shapes based on a single input image. We demonstrate that the method can be used to predict the shape of deformable objects and to predict the shape of human faces using synthetic experiments and experiments based on artist drawn sketches and photographs.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114287500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fundamental problem in robotic exploration and mapping of an unknown environment is answering the question 'have I been here before?', also known as the 'loop closing' problem. One approach to answering this question in embedded topological worlds is to use an external marking aid that helps the robot disambiguate places. This paper investigates the power of different marker-based aids in topological exploration. We describe enhanced versions of edge- and vertex-based marker algorithms and demonstrate algorithms with improved lower bounds on the number of markers and motions required to map an embedded topological environment.
{"title":"Enhancing Exploration in Topological Worlds with Multiple Immovable Markers","authors":"Hui Wang, M. Jenkin, Patrick W. Dymond","doi":"10.1109/CRV.2012.49","DOIUrl":"https://doi.org/10.1109/CRV.2012.49","url":null,"abstract":"The fundamental problem in robotic exploration and mapping of an unknown environment is answering the question 'have I been here before?', which is also known as the 'loop closing' problem. One approach to answering this problem in embedded topological worlds is to resort to the use of an external marking aid that can help the robot disambiguate places. This paper investigates the power of different marker-based aids in topological exploration. We describe enhanced versions of edge- and vertex-based marker algorithms and demonstrate algorithms with enhanced lower bounds in terms of number of markers and motions required in order to map an embedded topological environment.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129463853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes how to accelerate a real-world face detection and tracking system by taking advantage of the multiple processing cores present in most modern CPUs. This work makes three key contributions. The first is the presentation of a highly optimized serial face detection and tracking algorithm that uses motion estimation and local search windows to achieve fast processing rates. The second is redefining the face detection process in terms of a set of independent face scales that can be processed in parallel on separate CPU cores while still achieving a target processing rate. The third is demonstrating how multiple cores can be used to accelerate the face tracking process, which provides significant speed-ups when tracking a large number of faces simultaneously. Used in a real-world application, the parallel face detector and tracker yields a 50-70% speed boost over the serial version when tested on a commodity multi-core CPU.
{"title":"Parallelizing a Face Detection and Tracking System for Multi-Core Processors","authors":"A. Ranjan, S. Malik","doi":"10.1109/CRV.2012.45","DOIUrl":"https://doi.org/10.1109/CRV.2012.45","url":null,"abstract":"This paper describes how to accelerate a real-world face detection and tracking system by taking advantage of the multiple processing cores that are present in most modern CPUs. This work makes three key contributions. The first is the presentation of a highly optimized serial face detection and tracking algorithm that uses motion estimation and local search windows to achieve fast processing rates. The second is redefining the face detection process based on a set of independent face scales that can be processed in parallel on separate CPU cores while also achieving a target processing rate. The third contribution is demonstrating how multiple cores can be used to accelerate the face tracking process which provides significant speed boosts when tracking a large number of faces simultaneously. Used in a real-world application, the parallel face detector and tracker yields a 50-70% speed boost over the serial version when tested on a commodity multi-core CPU.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129073183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In current biometric-based identification systems, tattoos and other body modifications have been shown to provide a useful source of information. Besides manual category label assignment, approaches utilizing state-of-the-art content-based image retrieval (CBIR) techniques have become increasingly popular. While local feature-based similarities of tattoo images achieve excellent retrieval accuracy, scalability to large image databases can be addressed with the popular bag-of-words model. In this paper, we show how recent advances in CBIR can be utilized to build a large-scale tattoo image retrieval system. Compared to other systems, we take a different approach to circumvent the loss of accuracy caused by bag-of-words quantization. Its efficiency and effectiveness are shown in experiments with several tattoo databases of up to 330,000 images.
{"title":"Large-Scale Tattoo Image Retrieval","authors":"D. Manger","doi":"10.1109/CRV.2012.67","DOIUrl":"https://doi.org/10.1109/CRV.2012.67","url":null,"abstract":"In current biometric-based identification systems, tattoos and other body modifications have shown to provide a useful source of information. Besides manual category label assignment, approaches utilizing state-of-the-art content-based image retrieval (CBIR) techniques have become increasingly popular. While local feature-based similarities of tattoo images achieve excellent retrieval accuracy, scalability to large image databases can be addressed with the popular bag-of-word model. In this paper, we show how recent advances in CBIR can be utilized to build up a large-scale tattoo image retrieval system. Compared to other systems, we chose a different approach to circumvent the loss of accuracy caused by the bag-of-word quantization. Its efficiency and effectiveness are shown in experiments with several tattoo databases of up to 330,000 images.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115415498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature detection is a crucial step in many computer vision applications such as matching, tracking, visual odometry, and object recognition. Detecting robust features that are persistent, rotation-invariant, and quick to compute is a major problem in computer vision. Feature detectors based on the difference of Gaussians (DoG) are computationally expensive; if the DoG is combined with image sub-sampling at higher orders, the detectors become fast, but their feature localization becomes inaccurate. Detectors based on the difference of octagons (DoO) or difference of stars (DoS) algorithms are fast and localize features accurately, but they are not rotation-invariant. This paper introduces a novel difference of circles (DoC) feature detection technique that is perfectly rotation-invariant and has the potential to be very fast through the use of circular integral images. The performance of the DoC algorithm is compared with the difference of stars algorithm presented by Willow Garage. The experiments conducted concentrate on the rotation-invariance property of DoC.
{"title":"Difference of Circles Feature Detector","authors":"Abdullah Hojaij, Adel H. Fakih, A. Wong, J. Zelek","doi":"10.1109/CRV.2012.16","DOIUrl":"https://doi.org/10.1109/CRV.2012.16","url":null,"abstract":"Feature detection is a crucial step in many Computer Vision applications such as matching, tracking, visual odometry and object recognition, etc. Detecting robust features that are persistent, rotation-invariant, and quickly calculated is a major problem in computer vision. Feature detectors using the difference of Gaussian (DoG) are computationally expensive, however, if the DoG is used with image sub sampling at higher orders, the detectors become fast but their feature localization becomes inaccurate. Detectors based on difference of octagons (DoO) or difference of stars (DoS) algorithm are fast and localize the features accurately, but they are not rotation-invariant. This paper introduces a novel technique for the difference of circles (DoC) algorithm, used for feature detection, that is perfectly rotation-invariant and has the potential of being very fast through using circular integral images. The performance of DoC algorithm is compared with the difference of stars algorithm presented by 'Willow Garage'. The experiments conducted concentrate on the rotation-invariance property of DoC.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116128147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A fundamental open problem in SLAM is the effective representation of the map in unknown, ambiguous, complex, dynamic environments. Representing such environments in a suitable manner is a difficult task. Existing approaches to SLAM use map representations that store individual features (range measurements, image patches, or higher-level semantic features) and their locations in the environment. The choice of map representation imposes limitations that are in many ways unfavourable for real-world applications. In this paper, we explore a new approach to SLAM that redefines sensing and robot motion as acts of deformation of a differentiable surface. Distance fields and level set methods are utilized to define a parallel to the components of the SLAM estimation process, and an algorithm is developed and demonstrated. The variational framework developed is capable of representing complex dynamic scenes and spatially varying uncertainty for sensor and robot models.
{"title":"A Variational Approach to Mapping and Localization","authors":"A. Hogue, S. Khattak","doi":"10.1109/CRV.2012.72","DOIUrl":"https://doi.org/10.1109/CRV.2012.72","url":null,"abstract":"A fundamental open problem in SLAM is the effective representation of the map in unknown, ambiguous, complex, dynamic environments. Representing such environments in a suitable manner is a complex task. Existing approaches to SLAM use map representations that store individual features (range measurements, image patches, or higher level semantic features) and their locations in the environment. The choice of how we represent the map produces limitations which in many ways are unfavourable for application in real-world scenarios. In this paper, we explore a new approach to SLAM that redefines sensing and robot motion as acts of deformation of a differentiable surface. Distance fields and level set methods are utilized to define a parallel to the components of the SLAM estimation process and an algorithm is developed and demonstrated. The variational framework developed is capable of representing complex dynamic scenes and spatially varying uncertainty for sensor and robot models.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122582019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given a set of captioned images of cluttered scenes containing various objects at different positions and scales, we learn named contour models of object categories without relying on bounding box annotation. We extend a recent language-vision integration framework that finds spatial configurations of image features that co-occur with words in image captions. By substituting appearance features with local contour features, object categories are recognized by a contour model that grows along the object's boundary. Experiments on the ETHZ dataset are presented to show that 1) the extended framework is better able to learn named visual categories whose within-class variation is better captured by a shape model than an appearance model, and 2) typical object recognition methods fail when manually annotated bounding boxes are unavailable.
{"title":"Learning Categorical Shape from Captioned Images","authors":"T. S. Lee, S. Fidler, Alex Levinshtein, Sven J. Dickinson","doi":"10.1109/CRV.2012.37","DOIUrl":"https://doi.org/10.1109/CRV.2012.37","url":null,"abstract":"Given a set of captioned images of cluttered scenes containing various objects in different positions and scales, we learn named contour models of object categories without relying on bounding box annotation. We extend a recent language-vision integration framework that finds spatial configurations of image features that co-occur with words in image captions. By substituting appearance features with local contour features, object categories are recognized by a contour model that grows along the object's boundary. Experiments on ETHZ are presented to show that 1) the extended framework is better able to learn named visual categories whose within-class variation is better captured by a shape model than an appearance model, and 2) typical object recognition methods fail when manually annotated bounding boxes are unavailable.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131553446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses the problem of localizing an accelerometer in the view of a stationary camera as a first step towards multi-modal activity recognition. This problem is challenging because accelerometers are visually occluded, they measure proper acceleration (including the effect of gravity), and their orientation relative to the camera viewpoint is unknown and changes over time. Accelerometers are localized by matching acceleration estimated along visual point trajectories to accelerometer data. Trajectories are constructed from point feature tracking (KLT) and by grid sampling from a dense flow field. We also construct 3D trajectories with visual depth information. The similarity between accelerometer data and a trajectory is computed by counting the number of frames in which the norms of accelerations in both sequences exceed a threshold. For quantitative evaluation we collected a challenging dataset consisting of video and accelerometer data of a person preparing a mixed salad with accelerometer-equipped kitchen utensils. Trajectories from dense optical flow yielded a higher localization accuracy than point feature tracking.
{"title":"Accelerometer Localization in the View of a Stationary Camera","authors":"Sebastian Stein, S. McKenna","doi":"10.1109/CRV.2012.22","DOIUrl":"https://doi.org/10.1109/CRV.2012.22","url":null,"abstract":"This paper addresses the problem of localizing an accelerometer in the view of a stationary camera as a first step towards multi-model activity recognition. This problem is challenging as accelerometers are visually occluded, they measure proper acceleration including effects of gravity and their orientation is unknown and changes over time relative to camera viewpoint. Accelerometers are localized by matching acceleration estimated along visual point trajectories to accelerometer data. Trajectories are constructed from point feature tracking (KLT) and by grid sampling from a dense flow field. We also construct 3D trajectories with visual depth information. The similarity between accelerometer data and a trajectory is computed by counting the number of frames in which the norms of accelerations in both sequences exceed a threshold. For quantitative evaluation we collected a challenging dataset consisting of video and accelerometer data of a person preparing a mixed salad with accelerometer-equipped kitchen utensils. Trajectories from dense optical flow yielded a higher localization accuracy compared to point feature tracking.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121753647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses the problem of visual place categorization, which aims at augmenting the different locations of the environment visited by an autonomous robot with information that relates them to human-understandable concepts. We formulate visual place categorization in terms of energy minimization. To label visual observations with place categories, we present a global image representation that is invariant to common changes in dynamic environments and robust against intra-class variations. To enforce temporal consistency, a general solution is presented that incorporates statistical cues without being restricted to constant, small neighbourhood radii or depending on the actual path followed by the robot. A set of experiments on publicly available databases demonstrates the advantages of the presented system and shows a significant improvement over existing methods.
{"title":"Visual Place Categorization in Indoor Environments","authors":"E. F. Ersi, John K. Tsotsos","doi":"10.1109/CRV.2012.66","DOIUrl":"https://doi.org/10.1109/CRV.2012.66","url":null,"abstract":"This paper addresses the problem of visual place categorization, which aims at augmenting different locations of the environment visited by an autonomous robot with information that relates them to human-understandable concepts. We formulate the problem of visual place categorization in terms of energy minimization. To label visual observations with place categories we present a global image representation that is invariant to common changes in dynamic environments and robust against intra-class variations. To satisfy temporal consistency, a general solution is presented that incorporates statistical cues, without being restricted by constant and small neighbourhood radii, or being dependent on the actual path followed by the robot. A set of experiments on publicly available databases demonstrates the advantages of the presented system and show a significant improvement over available methods.","PeriodicalId":372951,"journal":{"name":"2012 Ninth Conference on Computer and Robot Vision","volume":"63 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114014790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}