A GPU-based implementation of motion detection from a moving platform
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563096
Qian Yu, G. Medioni
We describe a GPU-based implementation of motion detection from a moving platform. Motion detection from a moving platform is inherently difficult because the moving camera induces a 2D motion field across the entire image, so a step compensating for camera motion is required before the background model can be estimated. Due to inevitable registration errors, the background model is estimated over a sliding window of frames, so that an erroneous registration cannot degrade detection quality for the whole sequence. This approach, however, is computationally demanding and puts a heavy burden on a real-time CPU implementation. We exploit the GPU to achieve significant acceleration over standard CPU implementations. Our GPU-based implementation can build the background model and detect motion regions at around 18 fps on 320×240 videos captured from a moving camera.
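As a rough illustration of the pipeline this abstract describes, here is a minimal NumPy/OpenCV sketch of a sliding-window background model over registration-compensated frames. The function name, window size, and threshold are our own illustrative assumptions, not the authors' code, and the GPU kernels are not reproduced.

```python
import numpy as np
import cv2

def detect_motion(frames, homographies, window=10, thresh=25):
    """frames: list of grayscale uint8 images; homographies[i] warps frame i
    into the coordinate frame of the newest frame (camera-motion compensation).
    Illustrative sketch only; window/thresh values are assumptions."""
    h, w = frames[-1].shape
    # Warp the last `window` frames into a common reference frame.
    registered = [cv2.warpPerspective(f, H, (w, h))
                  for f, H in zip(frames[-window:], homographies[-window:])]
    stack = np.stack(registered).astype(np.float32)
    # A per-pixel median over the sliding window approximates the background;
    # a short window limits how far one bad registration can propagate.
    background = np.median(stack, axis=0)
    diff = np.abs(stack[-1] - background)
    return (diff > thresh).astype(np.uint8) * 255
```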
{"title":"A GPU-based implementation of motion detection from a moving platform","authors":"Qian Yu, G. Medioni","doi":"10.1109/CVPRW.2008.4563096","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563096","url":null,"abstract":"We describe a GPU-based implementation of motion detection from a moving platform. Motion detection from a moving platform is inherently difficult as the moving camera induces 2D motion field in the entire image. A step compensating for camera motion is required prior to estimating of the background model. Due to inevitable registration errors, the background model is estimated according to a sliding window of frames to avoid the case where erroneous registration influences the quality of the detection for the whole sequence. However, this approach involves several characteristics that put a heavy burden on real-time CPU implementation. We exploit GPU to achieve significant acceleration over standard CPU implementations. Our GPU-based implementation can build the background model and detect motion regions at around 18 fps on 320times240 videos that are captured for a moving camera.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"48 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114020286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient scan-window based object detection using GPGPU
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563097
Li Zhang, R. Nevatia
We describe an efficient design for scan-window based object detectors using a general-purpose GPU computing (GPGPU) framework. While the design is applied here to build a pedestrian detector that uses histograms of oriented gradients (HOG) features and a support vector machine (SVM) classifier, the methodology is generic and can be applied to other objects, using different features and classifiers. The GPGPU paradigm is used for both feature extraction and classification, so that the scan windows can be processed in parallel. We further propose to precompute and cache all the histograms in advance, instead of using integral images, which greatly lowers the computation cost. A multi-scale reduce strategy is employed to avoid expensive CPU-GPU data transfers. Experimental results show that our implementation achieves a more than tenfold speedup with no loss in detection rate.
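To make the cached-histogram idea concrete, the sketch below scores scan windows with a linear SVM over per-cell orientation histograms computed once for the whole image. The function name, window size in cells, and the single whole-window normalization (a simplification of HOG's per-block normalization) are our assumptions; on the GPU the two loops would run in parallel, one thread per window.

```python
import numpy as np

def scan_windows(cell_hists, svm_w, svm_b, win_cells=(16, 8), stride=1):
    """cell_hists: (H, W, B) array of per-cell orientation histograms,
    precomputed once for the whole image (the caching strategy in the paper).
    svm_w/svm_b: linear SVM weights/bias over the flattened window descriptor."""
    H, W, B = cell_hists.shape
    wh, ww = win_cells
    scores = np.full((H - wh + 1, W - ww + 1), -np.inf)
    for y in range(0, H - wh + 1, stride):
        for x in range(0, W - ww + 1, stride):
            # Every window reuses the cached histograms; nothing is recomputed.
            desc = cell_hists[y:y + wh, x:x + ww].ravel()
            desc = desc / (np.linalg.norm(desc) + 1e-6)
            scores[y, x] = desc @ svm_w + svm_b
    return scores  # windows with scores > 0 are detections
```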
{"title":"Efficient scan-window based object detection using GPGPU","authors":"Li Zhang, R. Nevatia","doi":"10.1109/CVPRW.2008.4563097","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563097","url":null,"abstract":"We describe an efficient design for scan-window based object detectors using a general purpose graphics hardware computing (GPGPU) framework. While the design is particularly applied to built a pedestrian detector that uses histogram of oriented gradient (HOG) features and the support vector machine (SVM) classifiers, the methodology we use is generic and can be applied to other objects, using different features and classifiers. The GPGPU paradigm is utilized for feature extraction and classification, so that the scan windows can be processed in parallel. We further propose to precompute and cache all the histograms in advance, instead of using integral images, which greatly lowers the computation cost. A multi-scale reduce strategy is employed to save expensive CPU-GPU data transfers. Experimental results show that our implementation achieves a more-than-ten-times speed up with no loss on detection rates.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114999172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Circular generalized cylinder fitting for 3D reconstruction in endoscopic imaging based on MRF
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563010
Jin Zhou, Ananya Das, Feng Li, Baoxin Li
Endoscopy has become an established procedure for the diagnosis and therapy of various gastrointestinal (GI) ailments, and has also emerged as a commonly-used technique for minimally-invasive surgery. Most existing endoscopes are monocular, and stereo-endoscopy faces practical difficulties, preventing physicians and surgeons from having a desired, realistic 3D view. Traditional monocular 3D reconstruction approaches (e.g., structure from motion) face extraordinary challenges in this application due to noisy data, a lack of texture to support robust feature matching, nonrigidity of the objects, and glare artifacts from the imaging process. In this paper, we propose a method to automatically reconstruct 3D structure from a monocular endoscopic video. Our approach addresses the above challenges by incorporating a circular generalized cylinder (CGC) model in 3D reconstruction. The CGC model is decomposed into a series of 3D circles. To reconstruct this model, we formulate the problem as maximum a posteriori estimation within a Markov random field framework, so as to enforce the smoothness constraints of the CGC model and to support a robust search for the optimal solution, which is achieved by a two-stage heuristic search scheme. Experiments on both simulated and real data demonstrate the effectiveness of the proposed approach.
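The paper's optimizer is a two-stage heuristic search, which the abstract does not detail. For intuition, here is a generic chain-MRF MAP sketch (Viterbi-style dynamic programming) over candidate circles, one per slice of the CGC: a data cost per candidate plus a smoothness penalty between consecutive circles. All names and the fixed per-slice candidate set are our assumptions, not the authors' scheme.

```python
import numpy as np

def map_chain(unary, pairwise):
    """unary[t][k]: data cost of candidate circle k at slice t (e.g., how well
    its projection fits the image contour). pairwise(a, b): smoothness penalty
    between candidates a and b at consecutive slices (radius/center jumps)."""
    T, K = len(unary), len(unary[0])
    cost = np.array(unary[0], dtype=float)
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # trans[a, b] = cost of reaching candidate b via predecessor a.
        trans = cost[:, None] + np.array([[pairwise(a, b) for b in range(K)]
                                          for a in range(K)])
        back[t] = trans.argmin(axis=0)
        cost = trans.min(axis=0) + np.array(unary[t])
    path = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]  # MAP assignment: one circle index per slice
```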
{"title":"Circular generalized cylinder fitting for 3D reconstruction in endoscopic imaging based on MRF","authors":"Jin Zhou, Ananya Das, Feng Li, Baoxin Li","doi":"10.1109/CVPRW.2008.4563010","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563010","url":null,"abstract":"Endoscopy has become an established procedure for the diagnosis and therapy of various gastrointestinal (GI) ailments, and has also emerged as a commonly-used technique for minimally-invasive surgery. Most existing endoscopes are monocular, with stereo-endoscopy facing practical difficulties, preventing the physicians/surgeons from having a desired, realistic 3D view. Traditional monocular 3D reconstruction approaches (e.g., structure from motion) face extraordinary challenges for this application due to issues including noisy data, lack of textures supporting robust feature matching, nonrigidity of the objects, and glare artifacts from the imaging process, etc. In this paper, we propose a method to automatically reconstruct 3D structure from a monocular endoscopic video. Our approach attempts to address the above challenges by incorporating a circular generalized cylinder (CGC) model in 3D reconstruction. The CGC model is decomposed as a series of 3D circles. To reconstruct this model, we formulate the problem as one of maximum a posteriori estimation within a Markov random field framework, so as to ensure the smoothness constraints of the CGC model and to support robust search for the optimal solution, which is achieved by a two-stage heuristic search scheme. Both simulated and real data experiments demonstrate the effectiveness of the proposed approach.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"40 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113976041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting spatio-temporal information for view recognition in cardiac echo videos
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563008
D. Beymer, T. Syeda-Mahmood, Fei Wang
2D echocardiography is an important diagnostic aid for morphological and functional assessment of the heart. The transducer position is varied during an echo exam to elicit important information about the heart's function and anatomy. Knowledge of the transducer viewpoint is important in automatic cardiac echo interpretation, both for understanding the regions being depicted and for quantifying their attributes. In this paper, we address the problem of inferring the transducer viewpoint from the spatio-temporal information in cardiac echo videos. Unlike previous approaches, we exploit the motion of the heart within a cardiac cycle, in addition to spatial information, to discriminate between viewpoints. Specifically, we use an active shape model (ASM) to model shape and texture information in an echo frame. The motion information derived by tracking ASMs through a heart cycle is then projected into the eigen-motion feature space of each viewpoint class for matching. We report a comparison with a re-implementation of state-of-the-art echo view recognition methods on a large database of patients with various cardiac diseases.
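A minimal sketch of the eigen-motion matching step, assuming each training track is a flattened vector of ASM landmark positions over one heart cycle: a per-class PCA basis is fit, and a test track is assigned to the class with the smallest reconstruction error. The function names and the error criterion are our assumptions; the paper's exact matching rule is not given in the abstract.

```python
import numpy as np

def fit_eigenmotion(trajectories, k=10):
    """trajectories: (N, D) matrix; each row is a flattened ASM landmark track
    over one heart cycle, from a training video of one known view class."""
    mean = trajectories.mean(axis=0)
    _, _, Vt = np.linalg.svd(trajectories - mean, full_matrices=False)
    return mean, Vt[:k]                  # class-specific eigen-motion basis

def view_score(track, mean, basis):
    """Reconstruction error of a test track in one class's eigen-motion space;
    the test view is assigned to the class with the smallest error."""
    coeffs = (track - mean) @ basis.T
    recon = mean + coeffs @ basis
    return np.linalg.norm(track - recon)
```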
{"title":"Exploiting spatio-temporal information for view recognition in cardiac echo videos","authors":"D. Beymer, T. Syeda-Mahmood, Fei Wang","doi":"10.1109/CVPRW.2008.4563008","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563008","url":null,"abstract":"2D Echocardiography is an important diagnostic aid for morphological and functional assessment of the heart. The transducer position is varied during an echo exam to elicit important information about the heart function and its anatomy. The knowledge of the transducer viewpoint is important in automatic cardiac echo interpretation to understand the regions being depicted as well as in the quantification of their attributes. In this paper, we address the problem of inferring the transducer viewpoint from the spatio-temporal information in cardiac echo videos. Unlike previous approaches, we exploit motion of the heart within a cardiac cycle in addition to spatial information to discriminate between viewpoints. Specifically, we use an active shape model (ASM) to model shape and texture information in an echo frame. The motion information derived by tracking ASMs through a heart cycle is then projected into the eigen-motion feature space of the viewpoint class for matching. We report comparison with a re-implementation of state-of-the-art view recognition methods in echos on a large database of patients with various cardiac diseases.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123917349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and calibration of a multi-view TOF sensor fusion system
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563160
Y. Kim, Derek Chan, C. Theobalt, S. Thrun
This paper describes the design and calibration of a system that enables simultaneous recording of dynamic scenes with multiple high-resolution video cameras and low-resolution Swissranger time-of-flight (TOF) depth cameras. The system is intended to serve as a testbed for the development of new algorithms for high-quality multi-view dynamic scene reconstruction and 3D video. The paper also provides a detailed analysis of random and systematic depth camera noise, which is important for reliable fusion of video and depth data. Finally, the paper describes how to compensate for systematic depth errors and calibrate all dynamic depth and video data into a common frame.
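The abstract does not specify the compensation model, but a common approach for TOF sensors is to fit a low-order polynomial mapping measured depth to true depth from calibration targets at known distances, and apply it per pixel. The sketch below illustrates only that idea; the function names and polynomial degree are assumptions.

```python
import numpy as np

def fit_depth_correction(measured, true, deg=3):
    """Fit a polynomial from measured TOF depths to ground-truth depths,
    using calibration targets at known distances. This models the systematic,
    distance-dependent bias; the degree is an illustrative choice."""
    return np.polyfit(measured, true, deg)

def correct_depth(depth_map, coeffs):
    # Apply the fitted correction elementwise. Random noise is a separate
    # issue (e.g., handled by temporal averaging); this only removes the
    # systematic part of the error.
    return np.polyval(coeffs, depth_map)
```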
{"title":"Design and calibration of a multi-view TOF sensor fusion system","authors":"Y. Kim, Derek Chan, C. Theobalt, S. Thrun","doi":"10.1109/CVPRW.2008.4563160","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563160","url":null,"abstract":"This paper describes the design and calibration of a system that enables simultaneous recording of dynamic scenes with multiple high-resolution video and low-resolution Swissranger time-of-flight (TOF) depth cameras. The system shall serve as a testbed for the development of new algorithms for high-quality multi-view dynamic scene reconstruction and 3D video. The paper also provides a detailed analysis of random and systematic depth camera noise which is important for reliable fusion of video and depth data. Finally, the paper describes how to compensate systematic depth errors and calibrate all dynamic depth and video data into a common frame.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125811959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mutual information computation and maximization using GPU
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563101
Yuping Lin, G. Medioni
We present a GPU implementation that computes both mutual information and its derivatives. Mutual information computation is a highly demanding process due to the enormous number of exponential computations, and it is therefore the bottleneck in many image registration applications. However, we show that these computations are fully parallelizable and can be efficiently ported to the GPU architecture. Compared with an equivalent implementation running on a workstation-level CPU, we achieve a speedup factor of 170 in computing mutual information and a factor of 400 in computing its derivatives.
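For reference, mutual information over a joint intensity histogram is I(A;B) = Σ_{a,b} p(a,b) log[p(a,b) / (p(a) p(b))]. A compact CPU sketch of that computation (the one the paper ports to the GPU) follows; the bin count and names are our choices, and the derivative computation is not shown.

```python
import numpy as np

def mutual_information(img1, img2, bins=64):
    """Mutual information of two images from their joint intensity histogram.
    Illustrative CPU reference; the paper parallelizes this on the GPU."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pab = joint / joint.sum()                 # joint probability p(a, b)
    pa = pab.sum(axis=1, keepdims=True)       # marginal p(a), shape (bins, 1)
    pb = pab.sum(axis=0, keepdims=True)       # marginal p(b), shape (1, bins)
    nz = pab > 0                              # avoid log(0) on empty bins
    return float((pab[nz] * np.log(pab[nz] / (pa @ pb)[nz])).sum())
```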
{"title":"Mutual information computation and maximization using GPU","authors":"Yuping Lin, G. Medioni","doi":"10.1109/CVPRW.2008.4563101","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563101","url":null,"abstract":"We present a GPU implementation to compute both mutual information and its derivatives. Mutual information computation is a highly demanding process due to the enormous number of exponential computations. It is therefore the bottleneck in many image registration applications. However, we show that these computations are fully parallizable and can be efficiently ported onto the GPU architecture. Compared with the same CPU implementation running on a workstation level CPU, we reached a factor of 170 in computing mutual information, and a factor of 400 in computing its derivatives.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124645359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating how and when perceptual organization cues improve boundary detection in natural images
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4562974
Leandro A. Loss, G. Bebis, M. Nicolescu, A. Skurikhin
Boundary detection in natural images is an important but challenging problem in computer vision. Motivated by studies in psychophysics claiming that humans use multiple cues for segmentation, several promising methods have been proposed that perform boundary detection by optimally combining local image measurements such as color, texture, and brightness. Very interesting results have been reported by applying these methods to challenging datasets such as the Berkeley segmentation benchmark. Although combining different cues for boundary detection has been shown to outperform single-cue methods, results can be further improved by integrating perceptual organization cues into the boundary detection process. The main goal of this study is to investigate how and when perceptual organization cues improve boundary detection in natural images. In this context, we investigate integrating segmentation with iterative multi-scale tensor voting (IMSTV), a variant of tensor voting (TV) that performs perceptual grouping by analyzing information at multiple scales and iteratively removing background clutter while preserving salient, organized structures. The key idea is to use IMSTV to post-process the boundary posterior probability (PB) map produced by segmentation algorithms. Detailed analysis of our experimental results reveals how and when perceptual organization cues are likely to improve or degrade boundary detection. In particular, we show that using perceptual grouping as a post-processing step improves boundary detection in 84% of the grayscale test images in the Berkeley segmentation dataset.
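The voting process itself is beyond a short sketch, but the saliency measure at its core is simple: after voting, each pixel holds a second-order tensor, and the stick saliency λ1 − λ2 measures how curve-like the local structure is. The sketch below shows only that decomposition and a naive reweighting of a PB map by it; the iterative multi-scale voting of IMSTV, and the paper's actual combination rule, are our omitted assumptions.

```python
import numpy as np

def stick_saliency(tensors):
    """tensors: (H, W, 2, 2) field of accumulated second-order votes.
    Stick saliency lambda1 - lambda2 is large along organized curves."""
    eigvals = np.linalg.eigvalsh(tensors)   # ascending eigenvalues per pixel
    return eigvals[..., 1] - eigvals[..., 0]

def refine_pb(pb, tensors):
    # Naive illustration: suppress PB responses lacking support from
    # organized structure (not the paper's exact combination rule).
    s = stick_saliency(tensors)
    s = s / (s.max() + 1e-9)
    return pb * s
```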
{"title":"Investigating how and when perceptual organization cues improve boundary detection in natural images","authors":"Leandro A. Loss, G. Bebis, M. Nicolescu, A. Skurikhin","doi":"10.1109/CVPRW.2008.4562974","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4562974","url":null,"abstract":"Boundary detection in natural images represents an important but also challenging problem in computer vision. Motivated by studies in psychophysics claiming that humans use multiple cues for segmentation, several promising methods have been proposed which perform boundary detection by optimally combining local image measurements such as color, texture, and brightness. Very interesting results have been reported by applying these methods on challenging datasets such as the Berkeley segmentation benchmark. Although combining different cues for boundary detection has been shown to outperform methods using a single cue, results can be further improved by integrating perceptual organization cues with the boundary detection process. The main goal of this study is to investigate how and when perceptual organization cues improve boundary detection in natural images. In this context, we investigate the idea of integrating with segmentation the iterative multi-scale tensor voting (IMSTV), a variant of tensor voting (TV) that performs perceptual grouping by analyzing information at multiple-scales and removing background clutter in an iterative fashion, preserving salient, organized structures. The key idea is to use IMSTV to post-process the boundary posterior probability (PB) map produced by segmentation algorithms. Detailed analysis of our experimental results reveals how and when perceptual organization cues are likely to improve or degrade boundary detection. In particular, we show that using perceptual grouping as a post-processing step improves boundary detection in 84% of the grayscale test images in the Berkeley segmentation dataset.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"54 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131470794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camera localization and building reconstruction from single monocular images
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563132
Ruisheng Wang, F. Ferrie
This paper presents a new method for reconstructing rectilinear buildings from single images under the assumption of flat terrain. The intuition behind the method is that, given an image composed of rectilinear buildings, the 3D buildings can be geometrically reconstructed from the image alone. The recovery algorithm is formulated in terms of two objective functions based on the equivalence between the vector normal to the interpretation plane in image space and the vector normal to the rotated interpretation plane in object space. These objective functions are minimized with respect to the camera pose and the building dimensions, locations, and orientations to obtain estimates of the scene structure. The method potentially provides a solution for large-scale urban modelling from aerial images, and can easily be extended to handle piecewise planar objects in more general settings.
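To illustrate the interpretation-plane constraint, note that each image line defines a plane through the camera center, and a 3D line direction lying in that plane must be orthogonal to the plane's (rotated) normal. The SciPy sketch below estimates only the rotation from such constraints; the paper's two objective functions additionally solve for the camera translation and the building dimensions, locations, and orientations, which are omitted here. All names are our own.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(rotvec, normals, directions):
    """normals: (N, 3) interpretation-plane normals in camera coordinates,
    one per detected image line. directions: (N, 3) corresponding world-space
    line directions (e.g., rectilinear building edges). Each residual is
    (R @ d) . n, which is zero when the rotated direction lies in the plane."""
    R = Rotation.from_rotvec(rotvec).as_matrix()
    return np.einsum('ij,ij->i', normals, directions @ R.T)

def estimate_rotation(normals, directions):
    # Nonlinear least squares over the rotation vector, starting at identity.
    return least_squares(residuals, np.zeros(3), args=(normals, directions)).x
```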
{"title":"Camera localization and building reconstruction from single monocular images","authors":"Ruisheng Wang, F. Ferrie","doi":"10.1109/CVPRW.2008.4563132","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563132","url":null,"abstract":"This paper presents a new method for reconstructing rectilinear buildings from single images under the assumption of flat terrain. An intuition of the method is that, given an image composed of rectilinear buildings, the 3D buildings can be geometrically reconstructed by using the image only. The recovery algorithm is formulated in terms of two objective functions which are based on the equivalence between the vector normal to the interpretation plane in the image space and the vector normal to the rotated interpretation plane in the object space. These objective functions are minimized with respect to the camera pose, the building dimensions, locations and orientations to obtain estimates for the structure of the scene. The method potentially provides a solution for large-scale urban modelling using aerial images, and can be easily extended to deal with piecewise planar objects in a more general situation.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125523584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Binocular dance pose recognition and body orientation estimation via multilinear analysis
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4562970
Bo Peng, G. Qian
In this paper, we propose a novel approach to dance pose recognition and body orientation estimation using multilinear analysis. By performing tensor decomposition and projection on silhouette images obtained from wide-baseline binocular cameras, low-dimensional pose and body orientation coefficient vectors can be extracted. Unlike traditional tensor-based recognition methods, the proposed approach takes the pose coefficient vectors as features to train a family of support vector machines as pose classifiers. Using the body orientation coefficient vectors, a one-dimensional orientation manifold is learned and then used to estimate body orientation. Experimental results on both synthetic and real image data show the efficacy of the proposed approach and that it outperforms the traditional tensor-based approach in a comparative test.
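A much-simplified sketch of the coefficient-vector-plus-SVM idea, assuming a training tensor laid out as (pose, orientation, pixels): one HOSVD mode basis is computed from the pixel-mode unfolding, silhouettes are projected onto it, and an SVM is trained on the projections. The full multilinear decomposition, the binocular fusion, and the orientation manifold are omitted; the tensor layout and all names are our assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def mode_basis(tensor, mode, k):
    """Left singular vectors of the mode-n unfolding (an HOSVD factor)."""
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    U = np.linalg.svd(unfolded, full_matrices=False)[0]
    return U[:, :k]

def train_pose_classifier(silhouette_tensor, pose_labels, k=20):
    """silhouette_tensor: (pose, orientation, pixels). Projecting flattened
    silhouettes onto the pixel-mode basis yields low-dimensional coefficient
    vectors, which serve as SVM features (simplified from the paper)."""
    B = mode_basis(silhouette_tensor, mode=2, k=k)             # (pixels, k)
    X = silhouette_tensor.reshape(-1, silhouette_tensor.shape[2]) @ B
    y = np.repeat(pose_labels, silhouette_tensor.shape[1])     # label per row
    return SVC().fit(X, y), B
```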
{"title":"Binocular dance pose recognition and body orientation estimation via multilinear analysis","authors":"Bo Peng, G. Qian","doi":"10.1109/CVPRW.2008.4562970","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4562970","url":null,"abstract":"In this paper, we propose a novel approach to dance pose recognition and body orientation estimation using multilinear analysis. By performing tensor decomposition and projection using silhouette images obtained from wide base-line binocular cameras, low dimensional pose and body orientation coefficient vectors can be extracted. Different from traditional tensor-based recognition methods, the proposed approach takes the pose coefficient vector as features to train a family of support vector machines as pose classifiers. Using the body orientation coefficient vectors, a one-dimensional orientation manifold is learned and further used for the estimation of body orientation. Experiment results obtained using both synthetic and real image data showed the efficacy of the proposed approach, and that our approach outperformed the traditional tensor-based approach in the comparative test.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126884091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving the selection and detection of visual landmarks through object tracking
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563133
P. Espinace, A. Soto
The unsupervised selection and subsequent recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, we proposed a system that aims to achieve this capability by combining a bottom-up, data-driven approach with top-down feedback provided by high-level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) an estimate of the robot position, which narrows the search for potential matches with previously selected landmarks; and ii) a set of weights that, according to the results of previous recognitions, controls the influence of different segmentation algorithms on the recognition of each landmark. In this paper we explore the benefits of extending our previous work with a visual tracking step for each selected landmark. Our intuition is that tracking can improve the model of each landmark by associating and selecting information from its most significant views, and can also help avoid the selection of spurious landmarks. Our results confirm these intuitions, showing that the tracking step produces a significant increase in the recall rate for landmark recognition.
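The abstract does not specify the rule for updating the per-segmenter weights, so the sketch below uses a simple multiplicative scheme purely to illustrate the feedback loop: each segmentation algorithm's influence on a landmark grows when recognitions based on its regions succeed and shrinks when they fail. Everything here is our own assumption.

```python
def update_segmenter_weights(weights, outcomes, lr=0.2):
    """weights[alg]: influence of segmentation algorithm `alg` on one
    landmark's recognition. outcomes[alg]: True if the last recognition using
    that algorithm's regions succeeded. Multiplicative update, renormalized
    so the weights remain a distribution (illustrative rule only)."""
    for alg, ok in outcomes.items():
        weights[alg] *= (1 + lr) if ok else (1 - lr)
    total = sum(weights.values())
    return {alg: w / total for alg, w in weights.items()}
```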
{"title":"Improving the selection and detection of visual landmarks through object tracking","authors":"P. Espinace, A. Soto","doi":"10.1109/CVPRW.2008.4563133","DOIUrl":"https://doi.org/10.1109/CVPRW.2008.4563133","url":null,"abstract":"The unsupervised selection and posterior recognition of visual landmarks is a highly valuable perceptual capability for a mobile robot. Recently, we proposed a system that aims to achieve this capability by combining a bottom-up data driven approach with top-down feedback provided by high level semantic representations. The bottom-up approach is based on three main mechanisms: visual attention, area segmentation, and landmark characterization. The top-down feedback is based on two information sources: i) An estimation of the robot position that reduces the searching scope for potential matches with previously selected landmarks, ii) A set of weights that, according to the results of previous recognitions, controls the influence of different segmentation algorithms in the recognition of each landmark. In this paper we explore the benefits of extending our previous work by including a visual tracking step for each of the selected landmarks. Our intuition is that the inclusion of a tracking step can help to improve the model of each landmark by associating and selecting information from its most significant views. Furthermore, it can also help to avoid problems related to the selection of spurious landmarks. Our results confirm these intuitions by showing that the inclusion of the tracking step produces a significant increase in the recall rate for landmark recognition.","PeriodicalId":102206,"journal":{"name":"2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"11 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120807545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}