A dual-layer estimator architecture for long-term localization
Anastasios I. Mourikis, S. Roumeliotis
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563131
In this paper, we present a localization algorithm for estimating the 3D position and orientation (pose) of a moving vehicle from visual and inertial measurements. The main advantage of the proposed method is that it provides precise pose estimates at low computational cost. This is achieved by introducing a two-layer estimation architecture that processes measurements according to their information content. Inertial measurements and feature tracks between consecutive images are processed locally in the first layer (a multi-state-constraint Kalman filter), providing motion estimates for the vehicle at a high rate. The second layer comprises an iterative bundle-adjustment estimator that operates intermittently to (i) reduce the effect of linearization errors and (ii) update the state estimates every time an area is revisited and features are re-detected (loop closure). Through this process, reliable state estimates are available continuously, while the estimation errors remain bounded during long-term operation. The performance of the developed system is demonstrated in large-scale experiments involving a vehicle localizing within an urban area.

Mutual information computation and maximization using GPU
Yuping Lin, G. Medioni
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563101
We present a GPU implementation for computing both mutual information and its derivatives. Mutual information computation is a highly demanding process due to the enormous number of exponential computations, and it is therefore the bottleneck in many image registration applications. However, we show that these computations are fully parallelizable and can be efficiently ported onto the GPU architecture. Compared with the same implementation running on a workstation-level CPU, we achieved a speedup factor of 170 in computing mutual information, and a factor of 400 in computing its derivatives.

Implementation of Advanced Encryption Standard for encryption and decryption of images and text on a GPU
Manoj Seshadrinathan, K. Dempski
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563094
In this paper, we propose a system implementing the complete Advanced Encryption Standard (AES) for encryption and decryption of images and text on a graphics processing unit. The GPU acts as a valuable co-processor that off-loads work from the CPU. In the decryption stage, we use a novel technique to display the decrypted images and text on the screen without transferring them to CPU memory. We also present a system for encryption and decryption of hybrid map tiles generated from GIS data sets.

Variational registration of tensor-valued images
S. Barbieri, M. Welk, J. Weickert
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4562964
We present a variational framework for the registration of tensor-valued images. It is based on an energy functional with four terms: a data term based on a diffusion-tensor constancy constraint, a compatibility term encoding the physical model linking domain deformations and tensor reorientation, and smoothness terms for the deformation and the tensor reorientation. Although the tensor deformation model employed here is designed for diffusion tensor MRI data, the separation of the data and compatibility terms makes it easy to adapt the model to different tensor deformation models. We minimise the energy functional with respect to both transformation fields by multiscale gradient descent. Experiments demonstrate the viability and potential of this approach for the registration of tensor-valued images.

Multiple cue integration in transductive confidence machines for head pose classification
V. Balasubramanian, S. Panchanathan, Shayok Chakraborty
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563070
An important facet of learning in an online setting is the confidence associated with a prediction on a given test data point. In an online learning scenario, the system is expected to increase its confidence of prediction as the training data grows. In this work, we present a statistical approach to associating a confidence value with a predicted class label in an online learning scenario. Our work builds on existing work on transductive confidence machines (TCM) [1], which provide a methodology for defining a heuristic confidence measure. We apply this approach to the problem of head pose classification from face images, and extend the framework to compute a confidence value when multiple cues are extracted from images to perform classification. Our approach combines the results of multiple hypotheses to obtain an integrated p-value for validating a single test hypothesis. Our experiments on the widely used FERET database corroborate the significance of confidence measures, particularly in online learning approaches. The results with transductive learning indicate that using confidence measures in online learning can yield significant gains in prediction accuracy, which would be very useful in critical pattern recognition applications.

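The per-cue confidence values in a TCM come from conformal p-values over nonconformity scores. The sketch below shows the standard conformal p-value, plus one common (hypothetical here, not necessarily the authors') way to combine per-cue p-values: Fisher's method, which yields a chi-square statistic that would be compared against a chi-square distribution with 2k degrees of freedom for k cues.

```python
import math

def conformal_p_value(calibration_scores, test_score):
    """Transductive/conformal p-value: the fraction of nonconformity
    scores (calibration plus the test point itself) that are at least
    as nonconforming as the test score."""
    ge = sum(1 for s in calibration_scores if s >= test_score)
    return (ge + 1) / (len(calibration_scores) + 1)

def fisher_statistic(p_values):
    """Fisher's combination statistic -2 * sum(log p_i) for merging
    independent per-cue p-values; larger means less conforming overall."""
    return -2.0 * sum(math.log(p) for p in p_values)
```

A hypothesized class label with a high combined conformity (large integrated p-value) is accepted; the computation of the nonconformity scores themselves depends on the underlying classifier.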
Efficient scan-window based object detection using GPGPU
Li Zhang, R. Nevatia
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563097
We describe an efficient design for scan-window based object detectors using a general-purpose graphics hardware computing (GPGPU) framework. While the design is applied here to build a pedestrian detector that uses histogram of oriented gradients (HOG) features and support vector machine (SVM) classifiers, the methodology is generic and can be applied to other objects, using different features and classifiers. The GPGPU paradigm is utilized for feature extraction and classification, so that the scan windows can be processed in parallel. We further propose to precompute and cache all the histograms in advance, instead of using integral images, which greatly lowers the computation cost. A multi-scale reduce strategy is employed to save on expensive CPU-GPU data transfers. Experimental results show that our implementation achieves a more-than-tenfold speedup with no loss in detection rates.

Exploiting spatio-temporal information for view recognition in cardiac echo videos
D. Beymer, T. Syeda-Mahmood, Fei Wang
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563008
2D echocardiography is an important diagnostic aid for morphological and functional assessment of the heart. The transducer position is varied during an echo exam to elicit important information about the heart's function and anatomy. Knowledge of the transducer viewpoint is important in automatic cardiac echo interpretation, both for understanding the regions being depicted and for quantifying their attributes. In this paper, we address the problem of inferring the transducer viewpoint from the spatio-temporal information in cardiac echo videos. Unlike previous approaches, we exploit the motion of the heart within a cardiac cycle, in addition to spatial information, to discriminate between viewpoints. Specifically, we use an active shape model (ASM) to model shape and texture information in an echo frame. The motion information derived by tracking ASMs through a heart cycle is then projected into the eigen-motion feature space of each viewpoint class for matching. We report a comparison with a re-implementation of state-of-the-art view recognition methods for echos on a large database of patients with various cardiac diseases.

Fast gain-adaptive KLT tracking on the GPU
C. Zach, D. Gallup, Jan-Michael Frahm
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563089
High-performance feature tracking from video input is a valuable tool in many computer vision techniques and mixed reality applications. This work presents a refined and substantially accelerated approach to KLT feature tracking performed on the GPU. Additionally, a global gain ratio between successive frames is estimated to compensate for changes in camera exposure. The proposed approach achieves more than 200 frames per second on state-of-the-art consumer GPUs for PAL (720×576) resolution data, and delivers real-time performance even on low-end mobile graphics processors.

Incident light related distance error study and calibration of the PMD-range imaging camera
Jochen Radmer, Pol Moser Fuste, H. Schmidt, J. Krüger
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563168
For various applications, such as object recognition or tracking, and especially when the object is partly occluded or articulated, 3D information is crucial for the robustness of the application. A recently developed sensor for acquiring distance information is based on the Photo Mixer Device (PMD), for which a distance error with several different causes can be observed. This article presents an improved distance calibration approach for PMD-based distance sensing which handles objects with different Lambertian reflectance properties. Within this scope, the relations between the sources of distance error were investigated. Where applicable, they were isolated for relational studies with the actuating variables, i.e. integration time, amplitude, and measured distance, as these are the only parameters available for the calibration. The calibration results of the proposed method surpass those of all other known methods. In particular, for objects with unknown reflectance properties a significant reduction of the error is achieved.

Camera localization and building reconstruction from single monocular images
Ruisheng Wang, F. Ferrie
Pub Date: 2008-06-23 | DOI: 10.1109/CVPRW.2008.4563132
This paper presents a new method for reconstructing rectilinear buildings from single images under the assumption of flat terrain. The intuition behind the method is that, given an image composed of rectilinear buildings, the 3D buildings can be geometrically reconstructed from the image alone. The recovery algorithm is formulated in terms of two objective functions based on the equivalence between the vector normal to the interpretation plane in image space and the vector normal to the rotated interpretation plane in object space. These objective functions are minimized with respect to the camera pose and the building dimensions, locations, and orientations to obtain estimates of the scene structure. The method potentially provides a solution for large-scale urban modelling using aerial images, and can easily be extended to handle piecewise-planar objects in more general settings.