Historical comparison of vehicles using scanned x-ray images
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711516
W. Ahmed, Ming Zhang, O. Al-Kofahi
X-ray scanners are increasingly used for scanning vehicles crossing international borders or entering critical infrastructure installations. The ability to penetrate steel and other opaque materials and the nondestructive nature of x-ray radiation make them ideal for finding drugs, explosives and other contraband. In many situations the same vehicles cross a checkpoint repeatedly, such as employee vehicles entering a high-risk facility or cargo vehicles crossing international borders back and forth. Manual analysis of these images puts an extra burden on the operator and results in slow throughput. In this paper we report an integrated and fully automated system to solve this problem. In the first stage of the algorithm, a model-based segmentation approach is used to find the vehicle outline. It proceeds by first using background subtraction to find the overall body of the vehicle. Next, we find the outlines of the tires using rotating edge-detection kernels. The lower outline of the vehicle is found using active contours. We then use a deformable registration approach, designed specifically for the requirements of this problem, to align the vehicles. An intensity normalization step is then performed to account for intensity variations between the scans at the two time points. We use a histogram-based approach that scales and shifts the histogram of one image to match that of the other. The differences between the two inspection results are computed next. We then apply knowledge-based rules to remove false alarms such as lights and the driver's body. The system is specifically designed for back-scatter x-ray imaging, which is a powerful modality for detecting organic materials such as drugs and explosives. We have applied this system to images scanned by a deployed x-ray scanner and have achieved satisfactory results.
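The scale-and-shift histogram normalization can be pictured with a short sketch. The Python fragment below matches the mean and spread of one scan's intensities to those of the other; the paper does not give its exact estimator, so matching mean and standard deviation is an assumption made here for illustration, and the synthetic arrays merely stand in for two x-ray scans.

    import numpy as np

    def normalize_intensity(moving, reference):
        """Scale and shift the intensities of `moving` so their mean and spread
        match those of `reference` -- one simple way to realize a scale-and-shift
        histogram alignment (the paper does not specify the estimator)."""
        m, r = moving.astype(np.float64), reference.astype(np.float64)
        scale = r.std() / max(m.std(), 1e-8)      # match spread
        shift = r.mean() - scale * m.mean()       # match location
        return scale * m + shift

    # toy usage with two synthetic "scans"
    scan_a = np.random.normal(100, 20, (64, 64))
    scan_b = np.random.normal(120, 25, (64, 64))
    aligned = normalize_intensity(scan_a, scan_b)
    print(aligned.mean(), aligned.std())          # close to scan_b statistics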
{"title":"Historical comparison of vehicles using scanned x-ray images","authors":"W. Ahmed, Ming Zhang, O. Al-Kofahi","doi":"10.1109/WACV.2011.5711516","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711516","url":null,"abstract":"X-ray scanners are increasingly used for scanning vehicles crossing international borders or entering critical infrastructure installations. The ability to penetrate through steel and other opaque materials and the nondestructive nature of x-ray radiation make them ideal for finding drugs, explosives and other contraband. In many situations, the same vehicles cross the checkpoint repeatedly, such as the employee vehicles entering a high-risk facility or cargo vehicles crossing international borders back and forth. Manual analysis of these images puts extra burden on the operator and results in slow throughput. In this paper we report an integrated and fully automated system to solve this problem. In the first stage of the algorithm, a model-based segmentation approach is used to find the vehicle outline. It proceeds by first using background subtraction to find the overall body of the vehicle. Next, we find the outlines of tires by using rotating edge detection kernels. The lower outline of the vehicle is found using active contours. We then use a deformable registration approach to align the vehicles which is specifically designed for the requirements of this problem. An intensity normalization step is then performed to account for the intensity variations between the scans at two time points. We use a histogram-based approach that scales and shifts the histogram of one image to match that of the other. The differences between the two inspection results are computed next. We then apply knowledge-based rules to remove false alarms such as lights and driver's body. The system is specifically designed for back-scatter x-ray imaging which is a powerful modality for detecting organic materials such as drugs and explosives. We have applied this system to images scanned by a deployed x-ray scanner and have achieved satisfactory results.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123686467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust alignment of wide baseline terrestrial laser scans via 3D viewpoint normalization
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711539
Yanpeng Cao, M. Yang, J. McDonald
The complexity of natural scenes and the amount of information acquired by terrestrial laser scanners turn the registration among scans into a complex problem. This problem becomes even more challenging when two individual scans are captured at significantly different viewpoints (wide baseline). Since laser-scanning instruments nowadays are often equipped with an additional image sensor, it stands to reason to make use of the image content to improve the registration of 3D scanning data. In this paper, we present a novel improvement to existing feature techniques to enable automatic alignment between two widely separated 3D scans. The key idea consists of extracting dominant planar structures from 3D point clouds and then utilizing the recovered 3D geometry to improve the performance of 2D image feature extraction and matching. The resulting features are very discriminative and robust to perspective distortions and viewpoint changes because they exploit the underlying 3D structure. Using this novel viewpoint-invariant feature, corresponding 3D points are automatically linked by means of wide baseline image matching. Initial experiments with real data demonstrate the potential of the proposed method for challenging wide baseline 3D scanning data alignment tasks.
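As an illustration of the dominant-plane extraction step, the sketch below fits the single best-supported plane to a point cloud with RANSAC; the abstract does not name the plane-extraction algorithm, so RANSAC, the inlier tolerance, and the synthetic façade-like cloud are all assumptions.

    import numpy as np

    def dominant_plane_ransac(points, n_iter=500, tol=0.05, seed=0):
        """Fit the most-supported plane n.p + d = 0 to an Nx3 cloud with RANSAC.
        RANSAC is an illustrative choice; the paper only states that dominant
        planar structures are extracted from the point clouds."""
        rng = np.random.default_rng(seed)
        best_model, best_inliers = None, None
        for _ in range(n_iter):
            sample = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            if np.linalg.norm(n) < 1e-9:
                continue                                # degenerate sample
            n = n / np.linalg.norm(n)
            d = -n @ sample[0]
            inliers = np.abs(points @ n + d) < tol      # distance-to-plane test
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_model, best_inliers = (n, d), inliers
        return best_model, best_inliers

    # synthetic "facade": a noisy plane plus scattered clutter
    plane = np.c_[np.random.rand(1000, 2) * 10.0, 0.02 * np.random.randn(1000)]
    clutter = np.random.rand(200, 3) * 10.0
    (n, d), inliers = dominant_plane_ransac(np.vstack([plane, clutter]))
    print(n, d, int(inliers.sum()))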
{"title":"Robust alignment of wide baseline terrestrial laser scans via 3D viewpoint normalization","authors":"Yanpeng Cao, M. Yang, J. McDonald","doi":"10.1109/WACV.2011.5711539","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711539","url":null,"abstract":"The complexity of natural scenes and the amount of information acquired by terrestrial laser scanners turn the registration among scans into a complex problem. This problem becomes even more challenging when two individual scans captured at significantly changed viewpoints (wide baseline). Since laser-scanning instruments nowadays are often equipped with an additional image sensor, it stands to reason making use of the image content to improve the registration process of 3D scanning data. In this paper, we present a novel improvement to the existing feature techniques to enable automatic alignment between two widely separated 3D scans. The key idea consists of extracting dominant planar structures from 3D point clouds and then utilizing the recovered 3D geometry to improve the performance of 2D image feature extraction and matching. The resulting features are very discriminative and robust to perspective distortions and viewpoint changes due to exploiting the underlying 3D structure. Using this novel viewpoint invariant feature, the corresponding 3D points are automatically linked in terms of wide baseline image matching. Initial experiments with real data demonstrate the potential of the proposed method for the challenging wide baseline 3D scanning data alignment tasks.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"141 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125833705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evolving improved transforms for reconstruction of quantized ultrasound images
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711511
Chris Miller, B. Babb, F. Moore, M. R. Peterson
State-of-the-art lossy compression schemes for medical imagery utilize the 9/7 wavelet. Recent research has established a methodology for using evolutionary computation (EC) to evolve wavelet and scaling numbers describing novel reconstruction transforms that outperform the 9/7 under lossy conditions. This paper describes an investigation into whether evolved transforms could automatically compensate for the detrimental effects of quantization for ultrasound (US) images. Results for 16:1, 32:1, and 64:1 quantization consistently demonstrate superior performance of evolved transforms in comparison to the 9/7 wavelet; in general, this advantage increases in proportion to the selected quantization level.
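To make the comparison baseline concrete, the sketch below pushes an image through the 9/7 wavelet (PyWavelets' 'bior4.4'), applies a uniform quantizer to the coefficients, reconstructs, and reports the reconstruction error; the uniform quantizer, the decomposition level, and the random stand-in image are illustrative assumptions rather than the paper's protocol.

    import numpy as np
    import pywt

    def quantized_round_trip(image, step, wavelet="bior4.4", level=3):
        """Forward 9/7-style transform, uniform coefficient quantization, and
        reconstruction. The paper evolves the reconstruction coefficients; here
        the stock biorthogonal 9/7 wavelet and a simple uniform quantizer stand
        in for illustration."""
        coeffs = pywt.wavedec2(image, wavelet, level=level)
        arr, slices = pywt.coeffs_to_array(coeffs)
        arr_q = np.round(arr / step) * step            # lossy quantization
        rec = pywt.waverec2(
            pywt.array_to_coeffs(arr_q, slices, output_format="wavedec2"), wavelet)
        mse = np.mean((image - rec[:image.shape[0], :image.shape[1]]) ** 2)
        return rec, mse

    img = np.random.rand(128, 128) * 255               # stand-in for an ultrasound frame
    _, mse = quantized_round_trip(img, step=16)
    print("MSE at this quantization step:", mse)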
{"title":"Evolving improved transforms for reconstruction of quantized ultrasound images","authors":"Chris Miller, B. Babb, F. Moore, M. R. Peterson","doi":"10.1109/WACV.2011.5711511","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711511","url":null,"abstract":"State-of-the-art lossy compression schemes for medical imagery utilize the 9/7 wavelet. Recent research has established a methodology for using evolutionary computation (EC) to evolve wavelet and scaling numbers describing novel reconstruction transforms that outperform the 9/7 under lossy conditions. This paper describes an investigation into whether evolved transforms could automatically compensate for the detrimental effects of quantization for ultrasound (US) images. Results for 16:1, 32:1, and 64:1 quantization consistently demonstrate superior performance of evolved transforms in comparison to the 9/7 wavelet; in general, this advantage increases in proportion to the selected quantization level.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128482862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of traffic video based on a spatiotemporal orientation analysis
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711560
K. Derpanis, Richard P. Wildes
This paper describes a system for classifying traffic congestion videos based on their observed visual dynamics. Central to the proposed system is treating traffic flow identification as an instance of dynamic texture classification. More specifically, a recent discriminative model of dynamic textures is adapted for the special case of traffic flows. This approach avoids the need for segmentation, tracking and motion estimation that typify extant approaches. Classification is based on matching distributions (or histograms) of spacetime orientation structure. Empirical evaluation on a publicly available data set shows high classification performance and robustness to typical environmental conditions (e.g., variable lighting).
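A toy version of the distribution-matching step: histograms of spacetime orientation structure are compared with a Bhattacharyya-style distance and the query is assigned the label of its nearest neighbour. The abstract does not name the exact match measure, so the distance, the bin count, and the three toy congestion classes are assumptions.

    import numpy as np

    def bhattacharyya_distance(p, q, eps=1e-12):
        """Distance between two normalized histograms; one common choice for
        distribution matching (the abstract does not name the measure used)."""
        p = p / (p.sum() + eps)
        q = q / (q.sum() + eps)
        return -np.log(np.sum(np.sqrt(p * q)) + eps)

    def classify(query_hist, train_hists, train_labels):
        """Nearest-neighbour classification of an orientation histogram."""
        d = [bhattacharyya_distance(query_hist, h) for h in train_hists]
        return train_labels[int(np.argmin(d))]

    # toy 8-bin orientation histograms for three congestion classes
    train = [np.array([5, 1, 1, 1, 1, 1, 1, 1]),      # "light"
             np.array([1, 1, 5, 5, 1, 1, 1, 1]),      # "medium"
             np.array([1, 1, 1, 1, 5, 5, 5, 5])]      # "heavy"
    labels = ["light", "medium", "heavy"]
    print(classify(np.array([4, 2, 1, 1, 1, 1, 1, 1]), train, labels))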
{"title":"Classification of traffic video based on a spatiotemporal orientation analysis","authors":"K. Derpanis, Richard P. Wildes","doi":"10.1109/WACV.2011.5711560","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711560","url":null,"abstract":"This paper describes a system for classifying traffic congestion videos based on their observed visual dynamics. Central to the proposed system is treating traffic flow identification as an instance of dynamic texture classification. More specifically, a recent discriminative model of dynamic textures is adapted for the special case of traffic flows. This approach avoids the need for segmentation, tracking and motion estimation that typify extant approaches. Classification is based on matching distributions (or histograms) of spacetime orientation structure. Empirical evaluation on a publicly available data set shows high classification performance and robustness to typical environmental conditions (e.g., variable lighting).","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128500386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of static objects for the task of video surveillance
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711550
Rubén Heras Evangelio, T. Senst, T. Sikora
Detecting static objects in video sequences is highly relevant in many surveillance scenarios, such as airports and railway stations. In this paper we propose a system for the detection of static objects in crowded scenes that, based on the output of two background models learning at different rates, classifies pixels with the help of a finite-state machine. The background is modelled by two mixtures of Gaussians with identical parameters except for the learning rate. The state machine provides the means for interpreting the results obtained from background subtraction and can be used to incorporate additional information cues, thus yielding a flexible system especially suitable for real-life applications. The system was integrated into our surveillance application and successfully validated on several public datasets.
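A simplified reading of the two-model idea in code: two identical OpenCV MOG2 background subtractors are run with different learning rates, and pixels still flagged by the slow model but already absorbed by the fast one are kept as static-object candidates. The learning rates, the video file name, and the two-state rule (a stand-in for the paper's finite-state machine) are assumptions.

    import cv2

    # two identical MOG2 models; only the learning rate passed to apply() differs
    slow_bg = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    fast_bg = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

    def static_object_mask(frame):
        """Pixels flagged as foreground by the slowly adapting model but already
        absorbed into the background by the fast model are candidate static
        objects -- a simplified two-state reading of the paper's state machine."""
        fg_slow = slow_bg.apply(frame, learningRate=0.001)
        fg_fast = fast_bg.apply(frame, learningRate=0.02)
        return cv2.bitwise_and(fg_slow, cv2.bitwise_not(fg_fast))

    cap = cv2.VideoCapture("surveillance.avi")   # hypothetical input video
    ok, frame = cap.read()
    while ok:
        mask = static_object_mask(frame)         # candidate static-object pixels
        ok, frame = cap.read()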
{"title":"Detection of static objects for the task of video surveillance","authors":"Rubén Heras Evangelio, T. Senst, T. Sikora","doi":"10.1109/WACV.2011.5711550","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711550","url":null,"abstract":"Detecting static objects in video sequences has a high relevance in many surveillance scenarios like airports and railwaystations. In this paper we propose a system for the detection of static objects in crowded scenes that, based on the detection of two background models learning at different rates, classifies pixels with the help of a finite-state machine. The background is modelled by two mixtures of Gaussians with identical parameters except for the learning rate. The state machine provides the meaning for the interpretation of the results obtained from background subtraction and can be used to incorporate additional information cues, obtaining thus a flexible system specially suitable for real-life applications. The system was built in our surveillance application and successfully validated with several public datasets.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122253106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the use of multispectral conjunctival vasculature as a soft biometric
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711504
S. Crihalmeanu, A. Ross
Ocular biometrics has made significant progress over the past decade, primarily due to advances in iris recognition. Initial research in the field of iris recognition focused on the acquisition and processing of frontal irides, which may require considerable subject cooperation. However, when the iris is off-angle with respect to the acquisition device, the sclera (the white part of the eye) is exposed. The sclera is covered by a thin transparent layer called the conjunctiva. Both the episclera and the conjunctiva contain blood vessels that are observable from the outside. In this work, these blood vessels are referred to as conjunctival vasculature. Iris patterns are better observed in the near-infrared spectrum, while conjunctival vasculature is better seen in the visible spectrum. Therefore, multispectral (i.e., color-infrared) images of the eye are acquired to allow the iris biometric to be combined with the conjunctival vasculature. The paper focuses on conjunctival vasculature enhancement, registration and matching. Initial results are promising and suggest the need for further investigation of this biometric in a bimodal configuration with the iris.
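The abstract does not say which enhancement filter the authors use, so the sketch below substitutes a common choice, a Frangi vesselness filter applied to the green channel, purely to illustrate the enhancement stage; the bundled sample image stands in for an eye scan.

    import numpy as np
    from skimage import data, filters

    def enhance_vasculature(rgb_image):
        """Illustrative vessel enhancement: take the green channel (usually the
        highest-contrast channel for blood vessels) and apply a Frangi
        vesselness filter. Frangi is a common substitute, not necessarily the
        filter used in the paper."""
        green = rgb_image[..., 1].astype(np.float64) / 255.0
        return filters.frangi(green)

    # demo on a bundled sample image standing in for an eye scan
    vesselness = enhance_vasculature(data.astronaut())
    print(vesselness.shape, vesselness.max())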
{"title":"On the use of multispectral conjunctival vasculature as a soft biometric","authors":"S. Crihalmeanu, A. Ross","doi":"10.1109/WACV.2011.5711504","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711504","url":null,"abstract":"Ocular biometrics has made significant progress over the past decade primarily due to advances in iris recognition. Initial research in the field of iris recognition focused on the acquisition and processing of frontal irides which may require considerable subject cooperation. However, when the iris is off-angle with respect to the acquisition device, the sclera (the white part of the eye) is exposed. The sclera is covered by a thin transparent layer called conjunctiva. Both the episclera and conjunctiva contain blood vessels that are observable from the outside. In this work, these blood vessels are referred to as conjunctival vasculature. Iris patterns are better observed in the near infrared spectrum while conjunctival vasculature is better seen in the visible spectrum. Therefore, multispectral (i.e., color-infrared) images of the eye are acquired to allow for the combination of the iris biometric with the conjunctival vasculature. The paper focuses on conjunctival vasculature enhancement, registration and matching. Initial results are promising and suggest the need for further investigation of this biometric in a bimodal configuration with iris.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121050385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking gaze direction from far-field surveillance cameras
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711548
K. Sankaranarayanan, Ming-Ching Chang, N. Krahnstoever
We present a real-time approach to estimating the gaze direction of multiple individuals using a network of far-field surveillance cameras. This work is part of a larger surveillance system that utilizes a network of fixed cameras as well as PTZ cameras to perform site-wide tracking of individuals. Based on the tracking information, one or more PTZ cameras are cooperatively controlled to obtain close-up facial images of individuals. Within these close-up shots, face detection and head pose estimation are performed, and the results are provided back to the tracking system to track individual gazes. A new cost metric based on location and gaze orientation is proposed to robustly associate head observations with tracker states. The tracking system can thus leverage the newly obtained gaze information for two purposes: (i) improving the localization of individuals in crowded settings, and (ii) aiding high-level surveillance tasks such as understanding gesturing, interactions between individuals, and finding the object of interest that people are looking at. In security applications, our system can detect whether a subject is looking at the security cameras or guard posts.
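The abstract does not give the cost metric's formula. The sketch below assumes a weighted sum of positional distance and gaze-angle difference and resolves the track-to-observation assignment with the Hungarian method; the weights and the assignment solver are illustrative choices, not the paper's definition.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def association_cost(track_pos, track_gaze, obs_pos, obs_gaze, w_pos=1.0, w_ang=2.0):
        """Illustrative cost mixing Euclidean distance between positions with the
        wrapped angular difference between gaze directions (radians)."""
        d_pos = np.linalg.norm(np.asarray(track_pos) - np.asarray(obs_pos))
        d_ang = np.abs(np.arctan2(np.sin(track_gaze - obs_gaze),
                                  np.cos(track_gaze - obs_gaze)))
        return w_pos * d_pos + w_ang * d_ang

    # build a cost matrix and solve the track-to-observation assignment
    tracks = [((0.0, 0.0), 0.1), ((5.0, 1.0), 1.5)]
    observations = [((4.8, 1.2), 1.4), ((0.2, -0.1), 0.0)]
    cost = np.array([[association_cost(tp, tg, op, og) for (op, og) in observations]
                     for (tp, tg) in tracks])
    rows, cols = linear_sum_assignment(cost)
    print(list(zip(rows, cols)))   # expected pairing: track 0 -> obs 1, track 1 -> obs 0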
{"title":"Tracking gaze direction from far-field surveillance cameras","authors":"K. Sankaranarayanan, Ming-Ching Chang, N. Krahnstoever","doi":"10.1109/WACV.2011.5711548","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711548","url":null,"abstract":"We present a real-time approach to estimating the gaze direction of multiple individuals using a network of far-field surveillance cameras. This work is part of a larger surveillance system that utilizes a network of fixed cameras as well as PTZ cameras to perform site-wide tracking of individuals. Based on the tracking information, one or more PTZ cameras are cooperatively controlled to obtain close-up facial images of individuals. Within these close-up shots, face detection and head pose estimation are performed and the results are provided back to the tracking system to track the individual gazes. A new cost metric based on location and gaze orientation is proposed to robustly associate head observations with tracker states. The tracking system can thus leverage the newly obtained gaze information for two purposes: (i) improve the localization of individuals in crowded settings, and (ii) aid high-level surveillance tasks such as understanding gesturing, interactions between individuals, and finding the object-of-interest that people are looking at. In security application, our system can detect if a subject is looking at the security cameras or guard posts.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116755650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Window detection from mobile LiDAR data
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711484
Ruisheng Wang, Jeff Bach, F. Ferrie
We present an automatic approach to window and façade detection from LiDAR (Light Detection And Ranging) data collected from a moving vehicle along streets in urban environments. The proposed method combines bottom-up and top-down strategies to extract façade planes from noisy LiDAR point clouds. Window detection is achieved through a two-step approach: potential window point detection and window localization. The façade pattern is automatically inferred to enhance the robustness of the window detection. Experiments on six datasets yield completeness rates of 71.2% and 88.9% on the first two datasets and 100% on the remaining four, and a correctness rate of 100% on all tested datasets, which demonstrates the effectiveness of the proposed solution. Potential applications include the generation of building façade models with street-level details and texture synthesis for producing realistic occlusion-free façade texture.
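Completeness and correctness rates correspond to detection recall and precision. The small sketch below computes both from counts of ground-truth windows, detections, and matches; how a detection is declared to match a true window is not specified in the abstract, and the example counts are made up.

    def completeness_correctness(num_ground_truth, num_detections, num_matched):
        """Completeness = matched / ground truth (recall);
        correctness = matched / detections (precision). These are the usual
        definitions; the matching criterion itself is not given in the abstract."""
        completeness = num_matched / num_ground_truth if num_ground_truth else 0.0
        correctness = num_matched / num_detections if num_detections else 0.0
        return completeness, correctness

    # toy counts: 90 of 120 true windows found, 95 detections in total
    print(completeness_correctness(120, 95, 90))   # (0.75, ~0.947)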
{"title":"Window detection from mobile LiDAR data","authors":"Ruisheng Wang, Jeff Bach, F. Ferrie","doi":"10.1109/WACV.2011.5711484","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711484","url":null,"abstract":"We present an automatic approach to window and façade detection from LiDAR (Light Detection And Ranging) data collected from a moving vehicle along streets in urban environments. The proposed method combines bottom-up with top-down strategies to extract façade planes from noisy LiDAR point clouds. The window detection is achieved through a two-step approach: potential window point detection and window localization. The facade pattern is automatically inferred to enhance the robustness of the window detection. Experimental results on six datasets result in 71.2% and 88.9% in the first two datasets, 100% for the rest four datasets in terms of completeness rate, and 100% correctness rate for all the tested datasets, which demonstrate the effectiveness of the proposed solution. The application potential includes generation of building facade models with street-level details and texture synthesis for producing realistic occlusion-free façade texture.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131489099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An assisted photography method for street scenes
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711488
Marynel Vázquez, Aaron Steinfeld
We present an interactive, computational approach for assisting users with visual impairments during photographic documentation of transit problems. Our technique can be described as a method to improve picture composition while retaining the visual information that is expected to be most relevant. Our system considers the position of the estimated region of interest (ROI) of a photo and the camera orientation. Saliency maps and Gestalt theory are used to guide the user towards a more balanced picture. Our current implementation for mobile phones uses optic flow to update the internal knowledge of the ROI's position and tilt sensor readings to correct camera orientations that are neither horizontal nor vertical. Using ground truth labels, we confirmed that our method proposes valid strategies for improving image composition. Future work includes an optimized implementation and user studies.
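One way to picture the ROI update is sketched below: the ROI centre is shifted by the mean Farneback optical flow inside the ROI, and a roll suggestion snaps the tilt reading to the nearest horizontal/vertical orientation. The flow method, the mean-shift update, and the snapping rule are assumptions, since the abstract only states that optic flow tracks the ROI and tilt readings correct the orientation.

    import cv2
    import numpy as np

    def update_roi_center(prev_gray, curr_gray, roi_center, roi_half=40):
        """Shift the ROI centre by the mean dense optical flow inside the ROI.
        Farneback flow and the mean-shift update are illustrative choices."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        x, y = int(roi_center[0]), int(roi_center[1])
        patch = flow[max(y - roi_half, 0):y + roi_half,
                     max(x - roi_half, 0):x + roi_half]
        return (roi_center[0] + patch[..., 0].mean(),
                roi_center[1] + patch[..., 1].mean())

    def roll_correction(tilt_roll_deg):
        """Rotation (degrees) to suggest so the frame snaps to the nearest
        horizontal/vertical orientation -- an assumed correction rule."""
        return -((tilt_roll_deg + 45) % 90 - 45)

    prev = np.random.randint(0, 255, (240, 320), np.uint8)   # stand-in frames
    curr = np.roll(prev, 3, axis=1)
    print(update_roi_center(prev, curr, (160.0, 120.0)))
    print(roll_correction(12.0))   # suggest rotating back by 12 degrees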
{"title":"An assisted photography method for street scenes","authors":"Marynel Vázquez, Aaron Steinfeld","doi":"10.1109/WACV.2011.5711488","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711488","url":null,"abstract":"We present an interactive, computational approach for assisting users with visual impairments during photographic documentation of transit problems. Our technique can be described as a method to improve picture composition, while retaining visual information that is expected to be most relevant. Our system considers the position of the estimated region of interest (ROI) of a photo, and camera orientation. Saliency maps and Gestalt theory are used for guiding the user towards a more balanced picture. Our current implementation for mobile phones uses optic flow to update the internal knowledge of the position of the ROI and tilt sensor readings to correct non horizontal or vertical camera orientations. Using ground truth labels, we confirmed our method proposes valid strategies for improving image composition. Future work includes an optimized implementation and user studies.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117320500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SLAM combining ToF and high-resolution cameras
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711569
V. Castañeda, D. Mateus, Nassir Navab
This paper describes an extension to the Monocular Simultaneous Localization and Mapping (MonoSLAM) method that relies on the images provided by a combined high-resolution Time-of-Flight (HR-ToF) sensor. In its standard formulation, MonoSLAM estimates the depth of each tracked feature as the camera moves. This depth estimation depends both on the quality of the feature tracking and on the previous camera position estimates. Additionally, MonoSLAM requires a set of known features to initialize the scale of the map and the world coordinate system. We propose to use the combined high-resolution ToF sensor to incorporate depth measurements into the MonoSLAM framework while keeping the accuracy of the feature detection. In practice, we use a ToF camera and a high-resolution (HR) camera in a calibrated and synchronized set-up and modify the measurement model and observation updates of MonoSLAM. The proposed method does not require known features to initialize a map. Experiments show, first, that the depth measurements in our method improve camera localization compared to the MonoSLAM approach using HR images alone, and second, that HR images are required for reliable tracking.
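A minimal illustration of folding a direct depth observation into a filter: the generic EKF update below corrects a single feature-depth state with a ToF reading. It is not the paper's full MonoSLAM state (camera pose plus feature parametrization); the one-dimensional state and the noise values are assumptions.

    import numpy as np

    def ekf_depth_update(x, P, z_depth, h_of_x, H, R_depth):
        """One EKF measurement update where the ToF sensor directly observes a
        feature's depth. x, P: state mean/covariance; z_depth: measured depth;
        h_of_x: depth predicted from the state; H: Jacobian of that prediction;
        R_depth: measurement noise variance. A generic EKF step, not the
        paper's exact formulation."""
        H = np.atleast_2d(H)
        S = H @ P @ H.T + R_depth                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x_new = x + (K @ np.array([[z_depth - h_of_x]])).ravel()
        P_new = (np.eye(len(x)) - K @ H) @ P
        return x_new, P_new

    # toy state: one feature depth of 2.0 m with large uncertainty,
    # corrected by a ToF reading of 1.6 m
    x, P = np.array([2.0]), np.array([[0.5]])
    x, P = ekf_depth_update(x, P, z_depth=1.6, h_of_x=x[0],
                            H=[[1.0]], R_depth=np.array([[0.01]]))
    print(x, P)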
{"title":"SLAM combining ToF and high-resolution cameras","authors":"V. Castañeda, D. Mateus, Nassir Navab","doi":"10.1109/WACV.2011.5711569","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711569","url":null,"abstract":"This paper describes an extension to the Monocular Simultaneous Localization and Mapping (MonoSLAM) method that relies on the images provided by a combined high resolution Time of Flight (HR-ToF) sensor. In its standard formulation MonoSLAM estimates the depth of each tracked feature as the camera moves. This depth estimation depends both on the quality of the feature tracking and the previous camera position estimates. Additionally, MonoSLAM requires a set of known features to initialize the scale of the map and the world coordinate system. We propose to use the combined high resolution ToF sensor to incorporate depth measures into the MonoSLAM framework while keeping the accuracy of the feature detection. In practice, we use a ToF (Time of Flight) and a high-resolution (HR) camera in a calibrated and synchronized set-up and modify the measurement model and observation updates of MonoSLAM. The proposed method does not require known features to initialize a map. Experiments show first, that the depth measurements in our method improve the results of camera localization when compared to the MonoSLAM approach using HR images alone; and second, that HR images are required for reliable tracking.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122176448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}