Structure from motion using SIFT features and the PH transform with panoramic imagery
M. Fiala. DOI: 10.1109/CRV.2005.78

Omni-directional sensors are useful for obtaining a 360° field of view of a scene for robot navigation, scene modeling, and telepresence. A method is presented to recover 3D scene structure and camera motion from a sequence of images captured by an omnidirectional catadioptric camera. This 3D model is then used to localize other panoramic images taken in the vicinity. The goal is achieved by tracking the trajectories of SIFT keypoints and finding the paths they travel using a Hough transform technique modified for panoramic imagery. The technique performs spatio-temporal feature extraction in the three-dimensional space of an image sequence, where scene points trace horizontal line trajectories relative to the camera. SIFT (scale invariant feature transform) keypoints are distinctive image features that can be identified between images invariantly to scale and rotation. Together these methods reconstruct a three-dimensional model from a sequence of panoramic images captured while the panoramic camera translated along a straight horizontal path. Only the camera/mirror geometry is known a priori. The camera positions and the world model are determined up to a scale factor. Experimental results of model building and camera localization using this model are shown.
Iterative corner extraction and matching for mosaic construction
Salem Alkaabi, F. Deravi. DOI: 10.1109/CRV.2005.50

A rapid and automatic iterative corner extraction and matching system for 2D mosaic construction is presented. This new system progressively estimates the geometric transformation parameters between two misaligned images, combining corner extraction, matching, and transformation parameter estimation into an iterative scheme. By aligning the images over successive iterations, accuracy improves significantly. The accurately aligned images are used to re-extract new features, which are then matched to select the correspondences used to estimate a transformation with n degrees of freedom. False correspondences are suppressed progressively to achieve an accurate transformation estimate. The system is used to construct a mosaic from two misaligned images, and its performance is demonstrated experimentally on various images of differing complexity.
Collision and event detection using geometric features in spatio-temporal volumes
M. Bolduc, F. Deschênes. DOI: 10.1109/CRV.2005.26

In video sequences, edges in 2D images (frames) produce 3D surfaces in the spatio-temporal volume. In this paper, we propose to treat temporal collisions between edges, and thus between objects, as 3D ridges in the spatio-temporal volume. Collisions (i.e., ridge points) can be located using the maximum principal curvature and the principal curvature direction. Using the detected collisions, we then propose a technique to detect overlapping-object events in an image sequence without computing either depth or optical flow. We present successful experiments on real image sequences.
Controlling camera and lights for intelligent image acquisition and merging
O. Borzenko, Y. Lespérance, M. Jenkin. DOI: 10.1109/CRV.2005.29

Docking craft in space and guiding mining machines are applications that often use remote video cameras equipped with one or more controllable light sources. In these applications the problem of parameter selection arises: how should the best parameters for the camera and lights be chosen? A further problem is that a single image often cannot capture the whole scene properly, so a composite image must be rendered. In this paper, we report on our progress with the CITO Lights and Camera project, which addresses the parameter-selection and merging problems for such systems. The prototype knowledge-based controller adjusts the lighting to iteratively acquire a collection of images of a target. At every stage, an entropy-based merging module combines these images to produce a composite. The result is a final composite image optimized for further image-processing tasks such as pose estimation or tracking.
Real-time detection of faces in video streams
M. C. Santana, O. Déniz-Suárez, Cayetano Guerra, M. Hernández-Tejera. DOI: 10.1109/CRV.2005.64

This paper describes a face detection system that goes beyond traditional approaches, which are normally designed for still images. The video-stream context is considered when applying the detector, so the resulting system is designed around a key property of a video stream: temporal coherence. The system builds a feature-based model for each detected face and searches for the faces in the next frame using the model information. For video-stream processing, the results outperform the Rowley-Kanade and Viola-Jones solutions, providing eye and face data in reduced time with a notable correct-detection rate.
Comparing classification metrics for labeling segmented remote sensing images
P. Maillard, David A. Clausi. DOI: 10.1109/CRV.2005.28

Image segmentation and labelling are the two conceptual operations in image classification. As the remote sensing community adopts more powerful segmentation procedures with spatial constraints, new possibilities open up for labelling. Instead of assigning a label to a single observation (pixel), whole image segments are labelled at once, implying the use of multivariate samples rather than pixel vectors. This approach to image classification also offers new possibilities for using a priori information about the classes, such as existing maps or object signature libraries. The present paper addresses both issues. First, a labelling scheme is presented that gathers evidence about the classes from incomplete a priori information using a "cognitive reasoning" approach. Then, five different metrics for label assignment are compared and combined through a voting scheme. The results show that very different outcomes can be obtained depending on the metric chosen. Metric combination through voting, being a suboptimal approach, does not necessarily provide the best results but can be a safe alternative to choosing only one metric.
Head pose estimation of partially occluded faces
Markus T. Wenzel, W. Schiffmann. DOI: 10.1109/CRV.2005.45

This paper describes an algorithm that calculates the approximate head pose of partially occluded faces without training or manual initialization. The presented approach works on low-resolution webcam images. The algorithm is based on the observation that, for small depth rotations of a head, the rotation angles can be approximated linearly. It uses the CamShift (continuously adaptive mean shift) algorithm to track the user's head. With a pyramidal implementation of the iterative Lucas-Kanade optical flow algorithm, a particular feature point in the face is tracked. Pan and tilt of the head are estimated from the shift of the feature point relative to the center of the head; 3D position and roll are estimated from the CamShift results.