Object detection and tracking using iterative division and correlograms
R. Bourezak, Guillaume-Alexandre Bilodeau
DOI: 10.1109/CRV.2006.52

This paper presents algorithms for detecting moving objects, tracking them, and interpreting their relationships. The algorithms are based on color and texture analysis in HSV space and are designed for real-time processing. Our goal is to study human interaction by tracking people and objects for surveillance applications. The object detection algorithm uses color histograms and iteratively divided interest regions for motion detection. The tracking algorithm uses correlograms, which combine spectral and spatial information, to match detected objects in consecutive frames.
3D Terrain Modeling for Rover Localization and Navigation
J. Bakambu, P. Allard, E. Dupuis
DOI: 10.1109/CRV.2006.2

This paper addresses the problem of constructing a 3D terrain model for the localization and navigation of a planetary rover, and presents our approach to 3D terrain reconstruction from large, sparse range data sets. In space robotics applications, an accurate and up-to-date model of the environment is important for a variety of reasons; in particular, the model can be used for safe tele-operation, path planning, and mapping points of interest. We propose on-line terrain modeling using data provided by an on-board, high-resolution, accurate 3D range sensor. Our approach acquires range scans on-line from different viewpoints with overlapping regions, merges them into a single point cloud, and then fits an irregular triangular mesh to the merged data. Outdoor experimental results demonstrate the effectiveness of the reconstructed terrain model for rover localization, path planning, and motion execution scenarios.
Design and analysis of a framework for real-time vision-based SLAM using Rao-Blackwellised particle filters
Robert Sim, P. Elinas, Matt Griffin, Alex Shyr, J. Little
DOI: 10.1109/CRV.2006.25

This paper addresses the problem of simultaneous localization and mapping (SLAM) using vision-based sensing. We present and analyse an implementation of a Rao-Blackwellised particle filter (RBPF) that uses stereo vision to localize a camera and 3D landmarks as the camera moves through an unknown environment. Our implementation is robust, can operate in real time, and does not require odometric or inertial measurements. Furthermore, our approach supports a 6-degree-of-freedom pose representation, vision-based ego-motion estimation, adaptive resampling, monocular operation, and a selection of odometry-based, observation-based, and mixture (combining local and global pose estimation) proposal distributions. The paper also examines the run-time behavior of efficiently designed RBPFs, providing an extensive empirical analysis of the memory and processing characteristics of RBPFs for vision-based SLAM. Finally, we present experimental results demonstrating the accuracy and efficiency of our approach.
Photometric Stereo with Nearby Planar Distributed Illuminants
James J. Clark
DOI: 10.1109/CRV.2006.55

This paper considers the problem of shape-from-shading under nearby planar distributed illuminants. It is shown that a nearby rectangular planar illuminant with a uniform isotropic distribution, shining on a small Lambertian surface patch, is equivalent to a single isotropic point light source at infinity. A closed-form solution is given for the equivalent point-light-source direction in terms of the illuminant corner locations. Equivalent point light sources can be obtained for multiple rectangular illuminants, allowing standard photometric stereo algorithms to be used. An extension is given to the case of a rectangular planar illuminant with an arbitrary radiance distribution; it is shown that a Walsh-function approximation of the arbitrary distribution leads to an efficient computation of the equivalent point-light-source directions. A search technique employing a solution-consistency measure is presented to handle the case of unknown depths. Applications of the theory include visual user interfaces that apply shape-from-shading algorithms to the illumination from computer monitors or movie screens.
Detection and tracking of pianist hands and fingers
D. Gorodnichy, A. Yogeswaran
DOI: 10.1109/CRV.2006.26

Current MIDI recording and transmission technology allows teachers to teach piano playing remotely (or off-line): a teacher plays a MIDI keyboard in one place, and a student observes the played keys on another MIDI keyboard elsewhere. What this technology does not convey is how the keys are played, namely which hand and finger was used to play each key. In this paper we present a video recognition tool that provides this information. A video camera is mounted above the piano keyboard, and video recognition techniques are used to calibrate the piano image against the MIDI sound, detect and track the pianist's hands, and annotate the fingers that play each key. The resulting video annotation of the performance can then be shown on a computer screen for further perusal by a piano teacher or student.
Evaluation of Colour Image Segmentation Hierarchies
D. MacDonald, J. Lang, M. McAllister
DOI: 10.1109/CRV.2006.31

Image segmentation is a key low-level vision process for many tasks in image understanding and applications of image processing. The groups of related pixels produced by a segmentation are expected to reveal information about the objects in a scene. For most images, many equally valid segmentations exist, depending on the context of the segmentation and, in particular, on the level of detail; segmentation hierarchies can represent different levels of detail simultaneously. In this paper, we employ the Earth Mover's Distance (EMD) as a segmentation criterion in a hierarchical colour segmentation algorithm. The use of EMD is motivated by its success in content-based image retrieval based on colour signatures. We also develop a novel evaluation method that examines the stability of segmentation hierarchies across stereo images. This evaluation method allows us to compare the EMD segmentation criterion with variants and with criteria based on mean colour.
Shape-Based Object Segmentation with Simultaneous Intensity Adjustment
Sarawut Tae-O-Sot, S. Jitapunkul, S. Auethavekiat
DOI: 10.1109/CRV.2006.64

Most segmentation algorithms are based on the assumption of intensity homogeneity within an object. In many applications, however, the object of interest contains more than one homogeneous region, and even when the object's shape is known, such an object is not effectively extracted. In this paper, we propose a segmentation process for objects containing two homogeneous regions. Our method is based on the level set method. We construct a shape model from a set of manually extracted objects; the parameters representing the shape model are the coefficients of a PCA basis. Instead of defining a new cost function based on a heterogeneity assumption, we repeatedly form a homogeneous region inside the evolving curve and evolve the curve by the level set method. Our experiments on medical images indicate that the method effectively segments objects with one or two homogeneous regions.
Confidence Based updation of Motion Conspicuity in Dynamic Scenes
V. Singh, Subhransu Maji, A. Mukerjee
DOI: 10.1109/CRV.2006.24

Computational models of visual attention achieve considerable data compression by eliminating processing on regions likely to be devoid of meaningful content. While saliency maps in static images are indexed on image regions (pixels), psychovisual data indicate that in dynamic scenes human attention is object-driven, and that localized motion is a significant determiner of object conspicuity. We introduce a confidence map that captures the uncertainty in the positions of moving objects, incorporating the exponential loss of information with distance from the fovea. We further improve the model with a computational model of visual attention based on perceptual grouping of moving objects and the computation of a motion saliency map based on the localized motion conspicuity of the objects. Behaviors exhibited by the system include attentive focus on moving wholes, shifting focus among multiple moving objects, and focus on objects moving contrary to the majority motion. We also present experimental data contrasting the model with human gaze tracking in a simple visual task.
Toward a Realistic Interpretation of Blue-spill for Blue-screen Matting
Jonathan Dupont, F. Deschênes
DOI: 10.1109/CRV.2006.77

This paper proposes a realistic interpretation of the blue-spill problem that affects blue-screen matting in digital image composition. The phenomenon consists of the contamination of the foreground object's color by the background color. Based on the notion of mutual illumination introduced by Forsyth and Zisserman [5], we interpret the blue-spill phenomenon and propose an extended composition equation that explicitly takes it into account. Experiments confirm the accuracy of the proposed method, and comparisons are made with the classical composition equation.
Image Classification and Retrieval using Correlation
Imran Ahmad, M. T. Ibrahim
DOI: 10.1109/CRV.2006.40

Image retrieval methods aim to retrieve, from an image database, images that are similar to a query image. Effectively retrieving non-alphanumeric data is a complex problem, made harder by the high dimensionality of the variable space associated with images. Image classification is an active and promising research domain in the area of image management and retrieval. In this paper, we propose a new image classification and retrieval scheme that automatically selects discriminating features. Our method consists of two phases: (i) classification of images on the basis of maximum cross-correlation, and (ii) retrieval of images from the database against a given query image. The proposed retrieval algorithm recursively searches for similar images among the registered images in the database on the basis of their correlation with the query image. The algorithm is very efficient, provided that the mean images of all classes are computed and available in advance. Because classification is based on maximum correlation, images that are most similar, and hence most strongly correlated with one another, are grouped in the same class and retrieved accordingly.