Spatial Topology Graphs for Feature-Minimal Correspondence
Z. Tauber, Ze-Nian Li, M. S. Drew. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.60
Multiview image matching methods typically require feature point correspondences. We propose a novel spatial topology method that represents the space with a set of connected projective invariant features. Isolated features such as corners typically cannot be matched reliably, so either viewpoint changes must be limited or projective invariant descriptions are needed, and the fundamental matrix must be recovered by stochastic optimization over a large number of features. In contrast, our enhanced feature set models connectivity in space, forming a unique configuration that can be matched with few features and over large viewpoint changes. Our features are derived from edges, their curvatures, and neighborhood relationships. A probabilistic spatial topology graph models the space using these features, and a second graph represents the neighborhood relationships. Probabilistic graph matching is used to find feature correspondences. Our results show robust feature detection and an average 80% discovery rate of feature matches.
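The projective invariance the abstract relies on can be illustrated with the simplest such invariant, the cross-ratio of four collinear points, which is unchanged by any projective transformation of the line. This is a generic sketch of that property, not the paper's actual feature construction:

```python
def cross_ratio(a, b, c, d):
    # Cross-ratio of four collinear points (1-D coordinates): the
    # classic projective invariant underlying invariant features.
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def project(x, h):
    # 1-D projective map x -> (h0*x + h1) / (h2*x + h3).
    return (h[0] * x + h[1]) / (h[2] * x + h[3])
```

Applying any such map to all four points leaves the cross-ratio unchanged, which is what makes it usable as a viewpoint-independent descriptor.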
Efficient indexing for strongly similar subimage retrieval
G. Roth, W. Scott. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.24
Strongly similar subimages contain different views of the same object. In subimage search, the user selects an image region and the retrieval system attempts to find strongly similar subimages in an image database. Existing solutions use salient features, or "interest points", with associated descriptor vectors, but searching a large image database by exhaustive comparison of interest point descriptors is not feasible. To solve this problem, we propose a novel off-line indexing scheme based on the most significant bits (MSBs) of these descriptors. On-line search uses the index file to limit the search to interest points whose descriptors share the same MSB value, which is up to three orders of magnitude faster than exhaustive search. The scheme is also incremental: the index file for a union of image groups can be created by merging the index files of the individual groups. The effectiveness of the approach is demonstrated experimentally on a variety of image databases.
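A minimal sketch of the MSB-bucketing idea, assuming byte-valued descriptor dimensions; the function names and data layout are illustrative, not the paper's exact scheme:

```python
from collections import defaultdict

def msb_key(descriptor, bits_per_dim=1):
    # Keep only the most significant bit(s) of each 8-bit descriptor
    # dimension and pack them into a hashable tuple.
    shift = 8 - bits_per_dim
    return tuple(d >> shift for d in descriptor)

def build_index(descriptors):
    # Off-line step: bucket interest-point ids by the MSB key of
    # their descriptor, so on-line search touches one bucket only.
    index = defaultdict(list)
    for point_id, desc in enumerate(descriptors):
        index[msb_key(desc)].append(point_id)
    return index

def merge_indexes(a, b):
    # Incremental property: the index of a union of image groups is
    # the bucket-wise concatenation of the individual indexes.
    merged = defaultdict(list, {k: list(v) for k, v in a.items()})
    for k, v in b.items():
        merged[k].extend(v)
    return merged

def search(index, query_desc):
    # On-line step: compare only against points sharing the MSB key.
    return index.get(msb_key(query_desc), [])
```

The speedup comes from `search` examining a single bucket instead of every descriptor; the incremental property is exactly the bucket-wise merge.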
Screen-Camera Calibration using a Spherical Mirror
Yannick Francken, C. Hermans, P. Bekaert. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.59
Developments in the consumer market indicate that the average user of a personal computer is likely to also own a webcam. This new user group will bring a new set of applications, which will require a user-friendly way to calibrate the position of the camera with respect to the location of the screen. This paper presents a fully automatic method to calibrate a screen-camera setup using a single moving spherical mirror. Unlike other methods, our algorithm needs no user intervention other than moving the spherical mirror around. In addition, if the user provides the exact radius of the sphere in millimeters, the scale of the computed solution is uniquely defined.
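Why a known sphere radius fixes the absolute scale can be seen from the pinhole model: the apparent size of the sphere determines its distance. A simplified sketch under a small-angle assumption (it ignores that an off-axis sphere actually projects to an ellipse), with illustrative parameter names:

```python
def sphere_distance(focal_px, radius_mm, observed_radius_px):
    # Pinhole approximation: a sphere of known metric radius at
    # distance Z images with radius r_px ~= f * R / Z, so the
    # observed size fixes Z in absolute (metric) units.
    return focal_px * radius_mm / observed_radius_px
```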
Automatic Annotation of Humans in Surveillance Video
D. Hansen, B. K. Mortensen, P. Duizer, Jens R. Andersen, T. Moeslund. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.12
In this paper we present a system for automatic annotation of humans passing a surveillance camera. Each human receives a set of annotations: the primary colors of the clothing, the height, and the focus of attention. Annotation follows robust background subtraction based on a codebook representation. The primary colors of the clothing are estimated by grouping similar pixels according to a body model. The height is estimated from a 3D mapping using the head and feet. Lastly, the focus of attention is defined as the overall direction of the head, which is estimated from changes in intensity at four different positions. Results show successful detection, and hence successful annotation, for most test sequences.
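Height-from-head-and-feet geometry can be sketched in a deliberately simplified setting: a horizontal pinhole camera at known height above a flat ground plane, with image rows measured downward from the principal point. This is an assumption-laden toy version, not the paper's actual 3D mapping:

```python
def person_height(cam_height, y_feet, y_head, focal=1.0):
    # Feet lie on the ground plane, so their image row fixes the depth:
    #   y_feet = focal * cam_height / depth
    depth = focal * cam_height / y_feet
    # Back-project the head row at that depth to get the head's
    # world height; focal cancels, only the row ratio matters.
    return cam_height - y_head * depth / focal
```

Note that the result equals `cam_height * (1 - y_head / y_feet)`, so under these assumptions no focal-length calibration is needed at all.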
Finite Generalized Gaussian Mixture Modeling and Applications to Image and Video Foreground Segmentation
M. S. Allili, N. Bouguila, D. Ziou. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1117/1.2898125
In this paper, we propose a finite mixture model of generalized Gaussian distributions (GGD) for robust segmentation and data modeling in the presence of noise and outliers. The model is more flexible in adapting to the shape of the data and less prone to over-fitting the number of classes than the Gaussian mixture. In the first part of this work, we derive the maximum-likelihood estimation of the parameters of the new mixture model and propose an information-theoretic approach for selecting the number of classes. In the second part, we present applications to image, motion, and foreground segmentation that measure the performance of the new model in image data modeling, with comparison to the Gaussian mixture.
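The generalized Gaussian density behind the mixture is standard; a sketch using the common parameterization (location mu, scale alpha, shape beta), where beta = 2 recovers the Gaussian and beta = 1 the Laplacian, with heavier tails for smaller beta:

```python
import math

def ggd_pdf(x, mu, alpha, beta):
    # Generalized Gaussian density:
    #   f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha)^beta)
    coef = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coef * math.exp(-((abs(x - mu) / alpha) ** beta))

def mixture_pdf(x, weights, params):
    # Finite mixture: params is a list of (mu, alpha, beta) triples,
    # weights a matching list of non-negative mixing proportions.
    return sum(w * ggd_pdf(x, *p) for w, p in zip(weights, params))
```

The shape parameter beta is what gives the model its extra flexibility relative to a plain Gaussian mixture.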
Fourier tags: Smoothly degradable fiducial markers for use in human-robot interaction
Junaed Sattar, Eric Bourque, P. Giguère, G. Dudek. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.34
In this paper we introduce the Fourier tag, a synthetic fiducial marker used to visually encode information and provide controllable positioning. The Fourier tag is a synthetic target, akin to a bar code, that encodes multi-bit information which can be efficiently and robustly detected in an image. Moreover, the Fourier tag has the beneficial property that the length of the bit string it encodes varies with the distance between the camera and the target, since the effective resolution decreases with distance under perspective. This paper introduces the Fourier tag, describes its design, and illustrates its properties experimentally.
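The "smoothly degradable" idea can be sketched in one dimension: each bit modulates a carrier of increasing frequency, so low-index bits sit at low frequencies and survive loss of resolution longest. This toy encoder/decoder is an illustration of the principle only, not the actual tag design:

```python
import math

def encode(bits, n_samples=256):
    # Bit k modulates the amplitude of frequency (k + 1); earlier
    # bits sit at lower frequencies, so they degrade last.
    signal = []
    for i in range(n_samples):
        t = i / n_samples
        s = sum(b * math.cos(2 * math.pi * (k + 1) * t)
                for k, b in enumerate(bits))
        signal.append(s)
    return signal

def decode(signal, n_bits):
    n = len(signal)
    out = []
    for k in range(n_bits):
        # Correlate with the carrier; a set bit contributes an
        # amplitude of ~0.5, an unset bit ~0 (carriers orthogonal).
        c = sum(signal[i] * math.cos(2 * math.pi * (k + 1) * i / n)
                for i in range(n)) / n
        out.append(1 if c > 0.25 else 0)
    return out
```

In the real tag the same intuition applies radially in 2-D: as the target shrinks in the image, only the low-frequency (early) bits remain recoverable.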
Camera Sensor Model for Visual SLAM
Jing Wu, Hong Zhang. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.14
In this paper, we present a technique for constructing a camera sensor model for visual SLAM. The proposed method extends the general camera calibration procedure and requires the camera to observe a planar checkerboard pattern shown at different orientations. By placing the pattern at a series of distances from the camera, we find a relationship between the measurement noise covariance matrix and the range. Based on Geary's test, we conclude that the error distribution of a camera sensor follows a Gaussian distribution, and that the magnitude of the error variance is linearly related to the range between the camera and the observed features. Our sensor model can benefit visual SLAM algorithms by varying the measurement noise covariance matrix with range.
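The range-dependent noise model reduces to fitting a line, variance = a * range + b, to calibration measurements and evaluating it at query ranges. A minimal sketch with illustrative names (the paper's actual fitting procedure is not specified here):

```python
def fit_linear_noise_model(ranges, variances):
    # Ordinary least-squares fit of var = a * range + b, mirroring
    # the finding that noise variance grows linearly with range.
    n = len(ranges)
    sx = sum(ranges)
    sy = sum(variances)
    sxx = sum(r * r for r in ranges)
    sxy = sum(r * v for r, v in zip(ranges, variances))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def measurement_variance(a, b, rng):
    # Value to place on the diagonal of the filter's measurement
    # noise covariance R when observing a feature at this range.
    return a * rng + b
```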
Version and vergence control of a stereo camera head by fitting the movement into the Hering's law
J. Samarawickrama, S. Sabatini. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.69
An active vision system must support reactive visual processes in real time. In a stereoscopic vision system, the vergence angle, together with the version and tilt angles, uniquely describes the fixation point in space. We couple vision and motor control, focusing on the development and testing of a control strategy that follows Hering's law, by studying the cooperation of vergence and version movements. Simulation results confirm the advantages of Hering's law for achieving fast system reactions. We show that real-time active vergence and depth estimation become possible when the estimated disparity is reliable and fast to compute. In this framework, we also discuss the advantage of a simple and fast phase-based technique for depth estimation that allows real-time stereo processing with sub-pixel resolution.
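The claim that version and vergence uniquely fix the fixation point (in the horizontal plane, ignoring tilt) follows from Hering's decomposition: each eye's angle is the shared version component plus or minus half the vergence. A geometric sketch, with angles measured from the straight-ahead direction:

```python
import math

def fixation_point(version, vergence, baseline):
    # Hering's law: theta_left/right = version +/- vergence / 2.
    theta_l = version + vergence / 2.0
    theta_r = version - vergence / 2.0
    # Eyes at (-b/2, 0) and (+b/2, 0); each gaze line satisfies
    # x = x_eye + depth * tan(theta). Intersect the two lines.
    depth = baseline / (math.tan(theta_l) - math.tan(theta_r))
    x = -baseline / 2.0 + depth * math.tan(theta_l)
    return x, depth
```

Larger vergence means a nearer fixation point; zero vergence pushes it to infinity, which is why reliable disparity (and hence vergence) estimates are the bottleneck for real-time depth.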
A Prototype No-Reference Video Quality System
R. Dosselmann, X. Yang. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.6
This paper introduces a number of innovative no-reference algorithms to assess the perceived quality of real-time analog and digital television and video streams. A prototype system is developed to locate and measure the impact of three types of impairments that commonly degrade television and video signals. Analog sequences are tested for the presence of random noise. In the case of digital signals, two fundamental types of errors are of interest. The first is the blocking artifact that is pervasive among DCT-based compression schemes such as MPEG. The second category comprises errors caused by random changes to the bit stream of a signal. Of the various forms these distortions may take, only those that appear as "colored blocks" are detected by this system. Ideas to address the remaining issues are discussed.
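No-reference blockiness detection is commonly based on comparing luminance discontinuities at the 8-pixel DCT block grid against those elsewhere; a generic sketch of that idea (not the paper's specific algorithm):

```python
def blockiness(image, block=8):
    # Mean absolute luminance jump across block-grid column
    # boundaries, normalized by the mean jump at other columns.
    # Values well above 1 suggest DCT blocking artifacts.
    h, w = len(image), len(image[0])
    edge = inner = 0.0
    edge_n = inner_n = 0
    for y in range(h):
        for x in range(1, w):
            d = abs(image[y][x] - image[y][x - 1])
            if x % block == 0:
                edge += d
                edge_n += 1
            else:
                inner += d
                inner_n += 1
    return (edge / edge_n) / (inner / inner_n + 1e-9)
```

A smooth gradient scores near 1, while an image with jumps only at block boundaries scores far above it.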
Real-time eye blink detection with GPU-based SIFT tracking
M. Lalonde, David Byrns, L. Gagnon, N. Teasdale, D. Laurendeau. Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.54
This paper reports on the implementation of a GPU-based, real-time eye blink detector for very low-contrast images acquired under near-infrared illumination. The detector is part of a multi-sensor data acquisition and analysis system for driver performance assessment and training. Eye blinks are detected inside regions of interest that are aligned with the subject's eyes at initialization. Alignment is maintained over time by tracking SIFT feature points, which are used to estimate the affine transformation between the initial face pose and the pose in subsequent frames. The GPU implementation of the SIFT feature point extraction algorithm ensures real-time processing. An eye blink detection rate of 97% is obtained on a video dataset of 33,000 frames showing 237 blinks from 22 subjects.
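Estimating an affine transformation from tracked point pairs is a standard least-squares problem: x' = a*x + b*y + c and y' = d*x + e*y + f share the same 3x3 normal-equation matrix. A self-contained sketch of this step (the helper names are illustrative):

```python
def solve3(m, v):
    # Gaussian elimination with partial pivoting for a 3x3 system.
    a = [row[:] + [val] for row, val in zip(m, v)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, 3):
            f = a[r][i] / a[i][i]
            for c in range(i, 4):
                a[r][c] -= f * a[i][c]
    x = [0.0] * 3
    for i in range(2, -1, -1):
        x[i] = (a[i][3] - sum(a[i][c] * x[c] for c in range(i + 1, 3))) / a[i][i]
    return x

def fit_affine(src, dst):
    # Least-squares affine transform mapping src -> dst point sets,
    # e.g. carrying eye regions from the initial face pose forward.
    n = float(len(src))
    sx = sum(p[0] for p in src)
    sy = sum(p[1] for p in src)
    sxx = sum(p[0] * p[0] for p in src)
    syy = sum(p[1] * p[1] for p in src)
    sxy = sum(p[0] * p[1] for p in src)
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rows = []
    for k in (0, 1):  # solve for the x' row, then the y' row
        v = [sum(p[0] * q[k] for p, q in zip(src, dst)),
             sum(p[1] * q[k] for p, q in zip(src, dst)),
             sum(q[k] for q in dst)]
        rows.append(solve3(m, v))
    return rows  # [[a, b, c], [d, e, f]]
```

In practice the fit would be wrapped in an outlier-rejection loop (e.g. RANSAC), since some SIFT matches between frames will be wrong.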