Realistic rendering in real-time augmented reality applications leads one to consider physical interactions between real and virtual worlds. One of these interactions is mutual occlusion in the rendered viewpoint. This paper presents two approaches for handling occlusions when the real objects can be displaced or deformed. The first approach is model-based: it is suited for a static viewpoint and relies only on a tracked bounding-volume model within which the object's silhouette is carved. The second approach is depth-based and makes it possible to change the viewpoint by exploiting a handheld stereo camera. Both approaches are devised to minimize the effect of real-object tracking errors in the rendered viewpoint.
{"title":"Handling Occlusions in Real-time Augmented Reality : Dealing with Movable Real and Virtual Objects","authors":"P. Fortin, P. Hébert","doi":"10.1109/CRV.2006.38","DOIUrl":"https://doi.org/10.1109/CRV.2006.38","url":null,"abstract":"Realistic rendering in real-time augmented reality applications leads one to consider physical interactions between real and virtual worlds. One of these interactions is mutual occlusions in the rendered viewpoint. This paper presents two approaches for handling occlusions when the real objects can be displaced or deformed. The first approach is model-based. It is suited for a static viewpoint and relies only on a tracked bounding volume model within which the object’s silhouette is carved. The second approach is depthbased and makes it possible to change the viewpoint by exploiting a handheld stereo camera. Both approaches are devised to minimize the effect of real object tracking errors in the rendered viewpoint.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122390923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gibbs Random Fields (GRFs) produce elegant models but suffer from very poor computational speed; they have nevertheless been widely applied to image segmentation. In contrast to the block-based hierarchies usually constructed for GRFs, an irregular region-based approach is a more natural model for segmenting real images. In this paper, we show that the fine-to-coarse region-based hierarchical framework for the well-known Potts model can be extended to non-edge-based interactions. By deliberately oversegmenting at the finer scale, the method proceeds conservatively, using region mean differences to avoid constructing regions that straddle a region boundary. This demonstrates that the hierarchical method can model region interactions through new generalizations at the higher, region-representing levels of the hierarchy. Promising results are presented.
{"title":"Hierarchical Region Mean-Based Image Segmentation","authors":"S. Wesolkowski, P. Fieguth","doi":"10.1109/CRV.2006.39","DOIUrl":"https://doi.org/10.1109/CRV.2006.39","url":null,"abstract":"Gibbs Random Fields (GRFs), which produce elegant models, but which have very poor computational speed have been widely applied to image segmentation. In contrast to block-based hierarchies usually constructed for GRFs, the irregular region-based approach is a more natural model in segmenting real images. In this paper, we show that the fineto- coarse region-based hierarchical regions framework for the well-known Potts model can be extended to non-edge based interactions. By deliberately oversegmenting at the finer scale, the method proceeds conservatively by avoiding the construction of regions which straddle a region boundary by computing region mean differences. This demonstrates the hierarchical method is able to model region interactions through new generalizations at higher levels in the hierarchy which represent regions. Promising results are presented.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134345539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a template-based algorithm to track and recognize athletes' actions in an integrated system using only visual information. Conventional template-based action recognition systems usually consider action recognition and tracking as two independent problems and solve them separately. In contrast, our algorithm emphasizes that tracking and action recognition can be tightly coupled into a single framework, where tracking assists action recognition and vice versa. Moreover, this paper proposes to represent the athletes by the PCA-HOG descriptor, computed by first transforming the athlete image into a grid of Histogram of Oriented Gradients (HOG) cells and then projecting it onto a linear subspace by Principal Component Analysis (PCA). The PCA-HOG descriptor not only helps the tracker remain robust under illumination, pose, and viewpoint changes, but also implicitly centers the figure in the tracking region, which makes action recognition possible. Empirical results on hockey and soccer sequences show the effectiveness of this algorithm.
{"title":"Simultaneous Tracking and Action Recognition using the PCA-HOG Descriptor","authors":"Wei-Lwun Lu, J. Little","doi":"10.1109/CRV.2006.66","DOIUrl":"https://doi.org/10.1109/CRV.2006.66","url":null,"abstract":"This paper presents a template-based algorithm to track and recognize athlete’s actions in an integrated system using only visual information. Conventional template-based action recognition systems usually consider action recognition and tracking as two independent problems, and solve them separately. In contrast, our algorithm emphasizes that tracking and action recognition can be tightly coupled into a single framework, where tracking assists action recognition and vise versa. Moreover, this paper proposes to represent the athletes by the PCA-HOG descriptor, which can be computed by first transforming the athletes to the grids of Histograms of Oriented Gradient (HOG) descriptor and then project it to a linear subspace by Principal Component Analysis (PCA). The exploitation of the PCA-HOG descriptor not only helps the tracker to be robust under illumination, pose, and view-point changes, but also implicitly centers the figure in the tracking region, which makes action recognition possible. Empirical results in hockey and soccer sequences show the effectiveness of this algorithm.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129356215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of vision-based position estimation in urban environments. In particular, we are interested in position estimation from visual cues using only limited computational resources. Our solution is based on representing the variability of the "horizon" of the cityscape when seen from within the city; that is, the outlines of the rooftops of adjacent buildings. By encoding the image using only such a one-dimensional contour, we obtain an image encoding that is exceedingly compact. This, in turn, allows us both to transmit the representation efficiently to a remote "recognition engine" and to store and match it efficiently. We outline our approach and representation, and provide experimental data supporting its feasibility.
{"title":"Urban Position Estimation from One Dimensional Visual Cues","authors":"Derek Johns, G. Dudek","doi":"10.1109/CRV.2006.81","DOIUrl":"https://doi.org/10.1109/CRV.2006.81","url":null,"abstract":"We consider the problem of vision-based position estimation in urban environments. In particular, we are interested in position estimation from visual cues, but using only limited computational resources. Our particular solution to this problem is based on representing the variability of the \"horizon\" of the cityscape when seen from within the city; that is, the outlines of the rooftops of adjacent buildings. By encoding the image using only such a one-dimensional contour, we obtain an image encoding that is exceedingly compact. This, in turn, allows us to both efficiently transmit this representation to a remote \"recognition engine\" as well as allowing for an efficient storage and matching process. We outline our approach and representation, and provide experimental data supporting its feasibility.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130872701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intelligent Transportation Systems need methods to automatically monitor road traffic, and especially to track vehicles. Most research has concentrated on highways; traffic in intersections is more variable, with multiple entrance and exit regions. This paper describes an extension to intersections of the feature-tracking algorithm described in [1]. Vehicle features are rarely tracked from their entrance into the field of view to their exit, and our algorithm accommodates the resulting disruption of feature tracks. It is evaluated on video sequences recorded at four different intersections.
{"title":"A feature-based tracking algorithm for vehicles in intersections","authors":"N. Saunier, T. Sayed","doi":"10.1109/CRV.2006.3","DOIUrl":"https://doi.org/10.1109/CRV.2006.3","url":null,"abstract":"Intelligent Transportation Systems need methods to automatically monitor the road traffic, and especially track vehicles. Most research has concentrated on highways. Traffic in intersections is more variable, with multiple entrance and exit regions. This paper describes an extension to intersections of the feature-tracking algorithm described in [1]. Vehicle features are rarely tracked from their entrance in the field of view to their exit. Our algorithm can accommodate the problem caused by the disruption of feature tracks. It is evaluated on video sequences recorded on four different intersections.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130221703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face recognition is one of the most intensively studied topics in computer vision and pattern recognition. Facial expression, which changes face geometry, usually has an adverse effect on the performance of a face recognition system. On the other hand, face geometry is a useful cue for recognition. Taking both into account, we separate the geometry and texture information in a face image and model the two types of information by projecting them into separate PCA spaces specially designed to capture the distinctive features among different individuals. Subsequently, the texture and geometry attributes are recombined to form a classifier capable of recognizing faces with different expressions. Finally, by studying face geometry, we are able to determine which facial expression is being performed, and thus build an expression classifier. Numerical validations of the proposed method are given.
{"title":"Expression-Invariant Face Recognition with Expression Classification","authors":"Xiaoxing Li, Greg Mori, Hao Zhang","doi":"10.1109/CRV.2006.34","DOIUrl":"https://doi.org/10.1109/CRV.2006.34","url":null,"abstract":"Face recognition is one of the most intensively studied topics in computer vision and pattern recognition. Facial expression, which changes face geometry, usually has an adverse effect on the performance of a face recognition system. On the other hand, face geometry is a useful cue for recognition. Taking these into account, we utilize the idea of separating geometry and texture information in a face image and model the two types of information by projecting them into separate PCA spaces which are specially designed to capture the distinctive features among different individuals. Subsequently, the texture and geometry attributes are re-combined to form a classifier which is capable of recognizing faces with different expressions. Finally, by studying face geometry, we are able to determine which type of facial expression has been carried out, thus build an expression classifier. Numerical validations of the proposed method are given.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115928207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents and demonstrates an automated, generic approach to improving the accuracy and stability of iterative pose estimation in computer vision applications. The class of problems involves the use of calibrated CCD camera video imagery to compute the pose of a slowly moving object based on an arrangement of visual targets on the object's surface. The basis of stereo-vision algorithms is the minimization of a re-projection error cost function. The proposed method estimates the optimal target locations within the area of interest: the optimal target configuration is the one that yields the minimal condition number of the linear system associated with the iterative algorithm. The method is demonstrated for the case where targets are located within a 3D domain. Two pose estimation algorithms are compared, single-camera and two-camera; better pose estimation accuracy can be achieved with the single-camera algorithm when target locations are optimized. The method can also be applied to optimize the locations of targets attached to a 2D surface.
{"title":"Stability Improvement of Vision Algorithms","authors":"K. Shahid, G. Okouneva, D. McTavish, J. Karpynczyk","doi":"10.1109/CRV.2006.69","DOIUrl":"https://doi.org/10.1109/CRV.2006.69","url":null,"abstract":"This paper presents and demonstrates an automated generic approach to improving the accuracy and stability of iterative pose estimation in computer vision applications. The class of problem involves the use of calibrated CCD camera video imagery to compute the pose of a slowly moving object based on an arrangement of visual targets on the surface of the object. The basis of stereo-vision algorithms is to minimize a re-projection error cost function. The proposed method estimates the optimal target locations within the area of interest. The optimal target configuration delivers the minimal condition number of the linear system associated with the iterative algorithm. The method is demonstrated for the case when targets are located within a 3D domain. Two pose estimation algorithms are compared: single camera and two-camera algorithms. A better accuracy in pose estimation can be achieved with a single camera algorithm with optimized target locations. Also, this method can be applied to perform optimization of target locations attached to a 2D surface.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116630435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image panoramas are important for virtual navigation in remote or synthetic environments. To process these panoramas, different representations have been proposed; this paper presents a study of cubic panoramas. Standard projective geometry concepts are adapted to cubic panoramas to derive the notions of the fundamental matrix, the essential matrix, and the equivalent of stereo rectification. Methods and results are presented that could be very helpful in solving disparity estimation, pose estimation, and view interpolation problems in the context of cubic panoramas.
{"title":"Epipolar Geometry for the Rectification of Cubic Panoramas","authors":"Florian Kangni, R. Laganière","doi":"10.1109/CRV.2006.29","DOIUrl":"https://doi.org/10.1109/CRV.2006.29","url":null,"abstract":"Image panoramas are of importance for virtual navigation in remote or synthetic environments. To process these panoramas, different representations have been proposed; this paper presents a study of cubic panoramas. Standard projective geometry concepts are adapted to cubic panoramas to derive the notions of fundamental matrix, essential matrix and the equivalent of stereo rectification. Methods and results are presented which could be very helpful in obtaining solutions to disparity estimation, pose estimation and view interpolation problems in the context of cubic panoramas.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128596801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study investigates the application of ant colony optimization to image thresholding. This paper presents an approach in which one ant is assigned to each pixel of an image and then moves around the image seeking low-grayscale regions. Experimental results demonstrate that the proposed ant-based method performs better than two other established thresholding algorithms. Further work must be conducted to optimize the algorithm parameters, improve the analysis of the pheromone data, and reduce computation time. However, the study indicates that an ant-based approach has the potential to become an established image thresholding technique.
{"title":"Image Thresholding Using Ant Colony Optimization","authors":"Alice R. Malisia, H. Tizhoosh","doi":"10.1109/CRV.2006.42","DOIUrl":"https://doi.org/10.1109/CRV.2006.42","url":null,"abstract":"This study is an investigation of the application of ant colony optimization to image thresholding. This paper presents an approach where one ant is assigned to each pixel of an image and then moves around the image seeking low grayscale regions. Experimental results demonstrate that the proposed ant-based method performs better than other two established thresholding algorithms. Further work must be conducted to optimize the algorithm parameters, improve the analysis of the pheromone data and reduce computation time. However, the study indicates that an ant-based approach has the potential of becoming an established image thresholding technique.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134216572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we go one step further toward bridging the gap between low-level media features (e.g., color, texture, motion) and high-level concepts in order to perform reliable content-based indexing and retrieval. More specifically, our work proposes a new way to connect geometric and radiometric deformations to their characterization in terms of camera operations. Based on both the apparent motion and the defocus blur (low-level features), we estimate extrinsic and intrinsic camera parameter changes, and then deduce 3D camera operations (i.e., mid-level features) such as panning/tracking, tilting/booming, zooming/dollying, and rolling, as well as focus changes. Finally, camera operations are recorded into an index which is then used for video retrieval. Experiments confirm that the proposed mid-level features can be accurately deduced from low-level features and that they can be used for indexing and retrieval purposes.
{"title":"Interpreting Camera Operations in the Context of Content-based Video Indexing and Retrieval","authors":"Wei Pan, F. Deschênes","doi":"10.1109/CRV.2006.44","DOIUrl":"https://doi.org/10.1109/CRV.2006.44","url":null,"abstract":"In this work, we intend to go one step further to overcome the difficulty that lies in the gap between low-level media features (e.g. colors, texture, motion, etc.) and high-level concepts to perform a reliable content-based indexing and retrieval. More especially, our work proposes a new way to establish a connection between both geometric and radiometric deformations and the characterization of them in terms of camera operations. Based on both the apparent motion and the defocus blur (low-level features), we estimate extrinsic and intrinsic camera parameter changes, and then deduce 3D camera operations (i.e. mid-level features), such as panning/tracking, tilting/booming, zooming/ dollying and rolling, as well as focus changes. Finally, camera operations are recorded into an index which is then used for video retrieval. Experiments confirm that the proposed mid-level features can be accurately deduced from low-level features and that they can be used for indexing and retrieval purpose.","PeriodicalId":369170,"journal":{"name":"The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121301536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}