The way images are decomposed and represented biases how well subsequent object learning and recognition methods will perform. We choose to initially represent the images by sets of local distinctive regions and their description vectors. We evaluate the problems of distinctive region detection and description in two separate stages, first reviewing some of the state-of-the-art methods and then discussing the methods we propose to use for object category recognition. In comparing the scale and rotation invariance of our region detection-description technique with that of the other detection-description techniques, we find that our approach provides better results than existing methods in the context of object category recognition. The evaluation consists of clustering similar descriptor regions and computing (1) the number of single-class clusters (measuring intra-class sensitivity), (2) cluster precision (measuring how clusters are shared between different classes), and (3) the generalizability of regions (measuring matching to classes). Our technique, a variant of the Kadir-Brady saliency detector, scored at least as well as every other method evaluated.
E. F. Ersi and J. Zelek, "Region detection and description for Object Category Recognition," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.55
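As a sketch of the first two evaluation measures described in the abstract above (the exact definitions are given in the paper; function names and the toy data here are illustrative only), counting single-class clusters and computing mean cluster precision might look like:

```python
import numpy as np

def cluster_metrics(cluster_ids, class_labels):
    """Illustrative versions of two cluster-based evaluation measures:
    the number of single-class clusters and the mean cluster precision
    (the dominant-class fraction within each cluster)."""
    cluster_ids = np.asarray(cluster_ids)
    class_labels = np.asarray(class_labels)
    single_class = 0
    precisions = []
    for c in np.unique(cluster_ids):
        members = class_labels[cluster_ids == c]
        _, counts = np.unique(members, return_counts=True)
        if len(counts) == 1:
            single_class += 1          # cluster drawn from one class only
        precisions.append(counts.max() / counts.sum())
    return single_class, float(np.mean(precisions))

# Two clusters: cluster 0 is pure class "a", cluster 1 mixes "a" and "b".
n_single, mean_prec = cluster_metrics([0, 0, 1, 1, 1], ["a", "a", "a", "b", "b"])
```

A pure cluster contributes precision 1.0; a cluster shared across classes contributes its dominant-class fraction, so lower mean precision indicates clusters shared between classes.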
The use of localized principal component analysis is examined for visual position determination in the presence of varying degrees of occlusion. Occlusions lead to substantial position measurement errors when projecting entire images into eigenspace. One way to improve robustness to occlusions is to select small sub-windows, so that if some sub-windows are occluded, others can still accurately identify position. The locations of candidate sub-windows are predetermined from a set of training images by subtracting the average image from each and then selecting regions using an attention operator. Since attention operators can be computationally time-intensive, the locations of all sub-windows are determined a priori during the training phase. The sub-windows in each of the training images are then projected into eigenspace. Once the training phase is complete, run-time execution can be performed efficiently since all the sub-windows have been preselected. Input images are classified by each sub-window; majority voting is then used to determine the position estimate. Various experiments are performed, including linear and rotational motion and the ego-motion of a mobile robot. This technique is shown to provide greater position measurement accuracy in the presence of severe occlusions than the projection of entire images.
A. Smit and D. Schuurman, "Robust Subspace Position Measurement Using Localized Sub-Windows," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.57
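The sub-window voting scheme described in the abstract above can be sketched as follows. This is a minimal stand-in, not the authors' implementation: sub-window locations are fixed rather than chosen by an attention operator, a per-location eigenspace is built by SVD, and classification is nearest-neighbour in each eigenspace followed by a majority vote.

```python
import numpy as np
from collections import Counter

def train(images, window_locs, size, n_components=2):
    """Build one eigenspace per preselected sub-window location.
    `images` is a list of (image, position_label) pairs."""
    models = []
    labels = [lbl for _, lbl in images]
    for (r, c) in window_locs:
        X = np.array([img[r:r+size, c:c+size].ravel() for img, _ in images], float)
        mean = X.mean(axis=0)
        # PCA basis from the SVD of the mean-centred training windows
        _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
        basis = Vt[:n_components]
        coords = (X - mean) @ basis.T          # training projections
        models.append((r, c, mean, basis, coords, labels))
    return models

def classify(img, models, size):
    """Each sub-window votes for the label of the nearest training
    projection; the majority vote is the position estimate. An occluded
    window merely casts one wrong vote instead of corrupting a
    whole-image projection."""
    votes = []
    for r, c, mean, basis, coords, labels in models:
        q = (img[r:r+size, c:c+size].ravel() - mean) @ basis.T
        votes.append(labels[int(np.argmin(np.linalg.norm(coords - q, axis=1)))])
    return Counter(votes).most_common(1)[0][0]

rng = np.random.default_rng(0)
training = [(rng.random((16, 16)), f"pos{i}") for i in range(3)]
models = train(training, [(0, 0), (8, 8), (0, 8)], size=8)
query = training[1][0].copy()
query[0:8, 0:8] = 0.0          # occlude one sub-window
estimate = classify(query, models, size=8)
```

Even with one of the three windows fully occluded, the two unoccluded windows outvote it.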
This paper introduces a number of novel no-reference algorithms to assess the perceived quality of real-time analog and digital television and video streams. A prototype system is developed to locate and measure the impact of three types of impairments that commonly degrade television and video signals. Analog sequences are tested for the presence of random noise. In the case of digital signals, two fundamental types of errors are of interest. The first is the blocking artifact that is pervasive among DCT-based compression schemes such as MPEG. The second category includes errors caused by random changes to the bit stream of a signal. Of the various forms that these distortions may take, only those that appear as "colored blocks" are detected by this system. Ideas to address the remaining issues are discussed.
R. Dosselmann and X. Yang, "A Prototype No-Reference Video Quality System," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.6
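The blocking-artifact cue mentioned in the abstract above can be illustrated with a minimal no-reference measure (this is not the paper's algorithm, only the underlying idea): DCT-based codecs operate on 8x8 blocks, so visible blocking shows up as elevated pixel differences at columns that straddle block boundaries relative to differences elsewhere.

```python
import numpy as np

def blockiness(img, block=8):
    """Ratio of the mean absolute horizontal difference across block
    boundaries to the mean difference elsewhere. Values well above 1
    suggest visible DCT blocking artifacts."""
    img = np.asarray(img, float)
    d = np.abs(np.diff(img, axis=1))            # horizontal neighbour differences
    cols = np.arange(d.shape[1])
    at_boundary = (cols % block) == block - 1   # differences straddling a block edge
    return d[:, at_boundary].mean() / (d[:, ~at_boundary].mean() + 1e-12)

# A smooth ramp has no boundary emphasis; a piecewise-constant image
# with jumps every 8 columns has all its energy at block boundaries.
smooth_score = blockiness(np.tile(np.arange(64.0), (8, 1)))
blocky_score = blockiness(np.tile(np.repeat(np.arange(8.0) * 10, 8), (8, 1)))
```

A deployed detector would also handle vertical boundaries and luminance masking; the ratio above is just the core cue.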
This paper proposes an approach to computing view-normalized body part trajectories of pedestrians from monocular video sequences. The approach first extracts the 2D trajectories of both feet and of the head from tracked silhouettes. On that basis, it segments the walking trajectory into piecewise-linear segments. Finally, a normalization process is applied to the head and feet trajectories over each straight walking segment, making them appear as if seen from a fronto-parallel viewpoint, which is assumed to be optimal for gait modeling and recognition purposes. The approach is fully automatic, requiring neither manual initialization nor camera calibration.
F. Jean, R. Bergevin, and A. Albu, "Computing View-normalized Body Parts Trajectories," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.19
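A much-simplified stand-in for the per-segment normalization step above (the paper's method maps trajectories to a fronto-parallel view; here we only align a straight-walking segment's dominant direction with the x-axis, which is one ingredient of such a normalization):

```python
import numpy as np

def normalize_segment(points):
    """Rotate a straight-walking 2D segment so its dominant direction
    lies along the x-axis -- a simplified stand-in for normalizing a
    trajectory toward a fronto-parallel view."""
    pts = np.asarray(points, float)
    centred = pts - pts.mean(axis=0)
    # principal direction of travel from the covariance of the points
    _, vecs = np.linalg.eigh(np.cov(centred.T))
    direction = vecs[:, -1]                 # eigenvector of the largest eigenvalue
    angle = np.arctan2(direction[1], direction[0])
    rot = np.array([[np.cos(-angle), -np.sin(-angle)],
                    [np.sin(-angle),  np.cos(-angle)]])
    return centred @ rot.T

# A diagonal walking path becomes horizontal after normalization.
flat = normalize_segment([[i, i] for i in range(10)])
```

The full method also rescales trajectories and handles each piecewise-linear segment separately; none of that is shown here.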
In this paper we propose two methods for minimizing objective functions of discrete-domain functions with a continuous value range. Many practical problems in computer vision are continuous-valued, so discrete optimization methods of graph-cut type cannot be applied directly; the proposed methods remove this limitation. The first method is an add-on for multiple-label graph-cut. In the second, binary graph-cut is first used to generate regions of support within different ranges of the signal; a robust error minimization is then approximated based on the previously determined regions. The advantages and properties of the new approaches are explained and visualized using synthetic test data. The methods are compared to ordinary multi-label graph-cut and robust smoothing on the application of disparity estimation. They produce better-quality results than the other approaches, and the second algorithm is significantly faster than multi-label graph-cut.
M. Felsberg, "Extending Graph-Cut to Continuous Value Domain Minimization," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.29
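To illustrate the discrete-label formulation the paper extends (not the paper's methods, and not graph-cut itself, which needs a max-flow solver): on a 1D chain, the same data-plus-smoothness energy that multi-label graph-cut targets on 2D grids can be minimized exactly by Viterbi dynamic programming. The quadratic data cost and Potts smoothness below are standard illustrative choices.

```python
import numpy as np

def chain_labeling(signal, labels, lam=1.0):
    """Exact MAP labeling of a 1D signal over a discrete label set with
    quadratic data cost and Potts smoothness, solved by dynamic
    programming. The coarse label discretisation visible here is what
    continuous-value-domain methods aim to overcome."""
    signal = np.asarray(signal, float)
    labels = np.asarray(labels, float)
    n, k = len(signal), len(labels)
    data = (signal[:, None] - labels[None, :]) ** 2     # unary costs
    cost = data[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        # trans[new, prev]: previous cost plus Potts penalty for switching
        trans = cost[None, :] + lam * (labels[:, None] != labels[None, :])
        back[i] = np.argmin(trans, axis=1)
        cost = data[i] + np.min(trans, axis=1)
    out = np.empty(n, dtype=int)
    out[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):
        out[i - 1] = back[i][out[i]]
    return labels[out]

# A noisy step signal snaps to the two labels with a single switch.
labeled = chain_labeling([0.0, 0.1, 4.8, 5.2, 5.0], [0.0, 5.0], lam=1.0)
```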
Model-based tracking is a key component of many systems today, e.g. in video surveillance or human-computer interfaces (HCI). Our approach, the PFAAM, combines particle filters (PFs) with active appearance models (AAMs), pairing the robustness of PFs with the precision of AAMs. Experimental results are given: the PFAAM shows superior performance compared to both standard AAMs and PFs that use AAMs as cues only, i.e. without a local optimization loop.
S. Fleck, M. Hoffmann, K. Hunter, and A. Schilling, "PFAAM: An Active Appearance Model based Particle Filter for both Robust and Precise Tracking," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.50
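The particle-filter side of the combination above can be sketched generically. In the PFAAM the observation likelihood would come from an AAM fit; in this sketch a Gaussian likelihood around a scalar observation stands in for it, and the 1D state and all parameters are illustrative only.

```python
import numpy as np

def particle_filter_step(particles, weights, observation, rng,
                         motion_std=0.5, obs_std=1.0):
    """One predict/update/resample cycle of a bootstrap particle filter.
    Here the state is a scalar and the likelihood is a Gaussian around
    the observation; a PFAAM would score particles by AAM fit instead."""
    # predict: diffuse particles with a random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # update: reweight by observation likelihood
    weights = weights * np.exp(-0.5 * ((particles - observation) / obs_std) ** 2)
    weights = weights / weights.sum()
    # resample: draw particles proportionally to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# A cloud started at 0 migrates toward a stationary observation at 5.
rng = np.random.default_rng(1)
particles = np.zeros(500)
weights = np.full(500, 1.0 / 500)
for _ in range(30):
    particles, weights = particle_filter_step(particles, weights, 5.0, rng)
```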
In this paper, we propose robust video foreground modeling using a finite mixture of generalized Gaussian distributions (GGDs). The model is flexible enough to model the video background in the presence of sudden illumination changes and shadows, allowing for efficient foreground segmentation. In the first part of the present work, we derive an online estimation of the parameters of the mixture of GGDs and propose a Bayesian approach for selecting the number of classes. In the second part, we present video foreground segmentation experiments demonstrating the performance of the proposed model.
M. S. Allili, N. Bouguila, and D. Ziou, "A Robust Video Foreground Segmentation by Using Generalized Gaussian Mixture Modeling," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.7
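The core ingredients above, the generalized Gaussian density and a per-pixel background-likelihood test, can be sketched as follows. The online parameter estimation and Bayesian model selection are the paper's contribution and are not shown; the component parameters and threshold here are illustrative.

```python
import math
import numpy as np

def ggd_pdf(x, mu, alpha, beta):
    """Generalized Gaussian density: beta=2 recovers a Gaussian shape,
    beta<2 gives heavier tails (useful under shadows and sudden
    illumination changes, per the paper's motivation)."""
    coef = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coef * np.exp(-(np.abs(x - mu) / alpha) ** beta)

def is_foreground(pixel, components, threshold=1e-3):
    """A pixel is foreground when its likelihood under the background
    mixture falls below a threshold. `components` is a list of
    (weight, mu, alpha, beta) tuples."""
    likelihood = sum(w * ggd_pdf(pixel, mu, a, b) for w, mu, a, b in components)
    return likelihood < threshold

# Single illustrative background component centred at intensity 100.
background = [(1.0, 100.0, 10.0, 2.0)]
bg_pixel_fg = is_foreground(100.0, background)
moving_pixel_fg = is_foreground(200.0, background)
```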
In this paper we present a new class of reconstruction algorithms that differ fundamentally from traditional approaches. We deviate from the traditional technique of treating the pixels of an image as point samples; in this work, pixels are treated as rectangular surface samples. This conforms to the image formation process, in particular for CCD/CMOS sensors, which are matrices of rectangular light-sensitive surfaces. We show that better-quality results, in terms of the measures employed, are obtained by formulating the reconstruction as a two-stage process: image restoration followed by application of the point spread function (PSF) of the imaging sensor. By coupling the PSF with the reconstruction process, we satisfy a measure of accuracy based on the physical limitations of the sensor. Effective image restoration techniques are derived to invert the effects of the PSF and estimate the original image. For the restoration algorithm, we introduce a new interpolation method involving a sequence of images, not necessarily a temporal sequence, shifted relative to a reference image.
K. Mecheri, D. Ziou, and F. Deschênes, "Super-resolution based on interpolation and global sub pixel translation," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.62
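To make the role of the globally shifted image sequence concrete, here is a simple shift-and-add baseline, not the paper's method: each low-resolution pixel, treated as a sample displaced by a known global sub-pixel shift, is placed on a high-resolution grid and overlapping samples are averaged. The paper goes further by coupling the sensor's rectangular-pixel PSF into the reconstruction.

```python
import numpy as np

def shift_and_add(lowres_images, shifts, factor):
    """Fuse sub-pixel-shifted low-resolution images onto a grid
    upsampled by `factor`, averaging where samples overlap. Shifts are
    (dy, dx) in low-resolution pixel units."""
    h, w = lowres_images[0].shape
    acc = np.zeros((h * factor, w * factor))
    cnt = np.zeros_like(acc)
    for img, (dy, dx) in zip(lowres_images, shifts):
        # nearest high-res cell for each low-res sample under this shift
        ys = (np.arange(h)[:, None] * factor + round(dy * factor)) % (h * factor)
        xs = (np.arange(w)[None, :] * factor + round(dx * factor)) % (w * factor)
        acc[ys, xs] += img
        cnt[ys, xs] += 1
    return acc / np.maximum(cnt, 1)

# Two 2x2 frames, the second shifted by half a pixel, interleave on a 4x4 grid.
low0 = np.ones((2, 2))
low1 = np.full((2, 2), 3.0)
hires = shift_and_add([low0, low1], [(0.0, 0.0), (0.5, 0.5)], factor=2)
```

With complementary half-pixel shifts, the frames fill disjoint cells of the high-resolution grid; cells no frame reaches stay empty, which is where interpolation enters.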
Identification and recognition of objects in digital images is a fundamental task in robotic vision. Here we propose an approach based on clustering features extracted from the HSV color space and depth, using a hierarchical self-organizing map (HSOM). Binocular images are first preprocessed using a watershed algorithm; adjacent regions are then merged based on HSV similarity. For each region we compute a six-element feature vector: median depth (computed as disparity), median H, S, and V values, and the X and Y coordinates of its centroid. These are the inputs to the HSOM network, which is allowed to learn on the first image of a sequence. The trained network is then used to segment other images of the same scene. If, on a new image, the same neuron responds to regions that belong to the same object, the object is considered recognized. The technique achieves good results, recognizing up to 82% of the objects.
G. Bertolini and S. Ramat, "Identification and Recognition of Objects in Color Stereo Images Using a Hierachial SOM," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.39
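The six-element feature vector described in the abstract above can be sketched directly; the ordering, scaling, and input layout (boolean region mask, HSV image, disparity map) are assumptions of this sketch rather than details from the paper.

```python
import numpy as np

def region_features(mask, hsv, disparity):
    """Six-element feature vector for one segmented region, following
    the abstract: median disparity (depth), median H, S, V values, and
    the centroid coordinates of the region."""
    ys, xs = np.nonzero(mask)
    med = np.median
    return np.array([
        med(disparity[ys, xs]),           # median depth via disparity
        med(hsv[ys, xs, 0]),              # median hue
        med(hsv[ys, xs, 1]),              # median saturation
        med(hsv[ys, xs, 2]),              # median value
        xs.mean(),                        # centroid X
        ys.mean(),                        # centroid Y
    ])

# A 2x2 region in the top-left corner of a 4x4 image.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
hsv = np.zeros((4, 4, 3))
hsv[..., 0], hsv[..., 1], hsv[..., 2] = 0.3, 0.6, 0.9
disparity = np.arange(16.0).reshape(4, 4)
feats = region_features(mask, hsv, disparity)
```

One such vector per merged region is what the HSOM clusters during training.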
This paper presents a new controlled lighting apparatus which uses a raster display device as a light source. The setup has the advantage over other alternatives in that it is relatively inexpensive and uses commonly available components. The apparatus is studied through application to shape recovery using photometric stereo. Experiments on synthetic and real images demonstrate how the depth map of an object can be recovered using only a camera and a computer monitor.
N. Funk and H. Yang, "Using a Raster Display for Photometric Stereo," Fourth Canadian Conference on Computer and Robot Vision (CRV '07), May 28, 2007. doi:10.1109/CRV.2007.66
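The paper's contribution is using monitor pixels as a calibrated, inexpensive light source; the recovery step itself is classic Lambertian photometric stereo, which can be sketched as a per-pixel least-squares solve (the synthetic flat-surface example below is illustrative):

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Classic Lambertian photometric stereo: with three or more images
    under known light directions, solve L g = I per pixel in the
    least-squares sense; the surface normal is g normalized and the
    albedo is its magnitude."""
    L = np.asarray(light_dirs, float)              # (n_lights, 3)
    I = np.asarray(intensities, float)             # (n_lights, h, w)
    n, h, w = I.shape
    g, *_ = np.linalg.lstsq(L, I.reshape(n, -1), rcond=None)
    g = g.reshape(3, h, w)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)
    return normals, albedo

# Synthesize three images of a flat surface (normal (0,0,1), albedo 0.5)
# under three known light directions, then recover both quantities.
s = 1.0 / np.sqrt(2.0)
lights = [(0.0, 0.0, 1.0), (s, 0.0, s), (0.0, s, s)]
dots = [1.0, s, s]                                 # L_k . n for n = (0,0,1)
images = np.stack([np.full((2, 2), 0.5 * d) for d in dots])
normals, albedo = photometric_stereo(images, lights)
```

Depth then follows by integrating the normal field, which the sketch omits.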