In this paper we describe a technique to infer the topology and connectivity of a network of cameras based on observed motion in the environment. While the technique can use labels from reliable camera systems, the algorithm is powerful enough to function using ambiguous tracking data. The method requires no prior knowledge of the relative locations of the cameras and operates under very weak environmental assumptions. Our approach stochastically samples plausible agent trajectories based on a delay model that allows for transitions to and from sources and sinks in the environment. The technique demonstrates considerable robustness both to sensor error and to non-trivial patterns of agent motion. The output of the method is a Markov model describing the behavior of agents in the system and the underlying traffic patterns. The concept is demonstrated with simulation data and verified with experiments conducted on a six-camera sensor network.
{"title":"Topology inference for a vision-based sensor network","authors":"D. Marinakis, G. Dudek","doi":"10.1109/CRV.2005.81","DOIUrl":"https://doi.org/10.1109/CRV.2005.81","url":null,"abstract":"In this paper we describe a technique to infer the topology and connectivity information of a network of cameras based on observed motion in the environment. While the technique can use labels from reliable cameras systems, the algorithm is powerful enough to function using ambiguous tracking data. The method requires no prior knowledge of the relative locations of the cameras and operates under very weak environmental assumptions. Our approach stochastically samples plausible agent trajectories based on a delay model that allows for transitions to and from sources and sinks in the environment. The technique demonstrates considerable robustness both to sensor error and non-trivial patterns of agent motion. The output of the method is a Markov model describing the behavior of agents in the system and the underlying traffic patterns. The concept is demonstrated with simulation data and verified with experiments conducted on a six camera sensor network.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115615698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present an approach to video surveillance involving (a) moving object detection, (b) tracking, and (c) normal/abnormal event recognition. The detection step uses an adaptive background subtraction technique with a shadow elimination model based on the color constancy principle. The target tracking involves a direct and inverse matrix matching process. The novelty of the paper lies mainly in the recognition stage, where we consider local motion properties (flow vectors) as well as more global ones expressed by elliptic Fourier descriptors. From these temporal trajectory characterizations, two Kohonen maps allow us to distinguish normal behaviors from abnormal or suspicious ones. The classification results show a 94.6% correct recognition rate on video sequences taken by a low-cost webcam. Finally, this algorithm can be fully implemented in real-time.
{"title":"Real-time video surveillance with self-organizing maps","authors":"M. Dahmane, J. Meunier","doi":"10.1109/CRV.2005.65","DOIUrl":"https://doi.org/10.1109/CRV.2005.65","url":null,"abstract":"In this paper, we present an approach for video surveillance involving (a) moving object detection, (b) tracking and (c) normal/abnormal event recognition. The detection step uses an adaptive background subtraction technique with a shadow elimination model based on the color constancy principle. The target tracking involves a direct and inverse matrix matching process. The novelty of the paper lies mainly in the recognition stage, where we consider local motion properties (flow vector), and more global ones expressed by elliptic Fourier descriptors. From these temporal trajectory characterizations, two Kohonen maps allow to distinguish normal behavior from abnormal or suspicious ones. The classification results show a 94.6 % correct recognition rate with video sequences taken by a low cost webcam. Finally, this algorithm can be fully implemented in real-time.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116893423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic attention-seeking gesture recognition is an enabling element of synchronous distance learning. Recognizing attention-seeking gestures is complicated by the temporal nature of the signal that must be recognized and by the similarity between attention-seeking and non-attention-seeking gestures. Here we describe two approaches to the recognition problem that utilize HMMs to learn the class of attention-seeking gestures: an explicit approach that encodes the temporal nature of the gestures within the HMM, and an implicit approach that augments the input token sequence with temporal markers. Experimental results demonstrate that the explicit approach is more accurate.
{"title":"Recognizing hand-raising gestures using HMM","authors":"M. Hossain, M. Jenkin","doi":"10.1109/CRV.2005.67","DOIUrl":"https://doi.org/10.1109/CRV.2005.67","url":null,"abstract":"Automatic attention-seeking gesture recognition is an enabling element of synchronous distance learning. Recognizing attention seeking gestures is complicated by the temporal nature of the signal that must be recognized and by the similarity between attention seeking gestures and non-attention seeking gestures. Here we describe two approaches to the recognition problem that utilize HMMs to learn the class of attention seeking gestures. An explicit approach that encodes the temporal nature of the gestures within the HMM, and an implicit approach that augments the input token sequence with temporal markers are presented. Experimental results demonstrate that the explicit approach is more accurate.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121090648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the context of human-computer interaction, information about head pose is an important cue for estimating a person's focus of attention. In this paper, we present an approach to estimating the horizontal head rotation of people inside a smart room. The room is equipped with multiple cameras that aim to provide at least one facial view of the user at any location in the room. We use neural networks trained on samples of rotated heads to classify each camera view. Whenever there is more than one estimate of head rotation, we combine the estimates into one joint hypothesis. We show experimentally that, by combining the estimates from multiple cameras with the proposed scheme, the mean error for unknown users can be reduced by up to 50%.
{"title":"Multi-view head pose estimation using neural networks","authors":"M. Voit, Kai Nickel, R. Stiefelhagen","doi":"10.1109/CRV.2005.55","DOIUrl":"https://doi.org/10.1109/CRV.2005.55","url":null,"abstract":"In the context of human-computer interaction, information about head pose is an important cue for building a statement about humans' focus of attention. In this paper, we present an approach to estimate horizontal head rotation of people inside a smart-room. This room is equipped with multiple cameras that aim to provide at least one facial view of the user at any location in the room. We use neural networks that were trained on samples of rotated heads in order to classify each camera view. Whenever there is more than one estimate of head rotation, we combine the different estimates into one joint hypothesis. We show experimentally, that by using the proposed combination scheme, the mean error for unknown users could be reduced by up to 50% when combining the estimates from multiple cameras.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127382085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing human gait has become popular in computer vision. So far, however, contributions to this topic have almost exclusively considered the problem of person identification. In this paper, we view gait analysis from a different angle and examine its use as a means to deduce the physical condition of people. Casting the detection of unusual movement patterns as a two-class problem leads to the idea of using support vector machines for classification. We thus present a homeomorphism between 2D lattices and binary shapes that provides a robust vector-space embedding of body silhouettes. Experimental results underline that feature vectors obtained from this scheme are well suited to detecting abnormal gait: wavering, faltering, and falling can be detected reliably across individuals without tracking or recognizing limbs or body parts.
{"title":"Detecting abnormal gait","authors":"C. Bauckhage, John K. Tsotsos, F. Bunn","doi":"10.1109/CRV.2005.32","DOIUrl":"https://doi.org/10.1109/CRV.2005.32","url":null,"abstract":"Analyzing human gait has become popular in computer vision. So far, however, contributions to this topic almost exclusively considered the problem of person identification. In this paper, we view gait analysis from a different angle and shall examine its use as a means to deduce the physical condition of people. Understanding the detection of unusual movement patterns as a two class problem leads to the idea of using support vector machines for classification. We thus present a homeomorphisms between 2D lattices and binary shapes that provides a robust vector space embedding of body silhouettes. Experimental results underline that feature vectors obtained from this scheme are well suited to detect abnormal gait wavering, faltering, and falling can be detected reliably across individuals without tracking or recognizing limbs or body parts.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122315049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a computer vision system to assist a human user in monitoring their medication habits. This task must be accomplished without knowledge of any pill locations, as pills are too small to track with a static camera and are usually occluded. At the core of this process is a mixture of low-level, high-level, and heuristic techniques such as skin segmentation, face detection, template matching, and a novel approach to hand localization and occlusion handling. We discuss the approach taken towards this goal, along with the results of our testing phase.
{"title":"A computer vision system for monitoring medication intake","authors":"David Batz, Michael Batz, N. Lobo, M. Shah","doi":"10.1109/CRV.2005.5","DOIUrl":"https://doi.org/10.1109/CRV.2005.5","url":null,"abstract":"We propose a computer vision system to assist a human user in the monitoring of their medication habits. This task must be accomplished without the knowledge of any pill locations, as they are too small to track with a static camera, and are usually occluded. At the core of this process is a mixture of low-level, high-level, and heuristic techniques such as skin segmentation, face detection, template matching, and a novel approach to hand localization and occlusion handling. We discuss the approach taken towards this goal, along with the results of our testing phase.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122480991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We report on four algorithms for recovering dense depth maps from long image sequences, where the camera motion is known a priori. All methods use a Kalman filter to integrate intensity derivatives or optical flow over time to increase accuracy.
{"title":"A quantitative comparison of 4 algorithms for recovering dense accurate depth","authors":"Baozhong Tian, J. Barron","doi":"10.1109/CRV.2005.11","DOIUrl":"https://doi.org/10.1109/CRV.2005.11","url":null,"abstract":"We report on four algorithms for recovering dense depth maps from long image sequences, where the camera motion is known a priori. All methods use a Kalman filter to integrate intensity derivatives or optical flow over time to increase accuracy.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128236281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a new exact Euclidean distance transform algorithm for binary images based on the linear-time Legendre Transform algorithm. The three-step algorithm uses dimension reduction and convex analysis results on the Legendre-Fenchel transform to achieve linear-time complexity. First, computation on a grid (the image) is reduced to computation on a line, then the convex envelope is computed, and finally the squared Euclidean distance transform is obtained. Examples and an extension to non-binary images are provided.
{"title":"A linear Euclidean distance transform algorithm based on the linear-time Legendre transform","authors":"Y. Lucet","doi":"10.1109/CRV.2005.7","DOIUrl":"https://doi.org/10.1109/CRV.2005.7","url":null,"abstract":"We introduce a new exact Euclidean distance transform algorithm for binary images based on the linear-time Legendre Transform algorithm. The three-step algorithm uses dimension reduction and convex analysis results on the Legendre-Fenchel transform to achieve linear-time complexity. First, computation on a grid (the image) is reduced to computation on a line, then the convex envelope is computed, and finally the squared Euclidean distance transform is obtained. Examples and an extension to non-binary images are provided.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128493134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel approach for measuring deformations between image patches. Our algorithm is a variant of dynamic programming that is not inherently one-dimensional, and its scores are on a relative scale. The method is based on the combination of similarities between many overlapping sub-patches. The algorithm is designed to be robust to small deformations of parts at various positions and scales.
{"title":"A hierarchical nonparametric method for capturing nonrigid deformations","authors":"A. Ecker, S. Ullman","doi":"10.1109/CRV.2005.6","DOIUrl":"https://doi.org/10.1109/CRV.2005.6","url":null,"abstract":"We present a novel approach for measuring deformations between image patches. Our algorithm is a variant of dynamic programming that is not inherently one-dimensional, and its scores are on a relative scale. The method is based on the combination of similarities between many overlapping sub-patches. The algorithm is designed to be robust to small deformations of parts at various positions and scales.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116269316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses an approach to scene reconstruction that infers missing range data in a partial range map from an intensity image and sparse initial range data. It is assumed that the initial range data are given along a number of scan lines, each one pixel wide; this assumption is natural for a range sensor acquiring data in a real-world 3D environment. Both edge information from the intensity image and linear interpolation of the range data are used. Experiments show that this method gives very good results in inferring missing range data, outperforming both the previous method and bilinear interpolation when only a very small percentage of the range data is known.
{"title":"Scene reconstruction with sparse range information","authors":"Guangyi Chen, G. Dudek, L. Torres-Méndez","doi":"10.1109/CRV.2005.70","DOIUrl":"https://doi.org/10.1109/CRV.2005.70","url":null,"abstract":"This paper addresses an approach to scene reconstruction by inferring missing range data in a partial range map based on intensity image and sparse initial range data. It is assumed that the initial known range data is given on a number of scan lines one pixel width. This assumption is natural for a range sensor to acquire range data in a 3D real world environment. Both edge information of the intensity image and linear interpolation of the range data are used. Experiments show that this method gives very good results in inferring missing range data. It outperforms both the previous method and bilinear interpolation when a very small percentage of range data is known.","PeriodicalId":307318,"journal":{"name":"The 2nd Canadian Conference on Computer and Robot Vision (CRV'05)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114716652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}