A hybrid blob- and appearance-based framework for multi-object tracking through complex occlusions
Li-Qun Xu, P. Puig
2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance
Pub Date: 2005-10-15  DOI: 10.1109/VSPETS.2005.1570900
Abstract: Static and dynamic occlusions, caused by stationary scene structures and by interactions between moving objects, are a major challenge in tracking multiple objects in dynamic, cluttered visual scenes. We propose a hybrid blob- and appearance-based analysis framework that exploits the strengths of both approaches. At its core is an effective probabilistic appearance-based technique for handling complex occlusions. We introduce into the conventional likelihood function a novel 'spatial-depth affinity metric' (SDAM), which uses both the spatial locations of pixels and the dynamic depth ordering of the objects forming a group to improve object segmentation during occlusions. Depth ordering is estimated through a combination of top-down and bottom-up approaches. Experiments on difficult real-world scenarios with low-resolution, highly compressed video demonstrate very promising results.
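The abstract does not give the SDAM's exact form, but the idea can be illustrated with a hypothetical sketch: each object's appearance likelihood for a pixel is weighted by a spatial affinity (distance to the object's centroid) and a depth-ordering term, so front-most members of a group dominate the segmentation. The function names, the Gaussian spatial term, and the per-layer decay `alpha` are all assumptions, not the paper's formulation.

```python
import numpy as np

def sdam_weight(pixel_xy, centroid_xy, depth_rank, sigma=15.0, alpha=0.5):
    """Hypothetical spatial-depth affinity weight for one pixel/object pair.

    pixel_xy, centroid_xy: 2D image coordinates.
    depth_rank: 0 for the front-most object in the group, 1 for the next, ...
    sigma: spatial spread in pixels; alpha: per-layer depth decay (assumed).
    """
    d2 = np.sum((np.asarray(pixel_xy, float) - np.asarray(centroid_xy, float)) ** 2)
    spatial = np.exp(-d2 / (2.0 * sigma ** 2))   # spatial affinity to the object
    depth = alpha ** depth_rank                  # front objects weighted higher
    return spatial * depth

def assign_pixel(pixel_xy, appearance_likelihoods, centroids, depth_ranks):
    """Assign a pixel to the object maximising appearance likelihood x SDAM."""
    scores = [lik * sdam_weight(pixel_xy, c, r)
              for lik, c, r in zip(appearance_likelihoods, centroids, depth_ranks)]
    return int(np.argmax(scores))
```

With equal appearance likelihoods, a pixel at an object's centroid is claimed by that object even when it sits behind another group member, because the spatial term dominates.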
Application and Evaluation of Colour Constancy in Visual Surveillance
John-Paul Renno, Dimitrios Makris, T. Ellis, Graeme A. Jones
Pub Date: 2005-10-15  DOI: 10.1109/VSPETS.2005.1570929
Abstract: This paper addresses colour constancy in the context of visual surveillance applications. We seek to reduce the variability of surface colours in video from typical indoor and outdoor surveillance scenarios, so as to improve the robustness and reliability of applications that depend on stable colour descriptions, e.g. content retrieval. Two well-known colour constancy algorithms, Grey-world and Gamut-mapping, are applied to frame sequences containing significant variations in the colour temperature of the illuminant. We also consider the problem of automatically selecting a reference image representative of the scene under the canonical illuminant. A quantitative evaluation of the performance of the colour constancy algorithms is undertaken.
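Of the two algorithms evaluated, Grey-world is simple enough to sketch: it assumes the average surface reflectance of a scene is achromatic, so each channel is rescaled until its mean equals the global grey level, cancelling a uniform illuminant cast. A minimal NumPy version (details such as clipping are our choices, not the paper's):

```python
import numpy as np

def grey_world(image):
    """Grey-world colour constancy on a float (H, W, 3) image in [0, 1].

    Scales each channel so its mean equals the mean over all channels,
    removing a global colour cast under the grey-world assumption.
    """
    img = image.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    grey = channel_means.mean()
    gains = grey / np.maximum(channel_means, 1e-8)  # per-channel correction
    return np.clip(img * gains, 0.0, 1.0)
```

Applying it to a frame with a warm (reddish) cast equalises the three channel means, which is exactly the property one would measure when comparing against a reference image under the canonical illuminant.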
Appearance-based 3D face tracker: an evaluation study
F. Dornaika, A. Sappa
Pub Date: 2005-10-15  DOI: 10.1109/VSPETS.2005.1570906
Abstract: The ability to detect and track human heads and faces in video sequences is useful in a great number of applications. In this paper, we present our recent 3D face tracker, which combines online appearance models with an image registration technique. This monocular tracker runs in real time and is insensitive to drift. We introduce a scheme that incorporates the orientation of local facial regions into the registration technique. Moreover, we introduce a general framework for evaluating the developed appearance-based tracker. Precision and usability are assessed using stereo-based facial range data from which ground-truth 3D motions are inferred. This evaluation quantifies the monocular tracker's accuracy and identifies its working range in 3D space.
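The evaluation compares monocular pose estimates against ground truth inferred from stereo range data. A standard way to score such a comparison, shown here as an illustrative sketch rather than the authors' exact protocol, is the per-frame rotation angle error and translation distance between estimated and ground-truth poses:

```python
import numpy as np

def rotation_angle_error(R_est, R_gt):
    """Angular error in degrees between two 3x3 rotation matrices,
    via the trace of the relative rotation R_est * R_gt^T."""
    R = R_est @ R_gt.T
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def pose_errors(poses_est, poses_gt):
    """Per-frame rotation (deg) and translation errors for lists of
    (R, t) pose pairs; translation error is in the input units."""
    rot = [rotation_angle_error(Re, Rg)
           for (Re, _), (Rg, _) in zip(poses_est, poses_gt)]
    trans = [np.linalg.norm(te - tg)
             for (_, te), (_, tg) in zip(poses_est, poses_gt)]
    return np.array(rot), np.array(trans)
```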
Deleted Interpolation Using a Hierarchical Bayesian Grammar Network for Recognizing Human Activity
Kris Kitani, Y. Sato, A. Sugimoto
Pub Date: 2005-10-15  DOI: 10.1109/VSPETS.2005.1570921
Abstract: From the viewpoint of an intelligent video surveillance system, high-level recognition of human activity requires a priori hierarchical domain knowledge as well as a means of reasoning over that knowledge. We approach human activity recognition from the understanding that activities are hierarchical, temporally constrained and temporally overlapping. While stochastic grammars and graphical models have been widely used for recognizing human activity, methods combining hierarchy with complex queries have been limited. We propose a new method that merges the advantages of both approaches to recognize activities in real time. To address the hierarchical nature of human activity, we implement a hierarchical Bayesian network (HBN) based on a stochastic context-free grammar (SCFG). The HBN is applied to digressive substrings of the current evidence string via deleted interpolation (DI) to compute the probability distribution of overlapping activities in that string. Preliminary results on activity sequences from a video surveillance camera show the validity of our approach.
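Deleted interpolation is a classical smoothing technique (originating in speech and language modelling) that blends probability estimates computed from progressively shorter contexts; the paper applies the same idea to substrings of the evidence string. A minimal generic sketch with fixed weights (in practice the lambdas are learned on held-out data, and the estimates here come from the HBN, which we do not reproduce):

```python
def deleted_interpolation(estimates, lambdas):
    """Blend probability estimates of decreasing context length.

    estimates: [P_longest_context, P_shorter_context, ..., P_shortest]
    lambdas:   matching interpolation weights, summing to 1.
    """
    assert abs(sum(lambdas) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(lam * p for lam, p in zip(lambdas, estimates))
```

The longer-context estimate is trusted most when its weight is largest, but the shorter-context terms keep the blended probability non-zero even when the full context has never been observed.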
Behavior recognition via sparse spatio-temporal features
Piotr Dollár, V. Rabaud, G. Cottrell, Serge J. Belongie
Pub Date: 2005-10-15  DOI: 10.1109/VSPETS.2005.1570899
Abstract: A common trend in object recognition is to detect and leverage sparse, informative feature points. Such features make the problem more manageable while providing increased robustness to noise and pose variation. In this work we extend these ideas to the spatio-temporal case. We show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.
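The alternative detector this paper proposes responds to periodic and transient motion with a separable linear filter: spatial Gaussian smoothing followed by a quadrature pair of 1D temporal Gabor filters, with response R = (I*g*h_ev)^2 + (I*g*h_od)^2. The sketch below implements only the temporal part (spatial smoothing is omitted for brevity); the omega = 4/tau coupling and the parameter values are taken as assumptions about the paper's setup.

```python
import numpy as np

def temporal_gabor_pair(tau=2.0):
    """Quadrature pair of 1D temporal Gabor filters (even/odd phase)."""
    t = np.arange(-int(3 * tau), int(3 * tau) + 1, dtype=float)
    omega = 4.0 / tau                       # assumed frequency/scale coupling
    env = np.exp(-t ** 2 / tau ** 2)        # Gaussian temporal envelope
    return -np.cos(2 * np.pi * t * omega) * env, -np.sin(2 * np.pi * t * omega) * env

def cuboid_response(video, tau=2.0):
    """R = (I*h_ev)^2 + (I*h_od)^2 along the temporal axis of a (T, H, W)
    volume; peaks where pixel intensities vary periodically or abruptly."""
    h_ev, h_od = temporal_gabor_pair(tau)
    vid = video.astype(float)
    ev = np.apply_along_axis(np.convolve, 0, vid, h_ev, mode='same')
    od = np.apply_along_axis(np.convolve, 0, vid, h_od, mode='same')
    return ev ** 2 + od ** 2
```

A flickering pixel (e.g. a limb oscillating through a location) produces a strong response, while static pixels score zero, which is the behaviour that makes the detector selective for motion events rather than spatial corners.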