Compact representation and probabilistic classification of human actions in videos
C. Colombo, Dario Comanducci, A. Bimbo
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425334
This paper addresses the problem of classifying human actions in a video sequence. An eigenspace representation based on the PCA algorithm is used to train the classifier, following an incremental learning scheme built on a "one action, one eigenspace" approach. Before dimensionality reduction, a high-dimensional description of each frame of the video sequence is constructed, based on foreground blob analysis. Classification is performed by incrementally matching the reduced representation of the test image sequence against each of the learned ones, and accumulating matching scores within a probabilistic framework until a decision is reached. Experimental results on real video sequences are presented and discussed.
Watershed algorithm for moving object extraction considering energy minimization by snakes
K. Imamura, Masaki Hiraoka, H. Hashimoto
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425367
MPEG-4, a video coding standard, supports object-based functionalities for high-efficiency coding. MPEG-7, a multimedia content description interface, handles object data in, for example, retrieval and editing systems. Extraction of semantic video objects is therefore an indispensable tool for these newly developed schemes. In this paper, we propose a technique that extracts the shape of moving objects by combining snakes and the watershed algorithm. The proposed method comprises two steps. In the first step, snakes extract the contours of moving objects by minimizing an energy function. In the second step, a conditional watershed algorithm extracts contours from a topographical surface that includes a new function term. This term is introduced to improve the estimated contours by taking into account the moving-object boundaries obtained by the snakes. The efficiency of the proposed approach to moving object extraction is demonstrated through computer simulations.
3-D model-based people detection & tracking
G. Garibotto
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425375
This paper describes a method for people detection and tracking from multi-camera views. The proposed approach is based on 3D models of the person's shape; motion tracking is carried out in 3D space, with re-projection onto calibrated images to validate targets according to a prediction-verification paradigm. Multiple cameras with partial overlap can be used to cover a much wider area. The reported examples are based on the PETS 2006 video sequences and a database from the EU-ISCAPS demonstration environment.
Sign language detection using 3D visual cues
J. Lichtenauer, G. T. Holt, E. Hendriks, M. Reinders
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425350
A 3D visual hand gesture recognition method is proposed that detects correctly performed signs from stereo camera input. Hand tracking is based on skin detection with an adaptive chrominance model for high accuracy. Informative high-level motion properties are extracted to simplify the classification task. Each example is mapped onto a fixed reference sign by Dynamic Time Warping to obtain precise time correspondences. Classification is done by combining weak classifiers based on robust statistics. Each base classifier assumes a uniform distribution of a single feature, determined by winsorization on the noisy training set. The operating point of the classifier is set by stretching the uniform distributions of the base classifiers, rather than by changing the threshold on the total posterior likelihood. In a cross-validation with 120 signs performed by 70 different persons, 95% of the test signs were correctly detected at a false positive rate of 5%.
Using social effects to guide tracking in complex scenes
A. French, Asad Naeem, I. Dryden, T. Pridmore
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425312
This paper presents a new methodology for improving the tracking of multiple targets in complex scenes. The new method, Motion Parameter Sharing, incorporates social motion information into tracking predictions. This is achieved by allowing a tracker to share motion estimates within groups of targets that have previously been moving in a coordinated fashion. The method is intuitive and, besides aiding the prediction estimates, allows the implicit formation of 'social groups' of targets as a side effect of the process. The underlying reasoning and method are presented, along with a description of how the method fits into the framework of a typical Bayesian tracking system. Preliminary results suggest the method is more accurate and robust than algorithms that do not exploit the social information available in multiple-target scenarios.
Model-based human posture estimation for gesture analysis in an opportunistic fusion smart camera network
Chen Wu, H. Aghajan
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425353
In multi-camera networks, rich visual data is available both spatially and temporally. This paper describes a method for human posture estimation that incorporates an opportunistic fusion framework, aiming to employ the manifold sources of visual information across space, time, and feature levels. One motivation for the proposed method is to reduce the raw visual data in a single camera to elliptical parameterized segments for efficient communication between cameras. A 3D human body model is employed as the convergence point of spatiotemporal and feature fusion. It maintains both the geometric parameters of the human posture and the adaptively learned appearance attributes, all of which are updated from the three dimensions of the opportunistic fusion: space, time, and features. At sufficient confidence levels, the parameters of the 3D human body model are in turn fed back to aid subsequent in-node vision analysis. The color distribution registered in the model is used to initialize segmentation. Perceptually Organized Expectation Maximization (POEM) is then applied to refine the color segments with observations from a single camera. The geometric configuration of the 3D skeleton is estimated by Particle Swarm Optimization (PSO).
2D face pose normalisation using a 3D morphable model
J. Tena, Raymond S. Smith, M. Hamouz, J. Kittler, A. Hilton, J. Illingworth
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425285
The ever-growing need for improved security, surveillance, and identity protection calls for the creation of ever more reliable and robust face recognition technology that is scalable and can be deployed in all kinds of environments without compromising its effectiveness. In this paper we study the impact that pose correction has on the performance of 2D face recognition. To measure the effect, we use a state-of-the-art 2D recognition algorithm. The pose correction is performed by means of a 3D morphable model. Our results on the non-frontal XM2VTS database show that pose correction can improve recognition rates by up to 30%.
Experiments with patch-based object classification
R. Wijnhoven, P. D. With
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425294
We present and experiment with a patch-based algorithm for object classification in video surveillance. A feature vector is calculated by template matching a large set of image patches within the detected regions of interest (ROIs, also called blobs) of moving objects. Instead of matching raw image pixels, we use Gabor-filtered versions of the input image at several scales. We present results for a new, typical video surveillance dataset containing over 9,000 object images. Additionally, we show results for the PETS 2001 dataset and another dataset from the literature. Because our algorithm is not invariant to object orientation, the set was split into four subsets with different orientations, and we show the improvements that result from taking object orientation into account. Using 50 or more training samples, the detection rate is on average above 95%, improving to 98% when orientation is considered. Because of the inherent scalability of the algorithm, an embedded system implementation is well within reach.
Bottom-up/top-down coordination in a multiagent visual sensor network
Federico Castanedo, M. A. Patricio, Jesús García, J. M. Molina
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425292
In this paper an approach to multi-sensor coordination in a multiagent visual sensor network is presented. A belief-desire-intention model of multiagent systems is employed. Within this multiagent system, the interactions between several surveillance-sensor agents and their respective fusion agent are discussed. The surveillance process is improved using a bottom-up/top-down coordination approach in which a fusion agent controls the coordination process. In the bottom-up phase, information is sent to the fusion agent. In the top-down stage, feedback messages are sent to those surveillance-sensor agents whose tracking is inconsistent with the global fused tracking process. This feedback allows a surveillance-sensor agent to correct its tracking process. Finally, preliminary experiments with the PETS 2006 database are presented.
A DSP-based system for the detection of vehicles parked in prohibited areas
S. Boragno, B. Boghossian, J. Black, D. Makris, S. Velastín
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425320
This paper describes a system for automatic, robust video surveillance, and in particular discusses its application to the problem of locating vehicles that stop in prohibited areas. The structure of the video-processing software (alarm generation, operator interface, and information storage) is outlined together with the hardware (Trimedia DSP boards and industrial computers), which together constitute an industrial-grade product. The emphasis of this paper is on demonstrating robust detection, and hence we show the results of a performance evaluation carried out on the UK's i-LIDS "Parked Vehicle" reference dataset.