Model-based human posture estimation for gesture analysis in an opportunistic fusion smart camera network
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425353
Chen Wu, H. Aghajan
Multi-camera networks provide rich visual data both spatially and temporally. In this paper, a human posture estimation method is described that incorporates an opportunistic fusion framework aiming to exploit manifold sources of visual information across space, time, and feature levels. One motivation for the proposed method is to reduce the raw visual data in a single camera to elliptically parameterized segments for efficient communication between cameras. A 3D human body model is employed as the convergence point of spatiotemporal and feature fusion. It maintains both the geometric parameters of the human posture and adaptively learned appearance attributes, all of which are updated from the three dimensions of the opportunistic fusion: space, time, and features. At sufficient confidence levels, the parameters of the 3D human body model are in turn fed back to aid subsequent in-node vision analysis. The color distribution registered in the model is used to initialize segmentation. Perceptually Organized Expectation Maximization (POEM) is then applied to refine the color segments with observations from a single camera. Finally, the geometric configuration of the 3D skeleton is estimated by Particle Swarm Optimization (PSO).
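As a rough illustration of the final estimation step, the sketch below implements a generic particle swarm optimizer over a pose parameter vector; the body model, parameter bounds, and fitness function are placeholders, not the authors' implementation.

```python
# Minimal PSO sketch for fitting skeleton parameters (illustrative only).
import numpy as np

def pso(fitness, dim, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5):
    lo, hi = bounds
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions (e.g. joint angles)
    v = np.zeros_like(x)                               # velocities
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()                 # global best so far
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Toy fitness: distance between model-predicted and observed limb endpoints
# (faked here with a fixed target vector, purely for illustration).
target = np.zeros(10)
best_pose, err = pso(lambda p: np.sum((p - target) ** 2), dim=10, bounds=(-np.pi, np.pi))
```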
{"title":"Model-based human posture estimation for gesture analysis in an opportunistic fusion smart camera network","authors":"Chen Wu, H. Aghajan","doi":"10.1109/AVSS.2007.4425353","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425353","url":null,"abstract":"In multi-camera networks rich visual data is provided both spatially and temporally. In this paper a method of human posture estimation is described incorporating the concept of an opportunistic fusion framework aiming to employ manifold sources of visual information across space, time, and feature levels. One motivation for the proposed method is to reduce raw visual data in a single camera to elliptical parameterized segments for efficient communication between cameras. A 3D human body model is employed as the convergence point of spatiotemporal and feature fusion. It maintains both geometric parameters of the human posture and the adoptively learned appearance attributes, all of which are updated from the three dimensions of space, time and features of the opportunistic fusion. In sufficient confidence levels parameters of the 3D human body model are again used as feedback to aid subsequent in-node vision analysis. Color distribution registered in the model is used to initialize segmentation. Perceptually Organized Expectation Maximization (POEM) is then applied to refine color segments with observations from a single camera. Geometric configuration of the 3D skeleton is estimated by Particle Swarm Optimization (PSO).","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121755431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multitarget association and tracking in 3-D space based on particle filter with joint multitarget probability density
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425374
Jinseok Lee, Byung Guk Kim, S. Cho, Sangjin Hong, W. Cho
This paper addresses the problem of 3-dimensional (3D) multitarget tracking using a particle filter with the joint multitarget probability density (JMPD) technique. The estimation accommodates nonlinear target motion and non-Gaussian target state densities, with unlabeled measurement association. In addition, we decompose the 3D formulation into multiple 2D particle filters that operate on 2D planes. Both the selection and the combination of the 2D particle filters for 3D tracking are presented and discussed. Finally, we analyze the tracking and association performance of the proposed approach, particularly in cases of multitarget crossing and overlapping.
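For intuition, here is a minimal single-target bootstrap particle filter on one 2D plane, the building block the paper composes; the joint multitarget (JMPD) formulation and the 2D-to-3D combination are omitted, and the motion and measurement models are assumptions.

```python
# Minimal bootstrap particle filter on a 2D plane (single target, illustrative).
import numpy as np

rng = np.random.default_rng(1)
N = 500
particles = rng.normal(0.0, 1.0, size=(N, 2))   # [x, y] hypotheses
weights = np.full(N, 1.0 / N)

def step(particles, weights, z, motion_std=0.5, meas_std=1.0):
    # Predict: diffuse particles with the (assumed) random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight by Gaussian likelihood of the measurement z.
    d2 = np.sum((particles - z) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std**2)
    weights /= weights.sum()
    # Resample when the effective sample size degenerates.
    if 1.0 / np.sum(weights**2) < len(weights) / 2:
        idx = rng.choice(len(weights), len(weights), p=weights)
        particles, weights = particles[idx], np.full(len(weights), 1.0 / len(weights))
    return particles, weights

for z in [np.array([0.5, 0.2]), np.array([0.9, 0.4])]:   # fake measurements
    particles, weights = step(particles, weights, z)
estimate = np.average(particles, axis=0, weights=weights)
```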
{"title":"Multitarget association and tracking in 3-D space based on particle filter with joint multitarget probability density","authors":"Jinseok Lee, Byung Guk Kim, S. Cho, Sangjin Hong, W. Cho","doi":"10.1109/AVSS.2007.4425374","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425374","url":null,"abstract":"This paper addresses the problem of 3-dimensional (3D) multitarget tracking using particle filter with the joint multitarget probability density (JMPD) technique. The estimation allows the nonlinear target motion with unlabeled measurement association as well as non-Gaussian target state densities. In addition, we decompose the 3D formulation into multiple 2D particle filters that operate on the 2D planes. Both selection and combining of the 2D particle filters for 3D tracking are presented and discussed. Finally, we analyze the tracking and association performance of the proposed approach especially in the cases of multitarget crossing and overlapping.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124291888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compact representation and probabilistic classification of human actions in videos
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425334
C. Colombo, Dario Comanducci, A. Bimbo
This paper addresses the problem of classifying human actions in a video sequence. An eigenspace representation based on PCA is used to train the classifier according to an incremental learning scheme built on a "one action, one eigenspace" approach. Before dimensionality reduction, a high-dimensional description of each frame of the video sequence is constructed, based on foreground blob analysis. Classification is performed by incrementally matching the reduced representation of the test image sequence against each of the learned ones, and accumulating matching scores within a probabilistic framework until a decision is reached. Experimental results on real video sequences are presented and discussed.
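A minimal sketch of the "one action, one eigenspace" matching idea: each action class keeps its own PCA basis, and a test sequence accumulates per-class scores frame by frame. The frame descriptors and scoring rule below are simplified stand-ins for the paper's blob-based features and probabilistic decision framework.

```python
# Per-class PCA eigenspaces with incremental score accumulation (illustrative).
import numpy as np

class ActionEigenspace:
    def __init__(self, frames, k=10):                  # frames: (n, d) descriptors
        self.mean = frames.mean(axis=0)
        u, s, vt = np.linalg.svd(frames - self.mean, full_matrices=False)
        self.basis = vt[:k]                            # top-k principal axes

    def score(self, x):
        # Negative reconstruction error: higher means a better class match.
        proj = self.basis.T @ (self.basis @ (x - self.mean))
        return -np.linalg.norm((x - self.mean) - proj)

rng = np.random.default_rng(2)
models = {a: ActionEigenspace(rng.normal(size=(50, 64))) for a in ("walk", "wave")}
totals = {a: 0.0 for a in models}
for frame in rng.normal(size=(20, 64)):                # fake test sequence
    for a, m in models.items():
        totals[a] += m.score(frame)                    # accumulate evidence
decision = max(totals, key=totals.get)
```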
{"title":"Compact representation and probabilistic classification of human actions in videos","authors":"C. Colombo, Dario Comanducci, A. Bimbo","doi":"10.1109/AVSS.2007.4425334","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425334","url":null,"abstract":"This paper addresses the problem of classifying human actions in a video sequence. A representation eigenspace approach based on the PCA algorithm is used to train the classifier according to an incremental learning scheme based on a \"one action, one eigenspace\" approach. Before dimensionality reduction, a high dimensional description of each frame of the video sequence is constructed, based on foreground blob analysis. Classification is performed by matching incrementally the reduced representation of the test image sequence against each of the learned ones, and accumulating matching scores according to a probabilistic framework, until a decision is obtained. Experimental results with real video sequences are presented and discussed.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123215248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3-D model-based people detection & tracking
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425375
G. Garibotto
The paper describes a method for people detection and tracking from multi-camera views. The proposed approach is based on a 3D model of the person's shape; motion tracking is carried out in 3D space, with re-projection onto calibrated images to perform target validation according to a prediction-verification paradigm. Multiple cameras with partial overlap can be used to cover a much wider area. The reported examples are based on the PETS 2006 video sequences and a database from the EU-ISCAPS demonstration environment.
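The prediction-verification step can be pictured as re-projecting a predicted 3D position into each calibrated view and checking it against the detected foreground; the camera matrices, threshold, and detections below are illustrative placeholders.

```python
# Re-projection validation of a predicted 3D point (illustrative sketch).
import numpy as np

def project(P, X):
    """Project homogeneous 3D point X with a 3x4 camera matrix P."""
    x = P @ X
    return x[:2] / x[2]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                 # dummy calibrated cameras
P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
X_pred = np.array([0.2, 1.7, 4.0, 1.0])                       # predicted head position

for P, detection in [(P1, np.array([0.05, 0.43])), (P2, np.array([-0.07, 0.43]))]:
    residual = np.linalg.norm(project(P, X_pred) - detection)
    print("validated" if residual < 0.1 else "rejected", residual)
```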
{"title":"3-D model-based people detection & tracking","authors":"G. Garibotto","doi":"10.1109/AVSS.2007.4425375","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425375","url":null,"abstract":"The paper describes a method for people detection and tracking from multi-camera views. The proposed approach is based on 3D models of the person shape, where motion tracking is carried out in 3D space with re-projection onto calibrated images to perform target validation according to a prediction-verification paradigm. Multiple cameras with partial overlap can be used to cover a much wider area. The referred examples are based on the data base from PETS 2006 video sequences and a data base from EU-ISCAPS demonstration environment.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121214153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sphere detection and tracking for a space capturing operation
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425307
M. Kharbat, N. Aouf, A. Tsourdos, B. White
Capture mechanisms are used to transfer objects between two vehicles in space without physical contact. A sphere (canister) detection and tracking method is proposed, using an enhanced Hough transform technique and an H-infinity filter. The presented system aims to assist in the capture operation, currently being investigated by the European Space Agency and other partners, and to be used in space missions as an alternative to docking or berthing operations. Test results show the robustness and reliability of the proposed method; they also demonstrate its low computational and memory requirements.
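Purely as an illustration of the detect-then-filter structure, the sketch below uses OpenCV's standard circle Hough transform and a simple exponential smoother; the paper's enhanced Hough technique and H-infinity filter are not reproduced here.

```python
# Circle detection plus a simple temporal smoother (illustrative stand-in).
import cv2
import numpy as np

def detect_sphere(gray):
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=100, param2=30, minRadius=5, maxRadius=80)
    if circles is None:
        return None
    x, y, r = circles[0][0]            # strongest candidate
    return np.array([x, y, r])

state = None
alpha = 0.6                            # smoothing gain (stands in for the filter)
frame = np.zeros((240, 320), np.uint8)
cv2.circle(frame, (160, 120), 30, 255, -1)             # synthetic test frame
meas = detect_sphere(cv2.GaussianBlur(frame, (9, 9), 2))
if meas is not None:
    state = meas if state is None else alpha * meas + (1 - alpha) * state
```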
{"title":"Sphere detection and tracking for a space capturing operation","authors":"M. Kharbat, N. Aouf, A. Tsourdos, B. White","doi":"10.1109/AVSS.2007.4425307","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425307","url":null,"abstract":"Capture mechanisms are used to transfer objects between two vehicles in the space with no physical contact. A sphere (canister) detection and tracking method using an enhanced Hough transform technique and Hinfin filter is proposed. The presented system aims to assist in the capture operation, currently investigated the European Space Agency and other partners, and to be used in space missions as an alternative to docking or berthing operations. Test results show the robustness and reliability of the proposed method. They also demonstrate the low computational and memory complexities needed.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121437844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sign language detection using 3D visual cues
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425350
J. Lichtenauer, G. T. Holt, E. Hendriks, M. Reinders
A 3D visual hand gesture recognition method is proposed that detects correctly performed signs from stereo camera input. Hand tracking is based on skin detection with an adaptive chrominance model for high accuracy. Informative high-level motion properties are extracted to simplify the classification task. Each example is mapped onto a fixed reference sign by Dynamic Time Warping to obtain precise time correspondences. Classification is performed by combining weak classifiers based on robust statistics. Each base classifier assumes a uniform distribution of a single feature, determined by winsorization on the noisy training set. The operating point of the classifier is set by stretching the uniform distributions of the base classifiers rather than by changing the threshold on the total posterior likelihood. In a cross-validation with 120 signs performed by 70 different persons, 95% of the test signs were correctly detected at a false positive rate of 5%.
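A minimal dynamic time warping sketch showing how a test sequence can be mapped onto a fixed reference to obtain frame-level time correspondences; the features are 1D scalars here, whereas the paper warps high-level motion properties.

```python
# Classic DTW with path backtracking (illustrative, 1-D features).
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack for the warping path (the time correspondences).
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if k == 0 else (i - 1, j) if k == 1 else (i, j - 1)
    return D[n, m], path[::-1]

dist, path = dtw(np.array([0., 1., 2., 1.]), np.array([0., 0., 1., 2., 1.]))
```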
{"title":"Sign language detection using 3D visual cues","authors":"J. Lichtenauer, G. T. Holt, E. Hendriks, M. Reinders","doi":"10.1109/AVSS.2007.4425350","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425350","url":null,"abstract":"A 3D visual hand gesture recognition method is proposed that detects correctly performed signs from stereo camera input. Hand tracking is based on skin detection with an adaptive chrominance model to get high accuracy. Informative high level motion properties are extracted to simplify the classification task. Each example is mapped onto a fixed reference sign by Dynamic Time Warping, to get precise time correspondences. The classification is done by combining weak classifiers based on robust statistics. Each base classifier assumes a uniform distribution of a single feature, determined by winsorization on the noisy training set. The operating point of the classifier is determined by stretching the uniform distributions of the base classifiers instead of changing the threshold on the total posterior likelihood. In a cross validation with 120 signs performed by 70 different persons, 95% of the test signs were correctly detected at a false positive rate of 5%.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116998094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2D face pose normalisation using a 3D morphable model
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425285
J. Tena, Raymond S. Smith, M. Hamouz, J. Kittler, A. Hilton, J. Illingworth
The ever-growing need for improved security, surveillance, and identity protection calls for the creation of ever more reliable and robust face recognition technology that is scalable and can be deployed in all kinds of environments without compromising its effectiveness. In this paper we study the impact that pose correction has on the performance of 2D face recognition. To measure the effect, we use a state-of-the-art 2D recognition algorithm; the pose correction is performed by means of a 3D morphable model. Our results on the non-frontal XM2VTS database show that pose correction can improve recognition rates by up to 30%.
{"title":"2D face pose normalisation using a 3D morphable model","authors":"J. Tena, Raymond S. Smith, M. Hamouz, J. Kittler, A. Hilton, J. Illingworth","doi":"10.1109/AVSS.2007.4425285","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425285","url":null,"abstract":"The ever growing need for improved security, surveillance and identity protection, calls for the creation of evermore reliable and robust face recognition technology that is scalable and can be deployed in all kinds of environments without compromising its effectiveness. In this paper we study the impact that pose correction has on the performance of 2D face recognition. To measure the effect, we use a state of the art 2D recognition algorithm. The pose correction is performed by means of 3D morphable model. Our results on the non frontal XM2VTS database showed that pose correction can improve recognition rates up to 30%.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131251730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experiments with patch-based object classification
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425294
R. Wijnhoven, P. D. With
We present and evaluate a patch-based algorithm for object classification in video surveillance. A feature vector is calculated by template matching a large set of image patches within detected regions of interest (ROIs, also called blobs) of moving objects. Instead of matching raw image pixels, we use Gabor-filtered versions of the input image at several scales. We present results for a new, representative video surveillance dataset containing over 9,000 object images. Additionally, we show results for the PETS 2001 dataset and another dataset from the literature. Because our algorithm is not invariant to object orientation, the set was split into four subsets with different orientations. We show the improvement that results from taking object orientation into account: with 50 or more training samples, the detection rate averages above 95%, rising to 98% when orientation is considered. Because of the inherent scalability of the algorithm, an embedded system implementation is well within reach.
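The Gabor preprocessing can be sketched as filtering the ROI with a small bank of kernels at several scales and orientations before patch template matching; the kernel parameters below are illustrative, not the paper's values.

```python
# Multi-scale Gabor filter bank applied to an ROI (illustrative parameters).
import cv2
import numpy as np

def gabor_bank(scales=(7, 11), orientations=4):
    kernels = []
    for k in scales:
        for t in range(orientations):
            theta = t * np.pi / orientations
            kernels.append(cv2.getGaborKernel((k, k), sigma=k / 3.0, theta=theta,
                                              lambd=k / 2.0, gamma=0.5))
    return kernels

roi = np.random.default_rng(3).integers(0, 256, (64, 64)).astype(np.float32)
responses = [cv2.filter2D(roi, -1, k) for k in gabor_bank()]
# Patch matching would then run on these responses, e.g. via
# cv2.matchTemplate(responses[0], patch, cv2.TM_CCOEFF_NORMED).
```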
{"title":"Experiments with patch-based object classification","authors":"R. Wijnhoven, P. D. With","doi":"10.1109/AVSS.2007.4425294","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425294","url":null,"abstract":"We present and experiment with a patch-based algorithm for the purpose of object classification in video surveillance. A feature vector is calculated based on template matching of a large set of image patches, within detected regions-of-interest (ROIs, also called blobs), of moving objects. Instead of matching direct image pixels, we use Gabor-filtered versions of the input image at several scales. We present results for a new typical video surveillance dataset containing over 9,000 object images. Additionally, we show results for the PETS 2001 dataset and another dataset from literature. Because our algorithm is not invariant to the object orientation, the set was split into four subsets with different orientation. We show the improvements, resulting from taking the object orientation into account. Using 50 training samples or higher, our resulting detection rate is on the average above 95%, which improves with the orientation consideration to 98%. Because of the inherent scalability of the algorithm, an embedded system implementation is well within reach.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134516182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face recognition using non-linear image reconstruction
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425354
S. Duffner, Christophe Garcia
We present a face recognition technique based on a special type of convolutional neural network that is trained to extract characteristic features from face images and reconstruct the corresponding reference face images, which are chosen beforehand for each individual to be recognized. The reconstruction is realized by a so-called "bottleneck" neural network that learns to project face images into a low-dimensional vector space and to reconstruct the respective reference images from the projected vectors. In contrast to methods based on principal component analysis (PCA), linear discriminant analysis (LDA), and the like, the projection is non-linear and depends on the choice of the reference images. Moreover, local and global processing are closely interconnected, and the respective parameters are learned jointly. Once the neural network is trained, new face images can be classified by comparing the respective projected vectors. We experimentally show that the choice of the reference images influences the final recognition performance and that this method outperforms linear projection methods in terms of precision and robustness.
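A schematic bottleneck reconstruction network, written here as a small fully connected PyTorch model (an assumption; the paper uses a specific convolutional architecture): it learns to map each face image to its individual's reference image, and recognition then compares the bottleneck codes.

```python
# Bottleneck network mapping inputs to fixed reference targets (schematic).
import torch
import torch.nn as nn

class BottleneckNet(nn.Module):
    def __init__(self, d_in=32 * 32, d_code=20):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(d_in, 256), nn.Tanh(),
                                    nn.Linear(256, d_code))     # projection
        self.decode = nn.Sequential(nn.Linear(d_code, 256), nn.Tanh(),
                                    nn.Linear(256, d_in))       # reconstruction

    def forward(self, x):
        code = self.encode(x)
        return self.decode(code), code

net = BottleneckNet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
x = torch.rand(8, 32 * 32)            # fake input faces
ref = torch.rand(8, 32 * 32)          # their reference images (fixed targets)
opt.zero_grad()
recon, code = net(x)
loss = nn.functional.mse_loss(recon, ref)
loss.backward()
opt.step()
# At test time, classify by nearest neighbour over the `code` vectors.
```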
{"title":"Face recognition using non-linear image reconstruction","authors":"S. Duffner, Christophe Garcia","doi":"10.1109/AVSS.2007.4425354","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425354","url":null,"abstract":"We present a face recognition technique based on a special type of convolutional neural network that is trained to extract characteristic features from face images and reconstruct the corresponding reference face images which are chosen beforehand for each individual to recognize. The reconstruction is realized by a so-called \"bottle-neck\" neural network that learns to project face images into a low-dimensional vector space and to reconstruct the respective reference images from the projected vectors. In contrast to methods based on the Principal Component Analysis (PCA), the Linear Discriminant Analysis (LDA) etc., the projection is non-linear and depends on the choice of the reference images. Moreover, local and global processing are closely interconnected and the respective parameters are conjointly learnt. Having trained the neural network, new face images can then be classified by comparing the respective projected vectors. We experimentally show that the choice of the reference images influences the final recognition performance and that this method outperforms linear projection methods in terms of precision and robustness.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128195805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed video surveillance using hardware-friendly sparse large margin classifiers
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425291
A. Kerhet, F. Leonardi, A. Boni, P. Lombardo, M. Magno, L. Benini
In contrast to video sensors that just "watch" the world, present-day research aims at developing intelligent devices able to interpret it locally. A number of such devices are available on the market, very powerful on the one hand, but on the other requiring either a connection to the power grid or massive rechargeable batteries. MicrelEye, the wireless video sensor node presented in this paper, targets a different design point: portability and a scanty power budget, while still providing a prominent level of intelligence, namely object classification. To deal with such a challenging task, we propose and implement a new SVM-like, hardware-oriented algorithm called ERSVM. The case study considered in this work is people detection. The obtained results suggest that present technology allows for the design of simple intelligent video nodes capable of performing local classification tasks.
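ERSVM itself is defined in the paper; as a stand-in, the sketch below shows the kind of lightweight decision function an embedded node can evaluate, a linear SVM score computed in integer (fixed-point) arithmetic. The weights and Q8 scaling are illustrative assumptions.

```python
# Fixed-point linear SVM decision function (illustrative, not ERSVM).
import numpy as np

SCALE = 256                                         # Q8 fixed-point scale
w_fx = (np.array([0.8, -1.2, 0.4]) * SCALE).astype(np.int32)
b_fx = int(-0.3 * SCALE)

def classify(feature_fx):
    """feature_fx: feature vector already quantized to Q8 integers."""
    score = int(np.dot(w_fx, feature_fx)) // SCALE + b_fx
    return 1 if score >= 0 else -1                  # person / non-person

x = (np.array([0.5, 0.1, 0.9]) * SCALE).astype(np.int32)
label = classify(x)
```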
{"title":"Distributed video surveillance using hardware-friendly sparse large margin classifiers","authors":"A. Kerhet, F. Leonardi, A. Boni, P. Lombardo, M. Magno, L. Benini","doi":"10.1109/AVSS.2007.4425291","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425291","url":null,"abstract":"In contrast to video sensors which just \"watch \" the world, present-day research is aimed at developing intelligent devices able to interpret it locally. A number of such devices are available on the market, very powerful on the one hand, but requiring either connection to the power grid, or massive rechargeable batteries on the other. MicrelEye, the wireless video sensor node presented in this paper, targets a different design point: portability and a scanty power budget, while still providing a prominent level of intelligence, namely objects classification. To deal with such a challenging task, we propose and implement a new SVM-like hardware-oriented algorithm called ERSVM. The case study considered in this work is people detection. The obtained results suggest that the present technology allows for the design of simple intelligent video nodes capable of performing local classification tasks.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123572122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}