Human activity recognition with action primitives
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425332
Zsolt L. Husz, A. Wallace, P. Green
This paper considers the link between tracking algorithms and high-level human behavioural analysis, introducing the action primitives model, which recovers symbolic labels from tracked limb configurations. The model consists of clusters of similar short-term actions, called action primitives, formed automatically and then labelled by supervised learning. It accommodates both short actions and longer activities, whether periodic or aperiodic, and new labels can be added incrementally. We determine the effects of model parameters on the labelling of action primitives using ground truth derived from a motion capture system, and we present a representative example of a labelled video sequence.
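The two-stage structure described in the abstract (unsupervised clustering of short-term limb configurations, followed by supervised labelling of the clusters) can be illustrated with a minimal sketch. The window length, the use of k-means, and all names below are assumptions for illustration, not the authors' implementation.

```python
# Toy two-stage action-primitive pipeline (illustrative only):
# 1) cluster fixed-length windows of joint angles into "action primitives",
# 2) label each cluster by majority vote over a supervised training subset.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def windows(joint_angles, length=10):
    """Slice a (T, n_joints) trajectory into overlapping short-term windows."""
    T = len(joint_angles)
    return np.array([joint_angles[t:t + length].ravel()
                     for t in range(T - length + 1)])

# Hypothetical data: 500 frames of 12 joint angles, plus per-frame labels.
rng = np.random.default_rng(0)
traj = rng.normal(size=(500, 12))
frame_labels = rng.choice(["walk", "wave"], size=500)

X = windows(traj)                      # one row per short-term action
km = KMeans(n_clusters=20, n_init=10, random_state=0)
primitive_ids = km.fit_predict(X)      # unsupervised primitive clusters

# Supervised step: each cluster takes the majority label of its windows.
cluster_label = {
    c: Counter(frame_labels[np.where(primitive_ids == c)[0]]).most_common(1)[0][0]
    for c in range(20)
}
print(cluster_label[primitive_ids[0]])  # symbolic label for the first window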
{"title":"Human activity recognition with action primitives","authors":"Zsolt L. Husz, A. Wallace, P. Green","doi":"10.1109/AVSS.2007.4425332","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425332","url":null,"abstract":"This paper considers the link between tracking algorithms and high-level human behavioural analysis, introducing the action primitives model that recovers symbolic labels from tracked limb configurations. The model consists of similar short-term actions, action primitives clusters, formed automatically and then labelled by supervised learning. The model allows both short actions and longer activities, either periodic or aperiodic. New labels are added incrementally. We determine the effects of model parameters on the labelling of action primitives using ground truth derived from a motion capture system. We also present a representative example of a labelled video sequence.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126605528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal deployment of cameras for video surveillance systems
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425342
F. Angella, Livier Reithler, Frédéric Gallesio
This article describes a new method for the optimal deployment of sensors in video-surveillance systems, taking into account realistic models of fixed and PTZ cameras as well as video analysis requirements. The approach relies on a spatial translation of constraints, a method for fast exploration of candidate solutions, and hardware acceleration of inter-visibility computation. This operational tool allows complex surveillance systems to be evaluated prior to installation through a precise simulation of their spatial coverage.
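As a simplified stand-in for the deployment problem above, the sketch below greedily places cameras to maximise coverage of a discretised floor plan. The paper's constraint translation and hardware-accelerated visibility tests are replaced here by a toy circular field-of-view model; every name and parameter is an assumption.

```python
# Greedy coverage-maximising camera placement over a 10x10 grid (toy model).
import numpy as np
from itertools import product

def covered(cells, cam, radius=3.0):
    """Boolean mask of grid cells within `radius` of camera position `cam`."""
    return np.linalg.norm(cells - cam, axis=1) <= radius

cells = np.array(list(product(range(10), range(10))), dtype=float)
candidates = cells[::7]          # candidate camera positions (subsampled)

chosen, uncovered = [], np.ones(len(cells), dtype=bool)
for _ in range(4):               # place 4 cameras
    gains = [covered(cells, c)[uncovered].sum() for c in candidates]
    best = candidates[int(np.argmax(gains))]
    chosen.append(best)
    uncovered &= ~covered(cells, best)
print(f"{len(cells) - uncovered.sum()} of {len(cells)} cells covered")
```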
{"title":"Optimal deployment of cameras for video surveillance systems","authors":"F. Angella, Livier Reithler, Frédéric Gallesio","doi":"10.1109/AVSS.2007.4425342","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425342","url":null,"abstract":"This article describes a new method which aims at the optimal deployment of sensors for video-surveillance systems, taking realistic models of fixed and PTZ cameras into account, as well as video analysis requirements. The approach relies on a spatial translation of constraints, a method for fast exploration of potential solutions and hardware acceleration of inter-visibility computation. This operational tool allows the evaluation of complex surveillance systems prior installation thanks to a precise simulation of their spatial coverage.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123965826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed video surveillance using hardware-friendly sparse large margin classifiers
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425291
A. Kerhet, F. Leonardi, A. Boni, P. Lombardo, M. Magno, L. Benini
In contrast to video sensors that merely "watch" the world, present-day research aims at developing intelligent devices able to interpret it locally. A number of such devices are available on the market; they are very powerful on the one hand, but require either a connection to the power grid or massive rechargeable batteries on the other. MicrelEye, the wireless video sensor node presented in this paper, targets a different design point: portability and a scanty power budget, while still providing a substantial level of intelligence, namely object classification. To deal with this challenging task, we propose and implement a new SVM-like, hardware-oriented algorithm called ERSVM. The case study considered in this work is people detection. The results obtained suggest that present technology allows the design of simple intelligent video nodes capable of performing local classification tasks.
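The paper's ERSVM algorithm is not reproduced here; as a generic illustration of what "hardware-friendly" classification means in practice, the sketch below runs linear-SVM inference in fixed-point integer arithmetic (Q8.8), the kind of operation that maps well onto FPGAs and microcontrollers. Weights, data, and names are made up.

```python
# Fixed-point (Q8.8) linear SVM decision function: integers only at runtime.
import numpy as np

SCALE = 256  # Q8.8 fixed point: value = integer / 256

def to_fixed(x):
    return np.round(np.asarray(x) * SCALE).astype(np.int32)

def svm_decide(w_fx, b_fx, x_fx):
    # Integer dot product; rescale once at the end to stay in Q8.8.
    acc = int(np.dot(w_fx.astype(np.int64), x_fx.astype(np.int64))) // SCALE
    return 1 if acc + b_fx >= 0 else -1  # e.g. person / not-person

w = to_fixed([0.5, -1.25, 0.75])   # hypothetical trained weights
b = to_fixed(0.1)
x = to_fixed([1.0, 0.2, -0.4])
print(svm_decide(w, b, x))
```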
{"title":"Distributed video surveillance using hardware-friendly sparse large margin classifiers","authors":"A. Kerhet, F. Leonardi, A. Boni, P. Lombardo, M. Magno, L. Benini","doi":"10.1109/AVSS.2007.4425291","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425291","url":null,"abstract":"In contrast to video sensors which just \"watch \" the world, present-day research is aimed at developing intelligent devices able to interpret it locally. A number of such devices are available on the market, very powerful on the one hand, but requiring either connection to the power grid, or massive rechargeable batteries on the other. MicrelEye, the wireless video sensor node presented in this paper, targets a different design point: portability and a scanty power budget, while still providing a prominent level of intelligence, namely objects classification. To deal with such a challenging task, we propose and implement a new SVM-like hardware-oriented algorithm called ERSVM. The case study considered in this work is people detection. The obtained results suggest that the present technology allows for the design of simple intelligent video nodes capable of performing local classification tasks.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123572122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial biometry by stimulating salient singularity masks
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425363
G. Lefebvre, Christophe Garcia
We present a novel approach to face recognition based on salient singularity descriptors. Automatic feature extraction is performed with a salient point detector, and singularity information is selected by a SOM (self-organising map) region-based structuring. The spatial singularity distribution is preserved in order to activate specific neuron maps, and the local salient signature stimuli reveal the individual's identity. The proposed method proves particularly robust to facial expressions and poses, as demonstrated in various experiments on well-known databases.
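To make the SOM step concrete, here is a minimal self-organising map sketch suggesting how local descriptors sampled at salient points could be quantised onto a neuron grid whose activation pattern acts as a signature. This is a generic SOM, not the paper's exact region-based structuring; all sizes are illustrative.

```python
# Tiny SOM: map 16-D descriptors onto a 6x6 neuron grid.
import numpy as np

rng = np.random.default_rng(1)
grid, dim = (6, 6), 16
W = rng.normal(size=(grid[0] * grid[1], dim))
coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])

def train(W, data, epochs=20, lr=0.3, sigma=1.5):
    for _ in range(epochs):
        for x in data:
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))     # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # grid distance
            h = np.exp(-d2 / (2 * sigma ** 2))              # neighbourhood
            W += lr * h[:, None] * (x - W)                  # pull towards x
    return W

descriptors = rng.normal(size=(200, dim))   # stand-in salient-point features
W = train(W, descriptors)
signature = np.argmin(((W - descriptors[0]) ** 2).sum(axis=1))
print(f"descriptor 0 activates neuron {signature}")
```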
{"title":"Facial biometry by stimulating salient singularity masks","authors":"G. Lefebvre, Christophe Garcia","doi":"10.1109/AVSS.2007.4425363","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425363","url":null,"abstract":"We present a novel approach for face recognition based on salient singularity descriptors. The automatic feature extraction is performed thanks to a salient point detector, and the singularity information selection is performed by a SOM region-based structuring. The spatial singularity distribution is preserved in order to activate specific neuron maps and the local salient signature stimuli reveals the individual identity. This proposed method appears to be particularly robust to facial expressions and facial poses, as demonstrated in various experiments on well-known databases.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116042041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient particle filter for color-based tracking in complex scenes
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425306
J. M. D. Rincón, C. Orrite-Uruñuela, J. Jaraba
In this paper, we introduce an efficient method for particle selection when tracking objects in complex scenes. First, we improve the proposal distribution of the tracking algorithm by including the current observation, reducing the cost of evaluating particles with very low likelihood. In addition, we use a partitioned sampling approach to decompose the dynamic state into several stages, which makes it possible to handle high-dimensional states without excessive computational cost. To represent the color distribution, the appearance of the tracked object is modelled by sampled pixels. Based on this representation, the probability of any observation is estimated using non-parametric techniques in color space. As a result, we obtain a probability density image (PDI) in which each pixel indicates its membership of the target color model. In this way, the evaluation of all particles is accelerated by computing the likelihood p(z|x) from the integral image of the PDI.
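The integral-image trick described above is easy to show directly: once the PDI is summed cumulatively, the likelihood mass inside any particle's bounding box takes four lookups. The sketch below uses a random PDI and made-up box sizes purely for illustration.

```python
# O(1) box-sum particle evaluation via the integral image of the PDI.
import numpy as np

rng = np.random.default_rng(2)
pdi = rng.random((240, 320))                 # stand-in membership image

# Integral image with a zero row/column pad, so box sums need no edge cases.
ii = np.zeros((241, 321))
ii[1:, 1:] = pdi.cumsum(axis=0).cumsum(axis=1)

def box_likelihood(ii, x, y, w, h):
    """Mean PDI value inside the box with top-left (x, y): four lookups."""
    s = ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
    return s / (w * h)

# Evaluate many particles cheaply (each a hypothesised box).
particles = [(rng.integers(0, 280), rng.integers(0, 200)) for _ in range(100)]
weights = [box_likelihood(ii, x, y, 40, 40) for x, y in particles]
print(max(weights))
```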
{"title":"An efficient particle filter for color-based tracking in complex scenes","authors":"J. M. D. Rincón, C. Orrite-Uruñuela, J. Jaraba","doi":"10.1109/AVSS.2007.4425306","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425306","url":null,"abstract":"In this paper, we introduce an efficient method for particle selection in tracking objects in complex scenes. First, we improve the proposal distribution function of the tracking algorithm, including current observation, reducing the cost of evaluating particles with a very low likelihood. In addition, we use a partitioned sampling approach to decompose the dynamic state in several stages. It enables to deal with high-dimensional states without an excessive computational cost. To represent the color distribution, the appearance of the tracked object is modelled by sampled pixels. Based on this representation, the probability of any observation is estimated using non-parametric techniques in color space. As a result, we obtain a probability color density image (PDI) where each pixel points its membership to the target color model. In this way, the evaluation of all particles is accelerated by computing the likelihood p(zx) using the integral image of the PDI.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133505254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face recognition using non-linear image reconstruction
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425354
S. Duffner, Christophe Garcia
We present a face recognition technique based on a special type of convolutional neural network that is trained to extract characteristic features from face images and to reconstruct the corresponding reference face images, chosen beforehand for each individual to be recognized. The reconstruction is realized by a so-called "bottleneck" neural network that learns to project face images into a low-dimensional vector space and to reconstruct the respective reference images from the projected vectors. In contrast to methods based on Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and the like, the projection is non-linear and depends on the choice of the reference images. Moreover, local and global processing are closely interconnected, and the respective parameters are learnt jointly. Once the neural network is trained, new face images can be classified by comparing the respective projected vectors. We show experimentally that the choice of the reference images influences the final recognition performance and that this method outperforms linear projection methods in terms of precision and robustness.
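A structural sketch of the bottleneck idea follows: project a face image to a low-dimensional code, decode towards the person's reference image, and classify new faces by comparing codes. The weights are random here; in the paper they are learnt (convolutionally) so the decoder output matches the chosen reference image. Shapes and names are assumptions.

```python
# Bottleneck projection + nearest-code classification (untrained sketch).
import numpy as np

rng = np.random.default_rng(3)
D, K = 32 * 32, 24                       # image size, bottleneck width
W_enc = rng.normal(scale=0.05, size=(K, D))
W_dec = rng.normal(scale=0.05, size=(D, K))

def encode(img):
    return np.tanh(W_enc @ img.ravel())  # non-linear low-dimensional code

def decode(code):
    return W_dec @ code                  # reconstruction of a reference face

gallery = {p: encode(rng.random((32, 32))) for p in ["alice", "bob"]}
probe_code = encode(rng.random((32, 32)))

# Nearest code in the gallery decides the identity.
best = min(gallery, key=lambda p: np.linalg.norm(gallery[p] - probe_code))
print("probe classified as", best)
# Training (not shown) would minimise ||decode(encode(x)) - reference(x)||^2.
```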
{"title":"Face recognition using non-linear image reconstruction","authors":"S. Duffner, Christophe Garcia","doi":"10.1109/AVSS.2007.4425354","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425354","url":null,"abstract":"We present a face recognition technique based on a special type of convolutional neural network that is trained to extract characteristic features from face images and reconstruct the corresponding reference face images which are chosen beforehand for each individual to recognize. The reconstruction is realized by a so-called \"bottle-neck\" neural network that learns to project face images into a low-dimensional vector space and to reconstruct the respective reference images from the projected vectors. In contrast to methods based on the Principal Component Analysis (PCA), the Linear Discriminant Analysis (LDA) etc., the projection is non-linear and depends on the choice of the reference images. Moreover, local and global processing are closely interconnected and the respective parameters are conjointly learnt. Having trained the neural network, new face images can then be classified by comparing the respective projected vectors. We experimentally show that the choice of the reference images influences the final recognition performance and that this method outperforms linear projection methods in terms of precision and robustness.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128195805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple appearance models for face tracking in surveillance videos
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425341
Gurumurthy Swaminathan, V. Venkoparao, S. Bedros
Face tracking is a key component of automated video surveillance systems, supporting and enhancing tasks such as face recognition and video indexing. Face tracking in surveillance scenarios is a challenging problem due to ambient illumination variations, face pose changes, occlusions, and background clutter. We present an algorithm for tracking faces in surveillance video based on a particle filter that uses multiple appearance models for a robust representation of the face. We propose a color-based appearance model complemented by an edge-based appearance model built on Difference of Gaussian (DoG) filters. We demonstrate that combined appearance models handle face and scene variations more robustly than a single appearance model. For example, a color template model copes well with pose variations but deteriorates under illumination changes, while an edge-based model is robust to illumination variations but fails under substantial pose changes; a combined model is therefore more robust to both than either one alone. We show how the algorithm performs in a real surveillance scenario where the face undergoes various pose and illumination changes. The algorithm runs in real time at 20 fps on a standard 3.0 GHz desktop PC.
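One way to picture the fusion of the two cues is a multiplicative particle weight: a color-histogram similarity times a DoG-edge similarity. The fusion rule, parameters, and data below are illustrative assumptions, not the paper's exact formulation.

```python
# Combined color + DoG-edge appearance score for one particle's patch.
import numpy as np
from scipy.ndimage import gaussian_filter

def norm_hist(patch, bins=16):
    h, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

def dog(patch, s1=1.0, s2=2.0):
    return gaussian_filter(patch, s1) - gaussian_filter(patch, s2)

def particle_weight(patch, ref_hist, ref_edges):
    w_color = np.sum(np.sqrt(norm_hist(patch) * ref_hist))  # Bhattacharyya
    w_edge = np.exp(-np.mean((dog(patch) - ref_edges) ** 2))
    return w_color * w_edge                                  # combined score

rng = np.random.default_rng(4)
ref = rng.random((24, 24))                # stand-in reference face patch
ref_hist, ref_edges = norm_hist(ref), dog(ref)
print(particle_weight(rng.random((24, 24)), ref_hist, ref_edges))
```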
{"title":"Multiple appearance models for face tracking in surveillance videos","authors":"Gurumurthy Swaminathan, V. Venkoparao, S. Bedros","doi":"10.1109/AVSS.2007.4425341","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425341","url":null,"abstract":"Face tracking is a key component for automated video surveillance systems. It supports and enhances tasks such as face recognition and video indexing. Face tracking in surveillance scenarios is a challenging problem due to ambient illumination variations, face pose changes, occlusions, and background clutter. We present an algorithm for tracking faces in surveillance video based on a particle filter mechanism using multiple appearance models for robust representation of the face. We propose color based appearance model complemented by an edge based appearance model using the Difference of Gaussian (DOG) filters. We demonstrate that combined appearance models are more robust in handling the face and scene variations than a single appearance model. For example, color template appearance model is better in handling pose variations but they deteriorate against illumination variations. Similarly, an edge based model is robust in handling illumination variations but they fail in handling substantial pose changes. Hence, a combined model is more robust in handling pose and illumination changes than either one of them by itself. We show how the algorithm performs on a real surveillance scenario where the face undergoes various pose and illumination changes. The algorithm runs in real-time at 20 fps on a standard 3.0 GHz desktop PC.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"279 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131658646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Representing and recognizing complex events in surveillance applications
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425360
L. Snidaro, Massimo Belluz, G. Foresti
In this paper, we investigate the problem of representing and maintaining rule knowledge for a video surveillance application. We focus on the representation of complex events, which cannot be straightforwardly expressed by canonical means. In particular, we highlight ongoing efforts towards a unifying framework for computable rule and taxonomical knowledge representation.
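As a generic illustration of what a "computable rule" for a complex event might look like, the toy below defines a composite event as temporally ordered sub-events and evaluates it against a detection stream. This is a stand-in of my own devising, not the framework's syntax.

```python
# A composite-event rule as an ordered list of sub-event kinds.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    t: float          # timestamp in seconds

def matches(rule, events, max_gap=10.0):
    """True if the rule's sub-events occur in order, each within max_gap."""
    it = iter(sorted(events, key=lambda e: e.t))
    last_t = None
    for kind in rule:
        for e in it:
            if e.kind == kind and (last_t is None or e.t - last_t <= max_gap):
                last_t = e.t
                break
        else:
            return False
    return True

abandoned_bag = ["person_enters", "bag_dropped", "person_leaves"]
stream = [Event("person_enters", 1.0), Event("bag_dropped", 4.5),
          Event("person_leaves", 9.0)]
print(matches(abandoned_bag, stream))   # True
```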
{"title":"Representing and recognizing complex events in surveillance applications","authors":"L. Snidaro, Massimo Belluz, G. Foresti","doi":"10.1109/AVSS.2007.4425360","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425360","url":null,"abstract":"In this paper, we investigate the problem of representing and maintaining rule knowledge for a video surveillance application. We focus on complex events representation which cannot be straightforwardly represented by canonical means. In particular, we highlight the ongoing efforts for a unifying framework for computable rule and taxonomical knowledge representation.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123958009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing the spatial resolution of presence detection in a PIR based wireless surveillance network
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425326
P. Zappi, Elisabetta Farella, L. Benini
Pyroelectric sensors are small, low-cost, low-power components commonly used only to trigger an alarm in the presence of humans or moving objects. However, an array of pyroelectric sensors allows richer features to be extracted, such as the direction of movement, speed, and number of people. In this work, a wireless network based on low-cost pyroelectric infrared sensors is set up for tracking people's motion. A novel technique is proposed to determine the direction of movement and the number of people passing. The approach has low computational requirements and is therefore well suited to resource-limited devices such as wireless nodes. The tests performed gave promising results.
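The direction-of-movement idea can be sketched with two PIR sensors mounted a known distance apart: the order of their activations gives the walking direction, and the activation gap gives a speed estimate. Timestamps and geometry below are made up for illustration.

```python
# Direction and speed from the activation order of two PIR sensors, A and B.
SENSOR_SPACING_M = 0.5   # distance between the two PIR fields of view

def analyse(activations):
    """activations: list of (sensor_id, timestamp) events for one passage."""
    first = min(activations, key=lambda a: a[1])
    last = max(activations, key=lambda a: a[1])
    direction = "left-to-right" if first[0] == "A" else "right-to-left"
    dt = last[1] - first[1]
    speed = SENSOR_SPACING_M / dt if dt > 0 else float("inf")
    return direction, speed

print(analyse([("A", 0.00), ("B", 0.42)]))  # ('left-to-right', ~1.2 m/s)
```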
{"title":"Enhancing the spatial resolution of presence detection in a PIR based wireless surveillance network","authors":"P. Zappi, Elisabetta Farella, L. Benini","doi":"10.1109/AVSS.2007.4425326","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425326","url":null,"abstract":"Pyroelectric sensors are low-cost, low-power small components commonly used only to trigger alarm in presence of humans or moving objects. However, the use of an array of pyroelectric sensors can lead to extraction of more features such as direction of movements, speed, number of people and other characteristics. In this work a low-cost pyroelectric infrared sensor based wireless network is set up to be used for tracking people motion. A novel technique is proposed to distinguish the direction of movement and the number of people passing. The approach has low computational requirements, therefore it is well-suited to limited-resources devices such as wireless nodes. Tests performed gave promising results.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127262268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking by using dynamic shape model learning in the presence of occlusion
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425315
M. Asadi, A. Dore, A. Beoldo, C. Regazzoni
The paper presents a new corner-model-based learning method able to track non-rigid objects in the presence of occlusion. A voting mechanism, followed by a probability density analysis of the voting space histogram, is used to estimate the new position of the target, and the model is updated at every frame. A problem arises during occlusion events, when the occluder's corners corrupt the model and the tracker may start following the occluder. The key to the method's success is automatically classifying corners into two classes, good and malicious. Good corners are used to update the model conservatively, discarding the corners that vote for the highly voted wrong positions caused by the occluder. This allows continuous model learning during occlusion. Experimental results show successful tracking along with a more precise estimation of shape and motion during occlusion.
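The voting-and-filtering idea can be sketched as follows: each model corner votes for a target position via its stored offset, the main peak of the vote histogram gives the new position, and corners whose votes land away from that peak (e.g. on the occluder) are flagged as malicious and excluded from the model update. Data, bin sizes, and thresholds are illustrative assumptions.

```python
# Vote for the target position and split corners into good / malicious.
import numpy as np

rng = np.random.default_rng(5)
true_pos = np.array([50.0, 60.0])
offsets = rng.normal(scale=1.0, size=(30, 2))            # model corner offsets
corners = true_pos + offsets + rng.normal(scale=0.5, size=(30, 2))
corners[20:] += np.array([25.0, 0.0])                    # occluder corners

votes = corners - offsets                                # position hypotheses
hist, xe, ye = np.histogram2d(votes[:, 0], votes[:, 1],
                              bins=20, range=[[30, 90], [40, 80]])
peak = np.unravel_index(np.argmax(hist), hist.shape)
est = np.array([xe[peak[0]] + 1.5, ye[peak[1]] + 1.0])   # bin centres

# Corners voting far from the main peak are flagged malicious.
good = np.linalg.norm(votes - est, axis=1) < 5.0
print(f"estimated position {est}, {good.sum()} good / {len(good)} corners")
```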
{"title":"Tracking by using dynamic shape model learning in the presence of occlusion","authors":"M. Asadi, A. Dore, A. Beoldo, C. Regazzoni","doi":"10.1109/AVSS.2007.4425315","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425315","url":null,"abstract":"The paper presents a new corner-model based learning method able to track non-rigid objects in the presence of occlusion. A voting mechanism followed by a probability density analysis of the voting space histogram is used to estimate new position of the target. The model is updated at any frame. The problem rises in the occlusion events where the occluder corners affect the model and the tracker may follow the occluder. The key point of the method toward success is automatically deciding on the corners to classify them into two classes, good and malicious corners. Good corners are used to update the model in a conservative way removing the corners that are voting to the highly voted wrong positions due to the occluder. This leads to a continuous model learning during occlusion. Experimental results show a successful tracking along with a more precise estimation of shape and motion during occlusion","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126331499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}