Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425315
M. Asadi, A. Dore, A. Beoldo, C. Regazzoni
The paper presents a new corner-model-based learning method able to track non-rigid objects in the presence of occlusion. A voting mechanism followed by a probability density analysis of the voting space histogram is used to estimate the new position of the target. The model is updated at every frame. The problem arises during occlusion events, where the occluder's corners affect the model and the tracker may start following the occluder. The key to the method's success is automatically classifying the corners into two classes, good and malicious. Good corners are used to update the model in a conservative way, discarding the corners that vote for the highly voted wrong positions caused by the occluder. This allows continuous model learning during occlusion. Experimental results show successful tracking along with a more precise estimation of shape and motion during occlusion.
Title: Tracking by using dynamic shape model learning in the presence of occlusion
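The voting step the abstract describes can be pictured as Hough-style accumulation: each model corner stores an offset to the object centre, and each corner matched in the new frame votes for a candidate centre; the histogram peak gives the estimated position. A minimal sketch (function names, bin size, and grid are illustrative, not the paper's implementation):

```python
import numpy as np

def vote_for_center(corners, offsets, grid_shape, bin_size=4):
    """Each matched corner casts a vote for a candidate object centre:
    its position plus the centre offset stored in the corner model.
    Returns the quantised voting histogram and its highest-voted bin."""
    votes = ((corners + offsets) // bin_size).astype(int)
    hist = np.zeros(grid_shape, dtype=int)
    for x, y in votes:
        if 0 <= x < grid_shape[0] and 0 <= y < grid_shape[1]:
            hist[x, y] += 1
    peak = np.unravel_index(np.argmax(hist), grid_shape)
    return hist, peak
```

Corners whose votes land on a strong secondary peak caused by the occluder would be the "malicious" ones excluded from the conservative model update.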
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425360
L. Snidaro, Massimo Belluz, G. Foresti
In this paper, we investigate the problem of representing and maintaining rule knowledge for a video surveillance application. We focus on the representation of complex events, which cannot be straightforwardly expressed by canonical means. In particular, we highlight the ongoing efforts towards a unifying framework for computable rule and taxonomical knowledge representation.
Title: Representing and recognizing complex events in surveillance applications
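The abstract does not give the authors' rule syntax, but the idea of computable rules for complex events can be sketched with composable predicates: atomic conditions on single events combined into temporal patterns. Everything below (names, the "abandoned bag" rule, the timestamp key) is a hypothetical toy, not the paper's formalism:

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]
Rule = Callable[[Event], bool]

def has(attr: str, value: Any) -> Rule:
    """Atomic predicate on one event attribute."""
    return lambda e: e.get(attr) == value

def both(*rules: Rule) -> Rule:
    """Conjunction of predicates over a single event."""
    return lambda e: all(r(e) for r in rules)

def followed_by(first: Rule, second: Rule, max_gap_s: float):
    """Complex event: `second` occurs after `first` within `max_gap_s`
    seconds. Events are dicts carrying a timestamp under key "t"."""
    def check(e1: Event, e2: Event) -> bool:
        return first(e1) and second(e2) and 0.0 <= e2["t"] - e1["t"] <= max_gap_s
    return check

# toy "abandoned bag" rule: a bag is dropped, then its owner exits
drop_bag = both(has("action", "drop"), has("object", "bag"))
abandoned = followed_by(drop_bag, has("action", "exit"), max_gap_s=30.0)
```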
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425326
P. Zappi, Elisabetta Farella, L. Benini
Pyroelectric sensors are low-cost, low-power, small components commonly used only to trigger an alarm in the presence of humans or moving objects. However, an array of pyroelectric sensors can be used to extract further features, such as direction of movement, speed, and number of people. In this work, a low-cost wireless network based on pyroelectric infrared (PIR) sensors is set up to track people's motion. A novel technique is proposed to distinguish the direction of movement and the number of people passing. The approach has low computational requirements and is therefore well suited to resource-limited devices such as wireless nodes. The tests performed gave promising results.
Title: Enhancing the spatial resolution of presence detection in a PIR based wireless surveillance network
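The core trick behind deriving direction and speed from a sensor array is the timing between triggers of spatially separated PIR elements. A minimal sketch for two sensors (the paper's actual detection logic is more elaborate; this only illustrates the principle):

```python
def direction_and_speed(t_a: float, t_b: float, spacing_m: float):
    """Infer passage direction and speed from the trigger times of two
    PIR sensors mounted `spacing_m` metres apart along a walkway.
    Returns (+1, speed) for an A-to-B passage, (-1, speed) for B-to-A."""
    dt = t_b - t_a
    if dt == 0:
        raise ValueError("simultaneous triggers: direction undecidable")
    return (1 if dt > 0 else -1), spacing_m / abs(dt)
```

With more than two sensors, pairwise timings of this kind also let the nodes estimate how many people passed, as the abstract indicates.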
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425319
J. T. Lee, M. Ryoo, Matthew Riley, J. Aggarwal
With decreasing costs of high-quality surveillance systems, human activity detection and tracking has become increasingly practical. Accordingly, automated systems have been designed for numerous detection tasks, but the task of detecting illegally parked vehicles has been left largely to the human operators of surveillance systems. We propose a methodology for detecting this event in real time by applying a novel image projection that reduces the dimensionality of the image data and thus the computational complexity of the segmentation and tracking processes. After event detection, we invert the transformation to recover the original appearance of the vehicle and to allow for further processing that may require the two-dimensional data. The proposed algorithm successfully recognizes illegally parked vehicles in real time on the i-LIDS bag and vehicle detection challenge datasets.
Title: Real-time detection of illegally parked vehicles using 1-D transformation
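The dimensionality-reducing projection can be pictured as collapsing a lane region into a 1-D intensity profile, on which segmentation becomes a cheap 1-D comparison against a background profile. This sketch assumes a rectified grayscale lane and made-up thresholds; it is an illustration of the idea, not the paper's transformation:

```python
import numpy as np

def lane_profile(frame, lane_mask):
    """Collapse a (rectified) lane region of a grayscale frame into a
    1-D intensity profile by averaging across the lane width."""
    masked = np.where(lane_mask, frame.astype(float), np.nan)
    return np.nanmean(masked, axis=0)

def static_foreground_segments(profile, background, thresh=20.0, min_len=5):
    """Runs along the lane where the profile deviates from the 1-D
    background model -- candidate stationary (possibly parked) objects."""
    active = np.abs(profile - background) > thresh
    segments, start = [], None
    for i, on in enumerate(list(active) + [False]):
        if on and start is None:
            start = i
        elif not on and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments
```

Segments that persist over many frames would then be flagged and mapped back to 2-D for appearance recovery.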
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425362
Caifeng Shan, S. Gong, P. McOwan
Computer-vision-based gender classification is an important component of visual surveillance systems. In this paper, we investigate gender classification from human gaits in image sequences, a relatively understudied problem. Moreover, we propose to fuse gait and face for improved gender discrimination. We exploit Canonical Correlation Analysis (CCA), a powerful tool well suited for relating two sets of measurements, to fuse the two modalities at the feature level. Experiments demonstrate that our multimodal gender recognition system achieves a superior recognition performance of 97.2% on large datasets.
Title: Learning gender from human gaits and faces
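Feature-level CCA fusion finds, for each modality, directions whose projections are maximally correlated, then concatenates the projections. A plain-NumPy sketch of this standard construction (whiten each covariance, SVD of the whitened cross-covariance); regularisation and dimensions are illustrative, not the paper's settings:

```python
import numpy as np

def cca_fuse(X, Y, k):
    """Project gait features X and face features Y (one sample per row)
    onto their top-k canonically correlated directions and concatenate
    the projections into a single fused feature vector per sample."""
    n = len(X)
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    eps = 1e-6                                  # small ridge for stability
    Cxx = Xc.T @ Xc / n + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # SVD of the whitened cross-covariance yields the canonical pairs
    M = np.linalg.solve(Lx, Cxy)                # Lx^-1 Cxy
    M = np.linalg.solve(Ly, M.T).T              # Lx^-1 Cxy Ly^-T
    U, _, Vt = np.linalg.svd(M)
    A = np.linalg.solve(Lx.T, U[:, :k])         # canonical directions for X
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])      # canonical directions for Y
    return np.hstack([Xc @ A, Yc @ B])
```

The fused vectors would then feed whatever gender classifier is used downstream.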
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425333
Ning Jin, F. Mokhtarian
We propose an image-based shape model for view-invariant human motion recognition. An image-based visual hull explicitly represents the 3D shape of an object and is computed from a set of silhouettes. We then use the set of silhouettes to implicitly represent the visual hull. Because a silhouette is the 2D projection of a 3D object with respect to a certain camera, and is therefore sensitive to the point of view, our multi-silhouette representation of the visual hull requires correspondence between views. To guarantee this correspondence, we define a canonical multi-camera system and a canonical human body orientation in motion. We then "normalize" all the constructed visual hulls into the canonical multi-camera system, align them to the canonical orientation, and finally render them; the rendered views thereby satisfy the correspondence requirement. In our visual hull representation, each silhouette is represented by a fixed number of points sampled on its closed contour; the 3D shape information is thus implicitly encoded in the concatenation of multiple 2D contours. Each motion class is then learned by a Hidden Markov Model (HMM) with mixture-of-Gaussians outputs.
Experiments using our algorithm on several datasets give encouraging results.
Title: Image-based shape model for view-invariant human motion recognition
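The fixed-length silhouette descriptor relies on resampling each closed contour to the same number of points; the per-view descriptors are then concatenated. A sketch of arc-length resampling (a common way to do this; the paper's exact sampling scheme is not given here):

```python
import numpy as np

def resample_contour(points, n_samples):
    """Resample a closed 2-D contour to `n_samples` points spaced evenly
    by arc length, so every silhouette yields a fixed-length descriptor."""
    pts = np.asarray(points, float)
    closed = np.vstack([pts, pts[:1]])                  # close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n_samples, endpoint=False)
    out = np.empty((n_samples, 2))
    for i, t in enumerate(targets):
        j = np.searchsorted(cum, t, side="right") - 1
        frac = (t - cum[j]) / seg[j] if seg[j] else 0.0
        out[i] = closed[j] + frac * (closed[j + 1] - closed[j])
    return out
```

Concatenating the flattened resampled contours from all canonical views gives the observation vector fed to the per-class HMM.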
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425302
C. Piciarelli, G. Foresti
One of the most promising approaches to event analysis in video sequences is based on the automatic modelling of common patterns of activity for the later detection of anomalous events. This approach is especially useful in applications that do not necessarily require the exact identification of the events, but only the detection of anomalies to be reported to a human operator (e.g. video surveillance or traffic monitoring applications). In this paper we propose a trajectory analysis method based on Support Vector Machines; the SVM model is trained on a given set of trajectories and can subsequently detect trajectories that differ substantially from the training ones. Particular emphasis is placed on a novel method for estimating the parameter ν, since it heavily influences the performance of the system but cannot easily be estimated a priori. Experimental results are given on both synthetic and real-world data.
Title: Anomalous trajectory detection using support vector machines
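In a one-class SVM, ν upper-bounds the fraction of training samples treated as outliers, which is why it is hard to fix a priori when the normal/anomalous mix is unknown. A toy quantile-based stand-in (not the paper's estimation method) that mimics this role of ν on precomputed anomaly scores:

```python
import numpy as np

def nu_threshold(train_scores, nu):
    """Choose a decision threshold so that at most a fraction `nu` of
    the training trajectories fall outside it -- mirroring the role of
    nu in a one-class SVM, where it bounds the training outlier rate.
    `train_scores` are anomaly scores (higher = more unusual)."""
    return np.quantile(train_scores, 1.0 - nu)
```

Trajectories scoring above the threshold would be the ones reported to the operator.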
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425337
Juan Carlos San Miguel, J. Sanchez
This paper presents the results of analyzing the effect of different motion segmentation techniques in a system that transmits the information captured by a static surveillance camera in an adaptive way, based on the on-line generation of descriptions at different levels of detail. The video sequences are analyzed to detect the regions of activity (motion analysis) and to differentiate them from the background, and the corresponding descriptions (mainly MPEG-7 moving regions) are generated together with the textures of the moving regions and the associated background image. Depending on the available bandwidth, different levels of transmission are specified, ranging from sending just the generated descriptions to a transmission with all the associated images corresponding to the moving objects and the background. We study the effect of three motion segmentation algorithms on several aspects, such as segmentation accuracy, size of the generated descriptions, computational efficiency, and reconstructed data quality.
Title: On the effect of motion segmentation techniques in description based adaptive video transmission
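The bandwidth-dependent level selection can be sketched as picking the richest level whose estimated cost fits the channel. The level names and per-level kbps costs below are hypothetical placeholders, not figures from the paper:

```python
def choose_level(bandwidth_kbps, levels):
    """Return the richest transmission level whose estimated cost fits
    the available bandwidth, or None if even the cheapest does not.
    `levels` is ordered from cheapest to richest."""
    chosen = None
    for name, cost_kbps in levels:
        if cost_kbps <= bandwidth_kbps:
            chosen = name
    return chosen

LEVELS = [
    ("descriptions_only", 32),                  # MPEG-7 moving-region descriptions
    ("descriptions_and_object_textures", 256),  # plus moving-object textures
    ("descriptions_objects_and_background", 1024),
]
```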
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425310
W. Zajdel, J. D. Krijnders, T. Andringa, D. Gavrila
This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is its exploitation of the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate scene descriptors such as "scream", "passing train" or "articulation energy". At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene. Our prototype system is validated on a set of scenarios performed by professional actors at an actual train station to ensure a realistic audio and video noise setting.
Title: CASSANDRA: audio-video sensor fusion for aggression detection
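The fusion idea can be reduced to a single filtering step: given a belief over scene states, the audio and video descriptors (assumed conditionally independent given the state) each contribute a likelihood, and the belief is updated Bayes-style. This two-state toy is a much-reduced stand-in for the paper's Dynamic Bayesian Network; all numbers are invented:

```python
import numpy as np

def fuse_step(belief, audio_like, video_like, trans):
    """One filtering step of a two-state (calm, aggressive) model.
    `belief` is the prior over states, `trans` the state-transition
    matrix, and the two likelihood vectors multiply because audio and
    video are assumed conditionally independent given the state."""
    predicted = trans.T @ belief                 # state transition
    joint = predicted * audio_like * video_like  # evidence update
    return joint / joint.sum()                   # renormalise

trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
belief = np.array([0.99, 0.01])                  # start almost surely calm
# a scream heard while the video shows agitated motion
belief = fuse_step(belief, np.array([0.2, 0.9]), np.array([0.3, 0.8]), trans)
```

Either modality alone would move the belief only mildly; together the two likelihoods push the aggression probability up sharply, which is the point of the audio-video fusion.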
Pub Date: 2007-09-05; DOI: 10.1109/AVSS.2007.4425295
A. Klausner, A. Tengg, C. Leistner, Stefan Erb, B. Rinner
In this article we present I-SENSE, our software framework for embedded online data fusion. We discuss the fusion model and the decision-modeling approach using support vector machines. Owing to the system's complexity and the generic approach, a data-oriented model is introduced. The main focus of the article is our techniques for extracting features from acoustic and visual data. Experimental results from our "traffic surveillance" case study demonstrate the feasibility of our multi-level data fusion approach.
Title: An audio-visual sensor fusion approach for feature based vehicle identification
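The feature-level step ahead of the SVM classifier can be sketched generically: normalize each modality's features separately so neither dominates by scale, then concatenate per-sample vectors. The I-SENSE feature set is not specified here, so this is an assumed, generic fusion step:

```python
import numpy as np

def fuse_features(audio_feats, video_feats):
    """Feature-level fusion: z-score each modality separately so neither
    dominates by sheer scale, then concatenate into one vector per
    sample. The fused vectors would feed the SVM decision model."""
    def zscore(F):
        F = np.asarray(F, float)
        return (F - F.mean(0)) / (F.std(0) + 1e-9)  # guard zero variance
    return np.hstack([zscore(audio_feats), zscore(video_feats)])
```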