People tracking across two distant self-calibrated cameras
R. Pflugfelder, H. Bischof
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425343
People tracking is of fundamental importance in multi-camera surveillance systems. In recent years, many approaches to multi-camera tracking have been discussed. Most methods use various image features, the geometric relation between the cameras, or both as cues. Knowing the geometry is particularly desirable for distant cameras, because geometry is not affected by, for example, drastic changes in object appearance or scene illumination. However, determining the camera geometry is cumbersome. This paper addresses the problem with two contributions. First, an approach is presented that calibrates two distant cameras automatically. We continue previous work and focus especially on the calibration of the extrinsic parameters, using point correspondences acquired by detecting points on top of people's heads. Second, qualitative experimental results on the PETS 2006 benchmark data show that the self-calibration is accurate enough for purely geometric tracking of people across distant cameras, where reliable features for matching are hardly available.
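A minimal sketch of the extrinsic-calibration step under stated assumptions: intrinsics K1, K2 are known, and head-top point correspondences have already been collected by a detector over time. The function name and array shapes are illustrative, not from the paper; note the recovered translation is only up to scale, which in practice must be fixed externally (e.g. by an assumed head height above the ground plane).

```python
import numpy as np
import cv2

def relative_pose_from_heads(pts1, pts2, K1, K2):
    """Recover rotation R and unit-scale translation t between two
    cameras from Nx2 arrays of corresponding head-top image points."""
    # Normalize points with each camera's intrinsics so a single
    # essential matrix relates the two views.
    p1 = cv2.undistortPoints(pts1.reshape(-1, 1, 2).astype(np.float64), K1, None)
    p2 = cv2.undistortPoints(pts2.reshape(-1, 1, 2).astype(np.float64), K2, None)
    # RANSAC tolerates mis-detected heads; threshold is in normalized units.
    E, inliers = cv2.findEssentialMat(p1, p2, np.eye(3),
                                      method=cv2.RANSAC, prob=0.999,
                                      threshold=1e-3)
    # recoverPose resolves the four-fold (R, t) ambiguity via cheirality.
    _, R, t, _ = cv2.recoverPose(E, p1, p2, np.eye(3), mask=inliers)
    return R, t, inliers
```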
{"title":"People tracking across two distant self-calibrated cameras","authors":"R. Pflugfelder, H. Bischof","doi":"10.1109/AVSS.2007.4425343","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425343","url":null,"abstract":"People tracking is of fundamental importance in multi-camera surveillance systems. In recent years, many approaches for multi-camera tracking have been discussed. Most methods use either various image features or the geometric relation between the cameras or both as a cue. It is a desire to know the geometry for distant cameras, because geometry is not influenced by, for example, drastic changes in object appearance or in scene illumination. However, the determination of the camera geometry is cumbersome. The paper tries to solve this problem and contributes in two different ways. On the one hand, an approach is presented that calibrates two distant cameras automatically. We continue previous work and focus especially on the calibration of the extrinsic parameters. Point correspondences are used for this task which are acquired by detecting points on top of people's heads. On the other hand, qualitative experimental results with the PETS 2006 benchmark data show that the self-calibration is accurate enough for a solely geometric tracking of people across distant cameras. Reliable features for a matching are hardly available in such cases.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124125298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time detection of illegally parked vehicles using 1-D transformation
J. T. Lee, M. Ryoo, Matthew Riley, J. Aggarwal
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425319
With decreasing costs of high-quality surveillance systems, human activity detection and tracking has become increasingly practical. Accordingly, automated systems have been designed for numerous detection tasks, but the task of detecting illegally parked vehicles has been left largely to the human operators of surveillance systems. We propose a methodology for detecting this event in real time by applying a novel image projection that reduces the dimensionality of the image data and thus the computational complexity of the segmentation and tracking processes. After event detection, we invert the transformation to recover the original appearance of the vehicle and to allow further processing that may require the two-dimensional data. The proposed algorithm successfully recognizes illegally parked vehicles in real time on the i-LIDS bag and vehicle detection challenge datasets.
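To make the projection idea concrete, here is a hedged sketch (not the authors' exact transform): a rectified image strip covering the no-parking zone is collapsed to a 1-D intensity profile, and background subtraction then runs on vectors instead of full frames. All names, rates and thresholds are illustrative.

```python
import numpy as np

def project_strip(gray_strip):
    """Collapse an HxW grayscale strip to a length-W 1-D profile."""
    return gray_strip.mean(axis=0)

class Profile1DBackground:
    """Running-average background model maintained in 1-D."""
    def __init__(self, width, alpha=0.01, thresh=18.0):
        self.bg = np.zeros(width)   # 1-D background estimate
        self.alpha = alpha          # adaptation rate
        self.thresh = thresh        # foreground threshold

    def update(self, profile):
        fg = np.abs(profile - self.bg) > self.thresh  # 1-D foreground mask
        self.bg = (1 - self.alpha) * self.bg + self.alpha * profile
        return fg
```

Foreground runs that stay stationary longer than the legal limit would flag a parked vehicle; the corresponding strip columns can then be mapped back to the 2-D image to recover the vehicle's appearance, in the spirit of the inversion step above.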
{"title":"Real-time detection of illegally parked vehicles using 1-D transformation","authors":"J. T. Lee, M. Ryoo, Matthew Riley, J. Aggarwal","doi":"10.1109/AVSS.2007.4425319","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425319","url":null,"abstract":"With decreasing costs of high quality surveillance systems, human activity detection and tracking has become increasingly practical. Accordingly, automated systems have been designed for numerous detection tasks, but the task of detecting illegally parked vehicles has been left largely to the human operators of surveillance systems. We propose a methodology for detecting this event in realtime by applying a novel image projection that reduces the dimensionality of the image data and thus reduces the computational complexity of the segmentation and tracking processes. After event detection, we invert the transformation to recover the original appearance of the vehicle and to allow for further processing that may require the two dimensional data. The proposed algorithm is able to successfully recognize illegally parked vehicles in real-time in the i-LIDS bag and vehicle detection challenge datasets.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125906210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting hidden objects: Security imaging using millimetre-waves and terahertz
M. Kemp
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425277
There has been intense interest in the use of millimetre-wave and terahertz technology for the detection of concealed weapons, explosives and other threats. Radiation at these frequencies is safe, penetrates barriers and has wavelengths short enough to allow discrimination between objects. In addition, many solids, including explosives, have characteristic spectroscopic signatures at terahertz wavelengths which can be used to identify them. This paper reviews the progress made in recent years and identifies the achievements, challenges and prospects for these technologies in checkpoint people screening, stand-off detection of improvised explosive devices (IEDs) and suicide bombers, as well as more specialized screening tasks.
{"title":"Detecting hidden objects: Security imaging using millimetre-waves and terahertz","authors":"M. Kemp","doi":"10.1109/AVSS.2007.4425277","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425277","url":null,"abstract":"There has been intense interest in the use of millimetre wave and terahertz technology for the detection of concealed weapons, explosives and other threats. Radiation at these frequencies is safe, penetrates barriers and has short enough wavelengths to allow discrimination between objects. In addition, many solids including explosives have characteristic spectroscopic signatures at terahertz wavelengths which can be used to identify them. This paper reviews the progress which has been made in recent years and identifies the achievements, challenges and prospects for these technologies in checkpoint people screening, stand off detection of improvised explosive devices (lEDs) and suicide bombers as well as more specialized screening tasks.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130086897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning gender from human gaits and faces
Caifeng Shan, S. Gong, P. McOwan
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425362
Computer-vision-based gender classification is an important component in visual surveillance systems. In this paper, we investigate gender classification from human gaits in image sequences, a relatively understudied problem. Moreover, we propose to fuse gait and face for improved gender discrimination. We exploit Canonical Correlation Analysis (CCA), a powerful tool well suited to relating two sets of measurements, to fuse the two modalities at the feature level. Experiments demonstrate that our multimodal gender recognition system achieves a recognition accuracy of 97.2% on large datasets.
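A hedged sketch of feature-level fusion with CCA, assuming gait features X and face features Y have already been extracted for the same subjects; variable names, dimensions and the downstream classifier are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

def fuse_and_classify(X_gait, Y_face, labels, n_components=20):
    """X_gait: (n, d_gait), Y_face: (n, d_face), labels: (n,).
    n_components must not exceed min(d_gait, d_face, n)."""
    cca = CCA(n_components=n_components)
    # Project both modalities onto maximally correlated directions.
    X_c, Y_c = cca.fit_transform(X_gait, Y_face)
    fused = np.hstack([X_c, Y_c])        # feature-level fusion
    clf = SVC(kernel="linear").fit(fused, labels)
    return cca, clf
```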
{"title":"Learning gender from human gaits and faces","authors":"Caifeng Shan, S. Gong, P. McOwan","doi":"10.1109/AVSS.2007.4425362","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425362","url":null,"abstract":"Computer vision based gender classification is an important component in visual surveillance systems. In this paper, we investigate gender classification from human gaits in image sequences, a relatively understudied problem. Moreover, we propose to fuse gait and face for improved gender discrimination. We exploit Canonical Correlation Analysis (CCA), a powerful tool that is well suited for relating two sets of measurements, to fuse the two modalities at the feature level. Experiments demonstrate that our multimodal gender recognition system achieves the superior recognition performance of 97.2% in large datasets.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"75 2-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133055342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision based anti-collision system for rail track maintenance vehicles
F. Maire
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425305
Maintenance trains travel in convoy. In Australia, only the first train of the convoy pays attention to the track signals (the other convoy vehicles simply follow the preceding vehicle). Because of human error, collisions can happen between the maintenance vehicles. Although an anti-collision system based on a laser distance meter is already in operation, the existing system has a limited range due to the curvature of the tracks. In this paper, we introduce a vision-based anti-collision system. The proposed system induces a 3D model of the track as a piecewise quadratic function (with continuity constraints on the function and its derivative). The geometric constraints of the rail tracks allow the creation of a completely self-calibrating system. Although road lane-marking detection algorithms perform well most of the time for rail detection, the metallic surface of a rail does not always behave like a road lane marking. We therefore had to develop new techniques to address the specific problems of rail reflectance.
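A compact sketch of the piecewise-quadratic track model: a least-squares quadratic spline (degree k=2) is C1-continuous at its knots, matching the stated continuity constraints on the function and its derivative. The rail-point extraction feeding (x, y) is assumed to exist upstream; the segment count is an illustrative parameter.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def fit_track_centerline(x, y, n_segments=4):
    """Fit y = f(x) as a piecewise quadratic with C1 continuity.
    x, y: 1-D arrays of detected rail points, x strictly increasing."""
    # Interior knots split the domain into n_segments quadratic pieces;
    # a degree-2 spline is automatically continuous in value and slope.
    knots = np.linspace(x[0], x[-1], n_segments + 1)[1:-1]
    return LSQUnivariateSpline(x, y, knots, k=2)
```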
{"title":"Vision based anti-collision system for rail track maintenance vehicles","authors":"F. Maire","doi":"10.1109/AVSS.2007.4425305","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425305","url":null,"abstract":"Maintenance trains travel in convoy. In Australia, only the first train of the convoy pays attention to the track signalization (the other convoy vehicles simply follow the preceding vehicle). Because of human errors, collisions can happen between the maintenance vehicles. Although an anti-collision system based on a laser distance meter is already in operation, the existing system has a limited range due to the curvature of the tracks. In this paper, we introduce an anti-collision system based on vision. The proposed system induces a 3D model of the track as a piecewise quadratic function (with continuity constraints on the function and its derivative). The geometric constraints of the rail tracks allow the creation of a completely self-calibrating system. Although road lane marking detection algorithms perform well most of the time for rail detection, the metallic surface of a rail does not always behave like a road lane marking. Therefore we had to develop new techniques to address the specific problems of the reflectance of rails.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128838180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anomalous trajectory detection using support vector machines
C. Piciarelli, G. Foresti
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425302
One of the most promising approaches to event analysis in video sequences is based on automatically modelling common patterns of activity for later detection of anomalous events. This approach is especially useful in applications that do not require exact identification of the events but only the detection of anomalies to be reported to a human operator (e.g. video surveillance or traffic monitoring). In this paper we propose a trajectory analysis method based on Support Vector Machines: the SVM model is trained on a given set of trajectories and can subsequently detect trajectories that differ substantially from the training ones. Particular emphasis is placed on a novel method for estimating the parameter ν, since it heavily influences the performance of the system but cannot easily be estimated a priori. Experimental results are given on both synthetic and real-world data.
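A minimal sketch of this kind of anomaly detector, assuming trajectories have been resampled to fixed-length feature vectors. The paper's specific ν-estimation method is not reproduced here; ν is simply exposed as the knob the abstract discusses (in the standard one-class formulation it upper-bounds the fraction of training points treated as outliers).

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_trajectory_model(trajs, nu=0.05, gamma="scale"):
    """trajs: (n_trajectories, n_features) array of normal examples."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    return model.fit(trajs)

def is_anomalous(model, traj):
    # OneClassSVM labels inliers +1 and outliers -1.
    return model.predict(traj.reshape(1, -1))[0] == -1
```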
{"title":"Anomalous trajectory detection using support vector machines","authors":"C. Piciarelli, G. Foresti","doi":"10.1109/AVSS.2007.4425302","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425302","url":null,"abstract":"One of the most promising approaches to event analysis in video sequences is based on the automatic modelling of common patterns of activity for later detection of anomalous events. This approach is especially useful in those applications that do not necessarily require the exact identification of the events, but need only the detection of anomalies that should be reported to a human operator (e.g. video surveillance or traffic monitoring applications). In this paper we propose a trajectory analysis method based on Support Vector Machines; the SVM model is trained on a given set of trajectories and can subsequently detect trajectories substantially differing from the training ones. Particular emphasis is placed on a novel method for estimating the parameter v, since it heavily influences the performances of the system but cannot be easily estimated a-priori. Experimental results are given both on synthetic and real-world data.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"21 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133826502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image-based shape model for view-invariant human motion recognition
Ning Jin, F. Mokhtarian
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425333
We propose an image-based shape model for view-invariant human motion recognition. An image-based visual hull explicitly represents the 3D shape of an object and is computed from a set of silhouettes; we then use that set of silhouettes to represent the visual hull implicitly. Because a silhouette is the 2D projection of a 3D object with respect to a certain camera, and is therefore sensitive to the point of view, our multi-silhouette representation of the visual hull requires correspondence between views. To guarantee this correspondence, we define a canonical multi-camera system and a canonical human body orientation in motions. We then "normalize" all the constructed visual hulls into the canonical multi-camera system, align them to follow the canonical orientation, and finally render them; the rendered views thereby satisfy the correspondence requirement. In our representation, each silhouette is a fixed number of sampled points on its closed contour, so the 3D shape information is implicitly encoded in the concatenation of multiple 2D contours. Each motion class is then learned by a Hidden Markov Model (HMM) with mixture-of-Gaussians outputs. Experiments with our algorithm on several data sets give encouraging results.
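A hedged sketch of the classification stage, assuming each motion sample is a sequence of concatenated multi-view contour vectors of shape (T, D). One Gaussian-mixture-output HMM is trained per class and a new sequence is assigned to the class with the highest likelihood; hmmlearn's GMMHMM is used here as a stand-in for the paper's HMMs, and the state/mixture counts are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_class_models(sequences_by_class, n_states=5, n_mix=3):
    """sequences_by_class: dict mapping class label -> list of (T, D) arrays."""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)               # stack all frames of all sequences
        lengths = [len(s) for s in seqs]  # per-sequence frame counts
        m = GMMHMM(n_components=n_states, n_mix=n_mix, covariance_type="diag")
        models[label] = m.fit(X, lengths)
    return models

def classify(models, seq):
    # Maximum-likelihood decision over the per-class HMMs.
    return max(models, key=lambda label: models[label].score(seq))
```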
{"title":"Image-based shape model for view-invariant human motion recognition","authors":"Ning Jin, F. Mokhtarian","doi":"10.1109/AVSS.2007.4425333","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425333","url":null,"abstract":"We propose an image-based shape model for view-invariant human motion recognition. Image-based visual hull explicitly represents the 3D shape of an object, which is computed from a set of silhouettes. We then use the set of silhouettes to implicitly represent the visual hull. Due to the fact that a silhouette is the 2D projection of an object in the 3D world with respect to a certain camera, which is sensitive to the point of view, our multi-silhouette representation for the visual hull entails the correspondence between views. To guarantee the correspondence, we define a canonical multi-camera system and a canonical human body orientation in motions. We then \"normalize\" all the constructed visual hulls into the canonical multi-camera system, align them to follow the canonical orientation, and finally render them. The rendered views thereby satisfy the requirement of the correspondence. In our visual hull's representation, each silhouette is represented as a fixed number of sampled points on its closed contour, therefore, the 3D shape information is implicitly encoded into the concatenation of multiple 2D contours. Each motion class is then learned by a Hidden Markov Model (HMM) with mixture of Gaussians outputs. Experiments using our algorithm over some data sets give encouraging results.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133541418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the effect of motion segmentation techniques in description based adaptive video transmission
Juan Carlos San Miguel, J. Sanchez
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425337
This paper presents the results of analysing the effect of different motion segmentation techniques in a system that transmits the information captured by a static surveillance camera adaptively, based on the on-line generation of descriptions at different levels of detail. The video sequences are analysed to detect the regions of activity (motion analysis) and to differentiate them from the background, and the corresponding descriptions (mainly MPEG-7 moving regions) are generated together with the textures of the moving regions and the associated background image. Depending on the available bandwidth, different levels of transmission are specified, ranging from sending only the generated descriptions to transmitting all the associated images of the moving objects and background. We study the effect of three motion segmentation algorithms on several aspects: segmentation accuracy, size of the generated descriptions, computational efficiency and reconstructed data quality.
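An illustrative sketch of the bandwidth-adaptive level selection described above; the thresholds and level names are hypothetical, not taken from the paper, which only states that the levels range from descriptions-only up to full image transmission.

```python
def select_transmission_level(bandwidth_kbps):
    """Pick a transmission level for the current available bandwidth."""
    if bandwidth_kbps < 64:
        return "descriptions-only"             # MPEG-7 moving-region metadata
    if bandwidth_kbps < 512:
        return "descriptions+object-textures"  # add moving-object patches
    return "full"                              # add the background image too
```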
{"title":"On the effect of motion segmentation techniques in description based adaptive video transmission","authors":"Juan Carlos San Miguel, J. Sanchez","doi":"10.1109/AVSS.2007.4425337","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425337","url":null,"abstract":"This paper presents the results of analysing the effect of different motion segmentation techniques in a system that transmits the information captured by a static surveillance camera in an adaptative way based on the on-line generation of descriptions and their descriptions at different levels of detail. The video sequences are analyzed to detect the regions of activity (motion analysis) and to differentiate them from the background, and the corresponding descriptions (mainly MPEG-7 moving regions) are generated together with the textures of the moving regions and the associated background image. Depending on the available bandwidth, different levels of transmission are specified, ranging from just sending the descriptions generated to a transmission with all the associated images corresponding to the moving objects and background. We study the effect of three motion segmentation algorithms in several aspects such as accurate segmentation, size of the descriptions generated, computational efficiency and reconstructed data quality.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133359583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CASSANDRA: audio-video sensor fusion for aggression detection
W. Zajdel, J. D. Krijnders, T. Andringa, D. Gavrila
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425310
This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is its exploitation of the complementary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate scene descriptors such as "scream", "passing train" or "articulation energy". At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene. Our prototype system is validated on a set of scenarios performed by professional actors at an actual train station, ensuring realistic audio and video noise conditions.
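A deliberately simplified stand-in for the paper's Dynamic Bayesian Network, showing the fusion principle only: a two-state (calm/aggressive) hidden chain is forward-filtered with per-frame likelihoods derived from the audio/video descriptors. All probabilities and state definitions are illustrative.

```python
import numpy as np

# P(state_t | state_{t-1}); rows and columns ordered (calm, aggressive).
TRANS = np.array([[0.95, 0.05],
                  [0.10, 0.90]])

def filter_step(belief, likelihood):
    """One forward-filtering step: predict with TRANS, correct with the
    evidence likelihood P(descriptors | state), then renormalize."""
    predicted = TRANS.T @ belief
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# e.g. a detected "scream" yields evidence favoring the aggressive state:
belief = np.array([0.99, 0.01])
belief = filter_step(belief, likelihood=np.array([0.2, 0.8]))
```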
{"title":"CASSANDRA: audio-video sensor fusion for aggression detection","authors":"W. Zajdel, J. D. Krijnders, T. Andringa, D. Gavrila","doi":"10.1109/AVSS.2007.4425310","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425310","url":null,"abstract":"This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of the complimentary nature of audio and video sensing to disambiguate scene activity in real-life, noisy and dynamic environments. At the lower level, independent analysis of the audio and video streams yields intermediate descriptors of a scene like: \"scream\", \"passing train\" or \"articulation energy\". At the higher level, a Dynamic Bayesian Network is used as a fusion mechanism that produces an aggregate aggression indication for the current scene. Our prototype system is validated on a set of scenarios performed by professional actors at an actual train station to ensure a realistic audio and video noise setting.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133370887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An audio-visual sensor fusion approach for feature based vehicle identification
A. Klausner, A. Tengg, C. Leistner, Stefan Erb, B. Rinner
Pub Date: 2007-09-05 · DOI: 10.1109/AVSS.2007.4425295
In this article we present our software framework for embedded online data fusion, called I-SENSE. We discuss the fusion model and the decision-modeling approach using support vector machines. Owing to the system's complexity and the generic approach, a data-oriented model is introduced. The main focus of the article is on our techniques for extracting features from acoustic and visual data. Experimental results from our "traffic surveillance" case study demonstrate the feasibility of our multi-level data fusion approach.
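A minimal sketch of SVM-based decision modeling over fused features, assuming per-vehicle acoustic and visual feature vectors have already been extracted; the concatenation-plus-SVM scheme shown here is a generic feature-level baseline, not necessarily the paper's exact fusion model, and all names are illustrative.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_vehicle_classifier(acoustic_feats, visual_feats, labels):
    """acoustic_feats: (n, d_a), visual_feats: (n, d_v), labels: (n,)."""
    X = np.hstack([acoustic_feats, visual_feats])  # fused feature vectors
    # Scaling matters for SVMs when modalities have different ranges.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(X, labels)
```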
{"title":"An audio-visual sensor fusion approach for feature based vehicle identification","authors":"A. Klausner, A. Tengg, C. Leistner, Stefan Erb, B. Rinner","doi":"10.1109/AVSS.2007.4425295","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425295","url":null,"abstract":"In this article we present our software framework for embedded online data fusion, called I-SENSE. We discuss the fusion model and the decision modeling approach using support vector machines. Due to the system complexity and the genetic approach a data oriented model is introduced. The main focus of the article is targeted at our techniques for extracting features of acoustic-and visual-data. Experimental results of our \"traffic surveillance\" case study demonstrate the feasibility of our multi-level data fusion approach.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"395 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134543224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}