Combination of self-organization map and kernel mutual subspace method for video surveillance
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425297
Bailing Zhang, Junbum Park, Hanseok Ko
This paper addresses the video surveillance problem of automatically identifying moving vehicles and people from continuous observation of image sequences. With a single far-field surveillance camera, moving objects are first segmented by simple background subtraction. To reduce redundancy and select representative prototypes from the input video streams, a self-organizing feature map (SOM) is applied to both training and testing sequences. The recognition scheme is based on the recently proposed kernel mutual subspace (KMS) model. As an alternative to probability-based models, KMS makes no assumptions about the data sampling process and offers an efficient and robust classifier. Experiments demonstrated highly accurate recognition, showing the model's applicability to real-world surveillance systems.
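To make the subspace-matching step concrete, here is a minimal numpy sketch of the linear mutual subspace similarity between two image sets; the kernel variant would first map the features through a kernel, and the function names, dimensions, and random data below are illustrative, not the authors' implementation. The SOM prototype selection is omitted: any reduced set of per-class frame features can stand in for the columns of the training matrices.

```python
import numpy as np

def subspace_basis(X, n_dims=5):
    """Orthonormal basis of the top principal directions of a set of
    feature vectors X (one column per frame/prototype)."""
    Xc = X - X.mean(axis=1, keepdims=True)      # center the set
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_dims]

def mutual_subspace_similarity(X_a, X_b, n_dims=5):
    """Similarity between two image sets = cosine of the smallest
    principal angle between their PCA subspaces."""
    U1 = subspace_basis(X_a, n_dims)
    U2 = subspace_basis(X_b, n_dims)
    # Singular values of U1^T U2 are the canonical correlations.
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return s.max()                               # in [0, 1]

# Classify a test sequence by the class whose subspace is closest.
rng = np.random.default_rng(0)
train_sets = {"vehicle": rng.normal(size=(64, 30)),
              "person": rng.normal(size=(64, 30))}
test_set = rng.normal(size=(64, 20))
best = max(train_sets,
           key=lambda c: mutual_subspace_similarity(train_sets[c], test_set))
print("predicted class:", best)
```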
{"title":"Combination of self-organization map and kernel mutual subspace method for video surveillance","authors":"Bailing Zhang, Junbum Park, Hanseok Ko","doi":"10.1109/AVSS.2007.4425297","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425297","url":null,"abstract":"This paper addresses the video surveillance issue of automatically identifying moving vehicles and people from continuous observation of image sequences. With a single far-field surveillance camera, moving objects are first segmented by simple background subtraction. To reduce the redundancy and select the representative prototypes from input video streams, the self-organizing feature map (SOM) is applied for both training and testing sequences. The recognition scheme is designed based on the recently proposed kernel mutual subspace (KMS) model. As an alternative to some probability-based models, KMS does not make assumptions about the data sampling processing and offers an efficient and robust classifier. Experiments demonstrated a highly accurate recognition result, showing the model's applicability in real-world surveillance system.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134340106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tracking of two acoustic sources in reverberant environments using a particle swarm optimizer
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425373
F. Antonacci, Davide Riva, A. Sarti, M. Tagliasacchi, S. Tubaro
In this paper we consider the problem of tracking multiple acoustic sources in reverberant environments. The solution that we propose combines two techniques. A blind source separation (BSS) method known as TRINICON [5] is applied to the signals acquired by the microphone arrays. The TRINICON de-mixing filters are used to obtain the time differences of arrival (TDOAs), which are related to the source location through a nonlinear function. A particle filter is then applied to localize the sources. Particles move according to swarm-like dynamics, which significantly reduces the number of particles required compared with a traditional particle filter. We discuss results for the case of two sources and four microphone pairs. In addition, we propose a method, based on detecting source inactivity, that overcomes the ambiguities which intrinsically arise when only two microphone pairs are used. Experimental results demonstrate that the average localization error on a variety of pseudo-random trajectories is around 40 cm when the T60 reverberation time is 0.6 s.
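As a rough illustration of the swarm-like particle dynamics (not TRINICON itself, whose de-mixing filters would supply the measured TDOAs), the sketch below drives PSO-style particles toward the location whose predicted TDOAs best match the measurements. The room geometry, gain constant, and noise scale are assumptions.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)
rng = np.random.default_rng(1)

# Four microphone pairs on the walls of a 5 m x 4 m room (illustrative).
pairs = np.array([[[0.0, 0.0], [0.2, 0.0]],
                  [[5.0, 0.0], [5.0, 0.2]],
                  [[0.0, 4.0], [0.2, 4.0]],
                  [[5.0, 4.0], [5.0, 3.8]]])

def tdoas(src, pairs):
    """Predicted time differences of arrival at each microphone pair."""
    d1 = np.linalg.norm(pairs[:, 0] - src, axis=1)
    d2 = np.linalg.norm(pairs[:, 1] - src, axis=1)
    return (d1 - d2) / C

source = np.array([2.0, 3.0])
measured = tdoas(source, pairs)   # stand-in for TRINICON-derived TDOAs

# Swarm-like dynamics: every particle drifts toward the current best fit,
# which needs far fewer particles than blind resampling.
particles = rng.uniform([0, 0], [5, 4], size=(50, 2))
for _ in range(30):
    pred = np.array([tdoas(p, pairs) for p in particles])
    err = np.linalg.norm(pred - measured, axis=1)
    best = particles[err.argmin()]
    particles += 0.5 * (best - particles) + rng.normal(scale=0.05,
                                                       size=particles.shape)

print("estimated source:", particles.mean(axis=0).round(2))
```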
{"title":"Tracking of two acoustic sources in reverberant environments using a particle swarm optimizer","authors":"F. Antonacci, Davide Riva, A. Sarti, M. Tagliasacchi, S. Tubaro","doi":"10.1109/AVSS.2007.4425373","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425373","url":null,"abstract":"In this paper we consider the problem of tracking multiple acoustic sources in reverberant environments. The solution that we propose is based on the combination of two techniques. A blind source separation (BSS) method known as TRINICON [5] is applied to the signals acquired by the microphone arrays. The TRINICON de-mixing filters are used to obtain the Time Differences of Arrival (TDOAs), which are related to the source location through a nonlinear function. A particle filter is then applied in order to localize the sources. Particles move according to a swarm-like dynamics, which significatively reduces the number of particles involved with respect to traditional particle filter. We discuss results for the case of two sources and four microphone pairs. In addition, we propose a method, based on detecting source inactivity, which overcomes the ambiguities that intrinsically arise when only two microphone pairs are used. Experimental results demonstrate that the average localization error on a variety of pseudo-random trajectories is around 40 cm when the T60 reverberation time is 0.6s.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117173865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What are customers looking at?
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425345
Xiaoming Liu, N. Krahnstoever, Ting Yu, P. Tu
Computer vision approaches for retail applications can provide value far beyond the common domain of loss prevention. Gaining insight into the movement and behaviors of shoppers is of high interest for marketing, merchandizing, store operations and data mining. Of particular interest is the process of purchase decision making. What catches a customer's attention? What products go unnoticed? What does a customer look at before making a final decision? Towards this goal, we present a system that detects and tracks both the location and gaze of shoppers in retail environments. While networks of standard overhead store cameras are used for tracking the location of customers, small in-shelf cameras are used for estimating customer gaze. The presented system operates robustly in real time and can be deployed in a variety of retail applications.
{"title":"What are customers looking at?","authors":"Xiaoming Liu, N. Krahnstoever, Ting Yu, P. Tu","doi":"10.1109/AVSS.2007.4425345","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425345","url":null,"abstract":"Computer vision approaches for retail applications can provide value far beyond the common domain of loss prevention. Gaining insight into the movement and behaviors of shoppers is of high interest for marketing, merchandizing, store operations and data mining. Of particular interest is the process of purchase decision making. What catches a customers attention? What products go unnoticed? What does a customer look at before making a final decision? Towards this goal we presents a system that detects and tracks both the location and gaze of shoppers in retail environments. While networks of standard overhead store cameras are used for tracking the location of customers, small in-shelf cameras are used for estimating customer gaze. The presented system operates robustly in real-time and can be deployed in a variety of retail applications.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121631708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of gait types based on the duty-factor
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425330
P. Fihl, T. Moeslund
This paper deals with classification of human gait types based on the notion that different gait types are in fact different types of locomotion, i.e., running is not simply walking done faster. We present the duty-factor, a descriptor based on this notion. The duty-factor is independent of the speed of the human, the camera setup, etc., and is hence a robust descriptor for gait classification. The duty-factor is essentially a measure of the ground support of the feet with respect to the stride. We estimate it by comparing the incoming silhouettes to a database of silhouettes with known ground support. Silhouettes are extracted using the codebook method and represented using shape contexts. The matching with database silhouettes is done using the Hungarian method. While manually estimated duty-factors show a clear classification, the presented system produces some misclassifications due to silhouette noise and ambiguities in the database silhouettes.
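A hedged sketch of the two measurable pieces described above: Hungarian matching of shape-context descriptors (via scipy's assignment solver) and the duty-factor itself. The chi-square cost and all sizes are illustrative choices, not the paper's exact parameters.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shape_context_cost(sc_a, sc_b):
    """Chi-square cost between two sets of shape-context histograms
    (rows = sampled contour points, columns = log-polar bins)."""
    num = (sc_a[:, None] - sc_b[None, :]) ** 2
    den = sc_a[:, None] + sc_b[None, :] + 1e-9
    return 0.5 * (num / den).sum(axis=2)

def silhouette_distance(sc_a, sc_b):
    """Optimal one-to-one point matching via the Hungarian method."""
    cost = shape_context_cost(sc_a, sc_b)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

def duty_factor(support_frames, stride_frames):
    """Fraction of the stride during which a foot touches the ground;
    roughly above 0.5 for walking and below 0.5 for running."""
    return support_frames / stride_frames

# Nearest database silhouette for a query (random stand-in descriptors).
rng = np.random.default_rng(2)
query = rng.random((40, 60))                         # 40 points, 60 bins
database = [rng.random((40, 60)) for _ in range(5)]
nearest = min(range(5), key=lambda i: silhouette_distance(query, database[i]))
print(nearest, duty_factor(support_frames=22, stride_frames=40))
```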
{"title":"Classification of gait types based on the duty-factor","authors":"P. Fihl, T. Moeslund","doi":"10.1109/AVSS.2007.4425330","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425330","url":null,"abstract":"This paper deals with classification of human gait types based on the notion that different gait types are in fact different types of locomotion, i.e., running is not simply walking done faster. We present the duty-factor, which is a descriptor based on this notion. The duty-factor is independent on the speed of the human, the cameras setup etc. and hence a robust descriptor for gait classification. The duty-factor is basically a matter of measuring the ground support of the feet with respect to the stride. We estimate this by comparing the incoming silhouettes to a database of silhouettes with known ground support. Silhouettes are extracted using the codebook method and represented using shape contexts. The matching with database silhouettes is done using the Hungarian method. While manually estimated duty-factors show a clear classification the presented system contains misclassifications due to silhouette noise and ambiguities in the database silhouettes.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123873226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A system for face detection and tracking in unconstrained environments
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425361
Augusto Destrero, F. Odone, A. Verri
We describe a trainable system for face detection and tracking. The system is structured around multiple cues that discard non-face areas as soon as possible: we combine motion, skin, and face detection. The latter is the core of our system and consists of a hierarchy of small SVM classifiers built on the output of an automatic feature selection procedure. Our feature selection is entirely data-driven and allows us to obtain powerful descriptions from a relatively small set of data. Finally, Kalman tracking of the face region optimizes detection results over time. We present an experimental analysis of the face detection module and results obtained with the whole system on the specific task of counting people entering the scene.
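The cue cascade might look like the following sketch: cheap motion and skin gates run first, and only surviving regions reach the SVM hierarchy. The thresholds and the RGB skin rule are placeholders, not the paper's learned components.

```python
import numpy as np

def motion_cue(gray, background, thresh=25):
    """Cheap first gate: enough pixels differ from the background model."""
    changed = np.abs(gray.astype(int) - background.astype(int)) > thresh
    return changed.mean() > 0.05

def skin_cue(rgb):
    """Second gate: a crude RGB skin rule (a placeholder, not the paper's)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
    return skin.mean() > 0.2

def face_cascade(rgb, gray, background, svm_stages):
    """Run the cheap cues first; only survivors reach the SVM hierarchy.
    svm_stages is a list of classifiers exposing decision_function, e.g.
    sklearn LinearSVC models trained on the selected features."""
    if not motion_cue(gray, background):
        return False
    if not skin_cue(rgb):
        return False
    features = gray.reshape(1, -1).astype(float)
    return all(s.decision_function(features)[0] > 0 for s in svm_stages)
```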
{"title":"A system for face detection and tracking in unconstrained environments","authors":"Augusto Destrero, F. Odone, A. Verri","doi":"10.1109/AVSS.2007.4425361","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425361","url":null,"abstract":"We describe a trainable system for face detection and tracking. The structure of the system is based on multiple cues that discard non face areas as soon as possible: we combine motion, skin, and face detection. The latter is the core of our system and consists of a hierarchy of small SVM classifiers built on the output of an automatic feature selection procedure. Our feature selection is entirely data-driven and allows us to obtain powerful descriptions from a relatively small set of data. Finally, a Kalman tracking on the face region optimizes detection results over time. We present an experimental analysis of the face detection module and results obtained with the whole system on the specific task of counting people entering the scene.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123299359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single camera calibration for trajectory-based behavior analysis
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425301
N. Anjum, A. Cavallaro
Perspective deformations on the image plane make the analysis of object behaviors difficult in surveillance video. In this paper, we improve the results of trajectory-based scene analysis by using single camera calibration for perspective rectification. First, the ground-plane view is estimated from perspective images captured from a single camera. Next, unsupervised fuzzy clustering is applied on the transformed trajectories to group similar behaviors and to isolate outliers. We evaluate the proposed approach on real outdoor surveillance scenarios with standard datasets and show that perspective rectification improves the accuracy of the trajectory clustering results.
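Once a ground-plane homography is recovered from single-camera calibration, rectifying a trajectory is a standard projective mapping; a small numpy sketch follows, with an invented homography and trajectory purely for illustration. The rectified trajectories would then feed the unsupervised fuzzy clustering stage.

```python
import numpy as np

def rectify_trajectory(traj_px, H):
    """Map image-plane trajectory points to the ground plane with a
    3x3 homography H, as obtained from single-camera calibration."""
    pts = np.hstack([traj_px, np.ones((len(traj_px), 1))])  # homogeneous
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                   # dehomogenize

# Illustrative homography and trajectory (a real H comes from calibration).
H = np.array([[0.0200, 0.0010, -1.0],
              [0.0005, 0.0300, -2.0],
              [0.0001, 0.0020,  1.0]])
traj = np.array([[100, 200], [120, 210], [140, 222]], dtype=float)
print(rectify_trajectory(traj, H))
```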
{"title":"Single camera calibration for trajectory-based behavior analysis","authors":"N. Anjum, A. Cavallaro","doi":"10.1109/AVSS.2007.4425301","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425301","url":null,"abstract":"Perspective deformations on the image plane make the analysis of object behaviors difficult in surveillance video. In this paper, we improve the results of trajectory-based scene analysis by using single camera calibration for perspective rectification. First, the ground-plane view is estimated from perspective images captured from a single camera. Next, unsupervised fuzzy clustering is applied on the transformed trajectories to group similar behaviors and to isolate outliers. We evaluate the proposed approach on real outdoor surveillance scenarios with standard datasets and show that perspective rectification improves the accuracy of the trajectory clustering results.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128840919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of temporarily static regions by processing video at different frame rates
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425316
F. Porikli
This paper presents an abandoned-item and illegally-parked-vehicle detection method for single static camera video surveillance applications. By processing the input video at different frame rates, two backgrounds are constructed: one short-term and one long-term. Each background is defined as a mixture of Gaussian models, adapted using online Bayesian update. Two binary foreground maps are estimated by comparing the current frame with the backgrounds, and motion statistics are aggregated in a likelihood image by applying a set of heuristics to the foreground maps. The likelihood image is then used to differentiate between pixels that belong to moving objects, temporarily static regions, and the scene background. Depending on the application, the temporarily static regions indicate abandoned items, illegally parked vehicles, objects removed from the scene, etc. The presented pixel-wise method does not require object tracking, so its performance is not limited by error-prone detection and correspondence tasks that usually fail in crowded scenes. It accurately segments objects even if they are fully occluded. It can also be implemented effectively on a parallel processing architecture.
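A minimal sketch of the dual-background pixel classification, using running averages with fast and slow learning rates as a stand-in for the paper's online-Bayesian Gaussian mixtures; the thresholds, rates, and the omission of the likelihood-image heuristics are all simplifications.

```python
import numpy as np

def update_background(bg, frame, alpha):
    """Running-average stand-in for per-pixel Gaussian mixtures with
    Bayesian updates."""
    return (1 - alpha) * bg + alpha * frame

def classify_pixels(frame, bg_short, bg_long, thresh=30):
    """Compare against both backgrounds:
       moving             -> differs from short- and long-term background
       temporarily static -> matches short-term, differs from long-term
       background         -> matches both."""
    fg_short = np.abs(frame - bg_short) > thresh
    fg_long = np.abs(frame - bg_long) > thresh
    moving = fg_short & fg_long
    temp_static = (~fg_short) & fg_long
    return moving, temp_static

rng = np.random.default_rng(3)
prev = rng.integers(0, 256, (240, 320)).astype(float)
frame = prev.copy()
frame[100:120, 150:170] = 255.0           # a newly appeared static object
bg_short, bg_long = prev.copy(), prev.copy()
for _ in range(60):                       # object stays parked
    bg_short = update_background(bg_short, frame, alpha=0.10)  # fast
    bg_long = update_background(bg_long, frame, alpha=0.01)    # slow
moving, temp_static = classify_pixels(frame, bg_short, bg_long)
print(moving.sum(), temp_static.sum())    # object pixels end up temp-static
```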
{"title":"Detection of temporarily static regions by processing video at different frame rates","authors":"F. Porikli","doi":"10.1109/AVSS.2007.4425316","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425316","url":null,"abstract":"This paper presents an abandoned item and illegally parked vehicle detection method for single static camera video surveillance applications. By processing the input video at different frame rates, two backgrounds are constructed; one for short-term and another for long-term. Each of these backgrounds is defined as a mixture of Gaussian models, which are adapted using online Bayesian update. Two binary foreground maps are estimated by comparing the current frame with the backgrounds, and motion statistics are aggregated in a likelihood image by applying a set of heuristics to the foreground maps. Likelihood image is then used to differentiate between the pixels that belong to moving objects, temporarily static regions and scene background. Depending on the application, the temporary static regions indicate abandoned items, illegally parked vehicles, objects removed from the scene, etc. The presented pixel-wise method does not require object tracking, thus its performance is not upper-bounded to error prone detection and correspondence tasks that usually fail for crowded scenes. It accurately segments objects even if they are fully occluded. It can also be effectively implemented on a parallel processing architecture.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129385668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive summarisation of surveillance video sequences
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425369
Jian Li, S. G. Nikolov, C. Benton, N. Scott-Samuel
We describe our studies on summarising surveillance videos using optical flow information. The proposed method incorporates motion analysis into a video skimming scheme in which the playback speed is determined by the detectability of interesting motion behaviours according to prior information. A psycho-visual experiment was conducted to compare human performance and viewing strategy on videos summarised with standard video skimming techniques and with the proposed motion-based adaptive summarisation technique.
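One plausible reading of the adaptive skimming rule — playback speed falling as motion saliency rises — is sketched below; the mapping function and constants are assumptions, since the paper ties detectability to prior information rather than to raw flow magnitude alone.

```python
import numpy as np

def playback_speed(flow_mag, v_max=8.0, v_min=1.0, k=0.5):
    """Adaptive skimming: play fast when little interesting motion is
    present, slow down as the mean optical-flow magnitude rises."""
    saliency = flow_mag.mean()
    return max(v_min, v_max / (1.0 + k * saliency))

# flow_mag would come from a per-frame optical-flow estimator, e.g. the
# magnitude of cv2.calcOpticalFlowFarneback output.
print(playback_speed(np.full((240, 320), 2.0)))  # busy frame -> slower
print(playback_speed(np.zeros((240, 320))))      # static frame -> fast
```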
{"title":"Adaptive summarisation of surveillance video sequences","authors":"Jian Li, S. G. Nikolov, C. Benton, N. Scott-Samuel","doi":"10.1109/AVSS.2007.4425369","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425369","url":null,"abstract":"We describe our studies on summarising surveillance videos using optical flow information. The proposed method incorporates motion analysis into a video skimming scheme in which the playback speed is determined by the detectability of interesting motion behaviours according to prior information. A psycho-visual experiment was conducted to compare human performance and viewing strategy for summarised videos using standard video skimming techniques and a proposed motion-based adaptive summarisation technique.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129542810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting shopper groups in video sequences
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425347
A. Leykin, M. Tuceryan
We present a generalized, extensible framework for automated recognition of swarming activities in video sequences. The trajectory of each individual is produced by the visual tracking sub-system and is further analyzed to detect certain types of high-level grouping behavior. We utilize recent findings in swarming behavior analysis to formulate the problem in terms of a specific distance function, which we subsequently apply as part of a two-stage agglomerative clustering method to create a set of swarming events followed by a set of swarming activities. In this paper we present results for one particular type of swarming: shopper grouping. As part of this work, events detected in a relatively short time interval are further integrated into activities, the manifestation of prolonged high-level swarming behavior. The results demonstrate the ability of our method to detect such activities in congested surveillance videos. In particular, in three hours of indoor retail store video, our method correctly identified over 85% of valid "shopper groups" with a very low level of false positives, validated against human-coded ground truth.
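A hedged sketch of agglomerative clustering over a custom trajectory distance, using scipy's hierarchical clustering; the mean spatial separation used here stands in for the paper's swarming-specific distance function, and the second stage, integrating events into activities, is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def trajectory_distance(a, b):
    """Illustrative distance between two trajectories sampled over the
    same frames: mean spatial separation."""
    return np.linalg.norm(a - b, axis=1).mean()

def cluster_tracks(tracks, max_dist):
    """Agglomerative (average-linkage) clustering over the custom metric."""
    n = len(tracks)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = trajectory_distance(tracks[i], tracks[j])
    Z = linkage(squareform(D), method="average")
    return fcluster(Z, t=max_dist, criterion="distance")

rng = np.random.default_rng(4)
base = rng.random((50, 2)).cumsum(axis=0)        # one shared walking path
tracks = [base + rng.normal(scale=0.2, size=base.shape) for _ in range(4)]
tracks += [rng.random((50, 2)).cumsum(axis=0) for _ in range(2)]
print(cluster_tracks(tracks, max_dist=3.0))      # perturbed copies tend to group
```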
{"title":"Detecting shopper groups in video sequences","authors":"A. Leykin, M. Tuceryan","doi":"10.1109/AVSS.2007.4425347","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425347","url":null,"abstract":"We present a generalized extensible framework for automated recognition of swarming activities in video sequences. The trajectory of each individual is produced by the visual tracking sub-system and is further analyzed to detect certain types of high-level grouping behavior. We utilize recent findings in swarming behavior analysis to formulate a problem in terms of the specific distance function that we subsequently apply as part of the two-stage agglomerative clustering method to create a set of swarming events followed by a set of swarming activities. In this paper we present results for one particular type of swarming: shopper grouping. As part of this work the events detected in a relatively short time interval are further integrated into activities, the manifestation of prolonged high-level swarming behavior. The results demonstrate the ability of our method to detect such activities in congested surveillance videos. In particular in three hours of indoor retail store video, our method has correctly identified over85% of valid '\"shopper-groups'\" with a very low level of false positives, validated against human coded ground truth.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126879031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework for track matching across disjoint cameras using robust shape and appearance features
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425308
Christopher S. Madden, M. Piccardi
This paper presents a framework based on robust shape and appearance features for matching the various tracks generated by a single individual moving within a surveillance system. Each track is first automatically analysed to detect and remove the frames affected by large segmentation errors and drastic changes in illumination. The object features computed over the remaining frames prove more robust and capable of supporting correct matching of tracks even across significantly disjoint camera views. The shape and appearance features include a height estimate as well as illumination-tolerant colour representations of the individual's global colours and of the upper and lower portions of clothing. The results of a test on a real surveillance system show that the combination of these four features can provide a matching probability as high as 91% with a 5% probability of false alarms, under views that have significantly different illumination levels and suffer from significant segmentation errors in as many as 1 in 4 frames.
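A toy sketch of fusing the four cues into a single match score; reducing each colour representation to a scalar and using equally weighted Gaussian similarities are assumptions for illustration, not the paper's matching rule.

```python
import numpy as np

def match_probability(q, c, weights=(0.25, 0.25, 0.25, 0.25)):
    """Fuse the four cues (height, global colour, upper clothing colour,
    lower clothing colour) into one match score. The per-cue Gaussian
    similarity and the spreads below are illustrative choices."""
    sims = [np.exp(-((q[k] - c[k]) ** 2) / (2 * s ** 2))
            for k, s in (("height", 0.05), ("global", 0.1),
                         ("upper", 0.1), ("lower", 0.1))]
    return float(np.dot(weights, sims))

query = {"height": 1.78, "global": 0.41, "upper": 0.62, "lower": 0.33}
candidate = {"height": 1.80, "global": 0.44, "upper": 0.60, "lower": 0.35}
print(match_probability(query, candidate))  # near 1 -> likely same person
```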
{"title":"A framework for track matching across disjoint cameras using robust shape and appearance features","authors":"Christopher S. Madden, M. Piccardi","doi":"10.1109/AVSS.2007.4425308","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425308","url":null,"abstract":"This paper presents a framework based on robust shape and appearance features for matching the various tracks generated by a single individual moving within a surveillance system. Each track is first automatically analysed in order to detect and remove the frames affected by large segmentation errors and drastic changes in illumination. The object's features computed over the remaining frames prove more robust and capable of supporting correct matching of tracks even in the case of significantly disjointed camera views. The shape and appearance features used include a height estimate as well as illumination-tolerant colour representation of the individual's global colours and the colours of the upper and lower portions of clothing. The results of a test from a real surveillance system show that the combination of these four features can provide a probability of matching as high as 91 percent with 5 percent probability of false alarms under views which have significantly differing illumination levels and suffer from significant segmentation errors in as many as 1 in 4 frames.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127391501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}