Tracking of two acoustic sources in reverberant environments using a particle swarm optimizer
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425373
F. Antonacci, Davide Riva, A. Sarti, M. Tagliasacchi, S. Tubaro
In this paper we consider the problem of tracking multiple acoustic sources in reverberant environments. The solution that we propose is based on the combination of two techniques. A blind source separation (BSS) method known as TRINICON [5] is applied to the signals acquired by the microphone arrays. The TRINICON de-mixing filters are used to obtain the Time Differences of Arrival (TDOAs), which are related to the source location through a nonlinear function. A particle filter is then applied in order to localize the sources. Particles move according to swarm-like dynamics, which significantly reduces the number of particles required compared with a traditional particle filter. We discuss results for the case of two sources and four microphone pairs. In addition, we propose a method, based on detecting source inactivity, that overcomes the ambiguities that intrinsically arise when only two microphone pairs are used. Experimental results demonstrate that the average localization error on a variety of pseudo-random trajectories is around 40 cm when the T60 reverberation time is 0.6 s.
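As a rough illustration of the localization step, the sketch below weights particles by how well their predicted TDOAs match the measured ones and pulls the swarm toward the current best estimate. The room size, Gaussian error model, and attraction/noise parameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of swarm-style particle localization from TDOA measurements.
import numpy as np

C = 343.0  # speed of sound, m/s

def predicted_tdoa(pos, mic_a, mic_b):
    """TDOA of a source at `pos` for one microphone pair."""
    return (np.linalg.norm(pos - mic_a) - np.linalg.norm(pos - mic_b)) / C

def swarm_step(particles, best, attraction=0.3, noise=0.05):
    """Move every particle toward the current best estimate (PSO-like)."""
    return particles + attraction * (best - particles) \
           + noise * np.random.randn(*particles.shape)

def localize(tdoas, mic_pairs, n_particles=50, n_iters=30, sigma=1e-4):
    # Assumed 5 m x 5 m room; particles are candidate 2D source positions.
    particles = np.random.uniform(0.0, 5.0, size=(n_particles, 2))
    for _ in range(n_iters):
        # Squared TDOA residual of each particle, summed over mic pairs.
        err = np.zeros(n_particles)
        for tau, (ma, mb) in zip(tdoas, mic_pairs):
            pred = np.array([predicted_tdoa(p, ma, mb) for p in particles])
            err += (tau - pred) ** 2
        weights = np.exp(-err / (2 * sigma ** 2))  # Gaussian error model
        best = particles[np.argmax(weights)]
        particles = swarm_step(particles, best)
    return best
```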
{"title":"Tracking of two acoustic sources in reverberant environments using a particle swarm optimizer","authors":"F. Antonacci, Davide Riva, A. Sarti, M. Tagliasacchi, S. Tubaro","doi":"10.1109/AVSS.2007.4425373","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425373","url":null,"abstract":"In this paper we consider the problem of tracking multiple acoustic sources in reverberant environments. The solution that we propose is based on the combination of two techniques. A blind source separation (BSS) method known as TRINICON [5] is applied to the signals acquired by the microphone arrays. The TRINICON de-mixing filters are used to obtain the Time Differences of Arrival (TDOAs), which are related to the source location through a nonlinear function. A particle filter is then applied in order to localize the sources. Particles move according to a swarm-like dynamics, which significatively reduces the number of particles involved with respect to traditional particle filter. We discuss results for the case of two sources and four microphone pairs. In addition, we propose a method, based on detecting source inactivity, which overcomes the ambiguities that intrinsically arise when only two microphone pairs are used. Experimental results demonstrate that the average localization error on a variety of pseudo-random trajectories is around 40 cm when the T60 reverberation time is 0.6s.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117173865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A system for face detection and tracking in unconstrained environments
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425361
Augusto Destrero, F. Odone, A. Verri
We describe a trainable system for face detection and tracking. The system is structured around multiple cues that discard non-face areas as early as possible: we combine motion, skin, and face detection. The latter is the core of our system and consists of a hierarchy of small SVM classifiers built on the output of an automatic feature selection procedure. Our feature selection is entirely data-driven and allows us to obtain powerful descriptions from a relatively small set of data. Finally, Kalman tracking of the face region optimizes detection results over time. We present an experimental analysis of the face detection module and results obtained with the whole system on the specific task of counting people entering the scene.
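A minimal sketch of the cue cascade, assuming a crude RGB skin rule and a single sklearn-style SVM as a stand-in for the paper's hierarchy of small classifiers; thresholds are illustrative.

```python
# Cheap tests (motion, skin colour) prune windows before the SVM runs.
import numpy as np

def motion_cue(window, bg_patch, thresh=25):
    """Keep only windows that differ enough from the background."""
    return np.mean(np.abs(window.astype(int) - bg_patch.astype(int))) > thresh

def skin_cue(window_rgb, min_fraction=0.2):
    """Crude per-pixel RGB skin rule; keep windows with enough skin pixels."""
    r, g, b = window_rgb[..., 0], window_rgb[..., 1], window_rgb[..., 2]
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
    return skin.mean() > min_fraction

def detect_faces(windows, bg_patches, svm, features):
    faces = []
    for win, bg in zip(windows, bg_patches):
        if not motion_cue(win, bg):   # discard static areas first
            continue
        if not skin_cue(win):         # then non-skin areas
            continue
        if svm.predict([features(win)])[0] == 1:  # SVM verdict last
            faces.append(win)
    return faces
```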
{"title":"A system for face detection and tracking in unconstrained environments","authors":"Augusto Destrero, F. Odone, A. Verri","doi":"10.1109/AVSS.2007.4425361","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425361","url":null,"abstract":"We describe a trainable system for face detection and tracking. The structure of the system is based on multiple cues that discard non face areas as soon as possible: we combine motion, skin, and face detection. The latter is the core of our system and consists of a hierarchy of small SVM classifiers built on the output of an automatic feature selection procedure. Our feature selection is entirely data-driven and allows us to obtain powerful descriptions from a relatively small set of data. Finally, a Kalman tracking on the face region optimizes detection results over time. We present an experimental analysis of the face detection module and results obtained with the whole system on the specific task of counting people entering the scene.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123299359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic people detection and counting for athletic videos classification
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425349
C. Panagiotakis, E. Ramasso, G. Tziritas, M. Rombaut, D. Pellerin
We propose a general framework for automatic motion-shape analysis of individuals and multiple people, and for extracting features suitable for action/activity recognition in real, dynamic, unconstrained environments. We considered various athletic videos from a single uncalibrated, possibly moving camera in order to evaluate the robustness of the proposed method. We used an easily extended hierarchical scheme to classify the videos into individual and team sports. The proposed features are robust, adaptive, and independent of camera motion; they are combined within the Transferable Belief Model (TBM) framework to provide a two-level (frame and shot) video categorization. Experimental results of 97% individual/team sport categorization accuracy, on a dataset of more than 250 videos of athletic meetings, indicate the good performance of the proposed scheme.
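The TBM fusion step can be illustrated with the unnormalized conjunctive combination rule, the standard TBM operator; the two example basic belief assignments below are invented for illustration, not values from the paper.

```python
# TBM conjunctive combination of two basic belief assignments (BBAs).
from itertools import product

def conjunctive_combine(m1, m2):
    """Unnormalized conjunctive rule: conflict mass stays on the empty set."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        c = a & b
        out[c] = out.get(c, 0.0) + wa * wb
    return out

IND, TEAM = frozenset({"individual"}), frozenset({"team"})
BOTH = IND | TEAM  # ignorance: could be either class

m_motion = {IND: 0.6, BOTH: 0.4}            # evidence from motion features (invented)
m_shape = {IND: 0.3, TEAM: 0.5, BOTH: 0.2}  # evidence from shape features (invented)
fused = conjunctive_combine(m_motion, m_shape)
# mass on IND: 0.18 + 0.12 + 0.12 = 0.42; conflict (empty set): 0.6 * 0.5 = 0.30
```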
{"title":"Automatic people detection and counting for athletic videos classification","authors":"C. Panagiotakis, E. Ramasso, G. Tziritas, M. Rombaut, D. Pellerin","doi":"10.1109/AVSS.2007.4425349","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425349","url":null,"abstract":"We propose a general framework that focuses on automatic individual/multiple people motion-shape analysis and on suitable features extraction that can be used on action/activity recognition problems under real, dynamical and unconstrained environments. We have considered various athletic videos from a single uncalibrated, possibly moving camera in order to evaluate the robustness of the proposed method. We have used an easily expanded hierarchical scheme in order to classify them to videos of individual and team sports. Robust, adaptive and independent from the camera motion, the proposed features are combined within Transferable Belief Model (TBM) framework providing a two level (frames and shot) video categorization. The experimental results of 97% individual/team sport categorization accuracy, using a dataset of more than 250 videos of athletic meetings indicate the good performance of the proposed scheme.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122667984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of gait types based on the duty-factor
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425330
P. Fihl, T. Moeslund
This paper deals with classification of human gait types based on the notion that different gait types are in fact different types of locomotion, i.e., running is not simply walking done faster. We present the duty-factor, a descriptor based on this notion. The duty-factor is independent of the speed of the person, the camera setup, etc., and is hence a robust descriptor for gait classification. The duty-factor essentially measures the ground support of the feet with respect to the stride. We estimate it by comparing the incoming silhouettes to a database of silhouettes with known ground support. Silhouettes are extracted using the codebook method and represented using shape contexts. The matching with database silhouettes is done using the Hungarian method. While manually estimated duty-factors show a clear classification, the presented system produces some misclassifications due to silhouette noise and ambiguities in the database silhouettes.
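Since the duty-factor is the fraction of the stride during which a foot is on the ground, a toy computation looks as follows. The 0.5 walk/run boundary is the standard biomechanical one (walking has double support, running an aerial phase), not a threshold quoted from the paper.

```python
# Duty-factor from per-frame ground-support flags (one stride per foot).
def duty_factor(support_flags):
    """Fraction of the stride during which the foot touches the ground."""
    return sum(support_flags) / len(support_flags)

def classify_gait(left_support, right_support):
    df = (duty_factor(left_support) + duty_factor(right_support)) / 2
    return "walking" if df > 0.5 else "running"

# One stride sampled at 10 frames per foot (True = foot on ground).
left = [True] * 6 + [False] * 4
right = [False] * 4 + [True] * 6
print(classify_gait(left, right))  # duty factor 0.6 -> walking
```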
{"title":"Classification of gait types based on the duty-factor","authors":"P. Fihl, T. Moeslund","doi":"10.1109/AVSS.2007.4425330","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425330","url":null,"abstract":"This paper deals with classification of human gait types based on the notion that different gait types are in fact different types of locomotion, i.e., running is not simply walking done faster. We present the duty-factor, which is a descriptor based on this notion. The duty-factor is independent on the speed of the human, the cameras setup etc. and hence a robust descriptor for gait classification. The duty-factor is basically a matter of measuring the ground support of the feet with respect to the stride. We estimate this by comparing the incoming silhouettes to a database of silhouettes with known ground support. Silhouettes are extracted using the codebook method and represented using shape contexts. The matching with database silhouettes is done using the Hungarian method. While manually estimated duty-factors show a clear classification the presented system contains misclassifications due to silhouette noise and ambiguities in the database silhouettes.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123873226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2D and 3D face localization for complex scenes
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425339
Ghassan O. Karame, A. Stergiou, N. Katsarakis, Panagiotis Papageorgiou, Aristodemos Pnevmatikakis
In this paper, we address face tracking of multiple people in complex 3D scenes, using multiple calibrated and synchronized far-field recordings. We localize faces in every camera view and associate them across the different views. To cope with the complexity of 2D face localization introduced by the multitude of people and unconstrained face poses, we use a combination of stochastic and deterministic trackers, detectors, and a Gaussian mixture model for face validation. Faces of the same person seen from the different cameras are then associated by first finding all possible associations and then choosing the best option by means of a 3D stochastic tracker. The performance of the proposed system is evaluated and found to improve on that of existing systems.
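A toy version of the cross-view association step, assuming known 3x4 projection matrices and the same people visible in both views. The paper chooses among candidate associations with a 3D stochastic tracker; this sketch simply keeps the pairing whose triangulated points reproject with least error.

```python
# Exhaustive cross-view association scored by DLT triangulation error.
import numpy as np
from itertools import permutations

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def reproj_error(P, X, x):
    proj = P @ np.append(X, 1.0)
    return np.linalg.norm(proj[:2] / proj[2] - x)

def best_association(P1, P2, faces1, faces2):
    """Try every pairing of face detections across the two views."""
    best, best_err = None, np.inf
    for perm in permutations(range(len(faces2))):
        err = 0.0
        for i, j in enumerate(perm):
            X = triangulate(P1, P2, faces1[i], faces2[j])
            err += reproj_error(P1, X, faces1[i]) + reproj_error(P2, X, faces2[j])
        if err < best_err:
            best, best_err = perm, err
    return best
```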
{"title":"2D and 3D face localization for complex scenes","authors":"Ghassan O. Karame, A. Stergiou, N. Katsarakis, Panagiotis Papageorgiou, Aristodemos Pnevmatikakis","doi":"10.1109/AVSS.2007.4425339","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425339","url":null,"abstract":"In this paper, we address face tracking of multiple people in complex 3D scenes, using multiple calibrated and synchronized far-field recordings. We localize faces in every camera view and associate them across the different views. To cope with the complexity of 2D face localization introduced by the multitude of people and unconstrained face poses, a combination of stochastic and deterministic trackers, detectors and a Gaussian mixture model for face validation are utilized. Then faces of the same person seen from the different cameras are associated by first finding all possible associations and then choosing the best option by means of a 3D stochastic tracker. The performance of the proposed system is evaluated and is found enhanced compared to existing systems.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126145110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of temporarily static regions by processing video at different frame rates
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425316
F. Porikli
This paper presents an abandoned-item and illegally-parked-vehicle detection method for single static camera video surveillance applications. By processing the input video at different frame rates, two backgrounds are constructed: one short-term and one long-term. Each background is defined as a mixture of Gaussian models, adapted using online Bayesian update. Two binary foreground maps are estimated by comparing the current frame with the backgrounds, and motion statistics are aggregated in a likelihood image by applying a set of heuristics to the foreground maps. The likelihood image is then used to distinguish pixels belonging to moving objects, temporarily static regions, and scene background. Depending on the application, the temporarily static regions indicate abandoned items, illegally parked vehicles, objects removed from the scene, etc. The presented pixel-wise method does not require object tracking, so its performance is not upper-bounded by error-prone detection and correspondence tasks that usually fail for crowded scenes. It accurately segments objects even if they are fully occluded. It can also be effectively implemented on a parallel processing architecture.
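The short-term/long-term idea can be sketched with simple running averages standing in for the paper's Bayesian-updated Gaussian mixtures: a pixel that the fast background has already absorbed but the slow background still flags as foreground is a temporarily static candidate. Learning rates and the threshold below are illustrative assumptions.

```python
# Dual-background classification of moving vs. temporarily static pixels.
import numpy as np

ALPHA_SHORT, ALPHA_LONG = 0.05, 0.002  # fast and slow adaptation rates
THRESH = 30                            # foreground difference threshold

def update(bg, frame, alpha):
    """Running-average background update."""
    return (1 - alpha) * bg + alpha * frame

def classify(frame, bg_short, bg_long):
    fg_short = np.abs(frame - bg_short) > THRESH
    fg_long = np.abs(frame - bg_long) > THRESH
    moving = fg_short & fg_long   # foreground against both backgrounds
    static = ~fg_short & fg_long  # absorbed by short-term background only
    return moving, static         # `static`: abandoned-item / parked-vehicle candidates

def process(frames):
    bg_short = bg_long = frames[0].astype(float)
    for frame in frames[1:]:
        f = frame.astype(float)
        moving, static = classify(f, bg_short, bg_long)
        bg_short = update(bg_short, f, ALPHA_SHORT)
        bg_long = update(bg_long, f, ALPHA_LONG)
        yield moving, static
```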
{"title":"Detection of temporarily static regions by processing video at different frame rates","authors":"F. Porikli","doi":"10.1109/AVSS.2007.4425316","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425316","url":null,"abstract":"This paper presents an abandoned item and illegally parked vehicle detection method for single static camera video surveillance applications. By processing the input video at different frame rates, two backgrounds are constructed; one for short-term and another for long-term. Each of these backgrounds is defined as a mixture of Gaussian models, which are adapted using online Bayesian update. Two binary foreground maps are estimated by comparing the current frame with the backgrounds, and motion statistics are aggregated in a likelihood image by applying a set of heuristics to the foreground maps. Likelihood image is then used to differentiate between the pixels that belong to moving objects, temporarily static regions and scene background. Depending on the application, the temporary static regions indicate abandoned items, illegally parked vehicles, objects removed from the scene, etc. The presented pixel-wise method does not require object tracking, thus its performance is not upper-bounded to error prone detection and correspondence tasks that usually fail for crowded scenes. It accurately segments objects even if they are fully occluded. It can also be effectively implemented on a parallel processing architecture.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129385668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive summarisation of surveillance video sequences
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425369
Jian Li, S. G. Nikolov, C. Benton, N. Scott-Samuel
We describe our studies on summarising surveillance videos using optical flow information. The proposed method incorporates motion analysis into a video skimming scheme in which the playback speed is determined by the detectability of interesting motion behaviours according to prior information. A psycho-visual experiment was conducted to compare human performance and viewing strategy between videos summarised with standard video skimming techniques and with the proposed motion-based adaptive summarisation technique.
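A rough sketch of motion-adaptive playback, using OpenCV's Farneback optical flow as the motion measure; the speed mapping and its limits are assumptions rather than the paper's model of behaviour detectability.

```python
# Per-frame playback speed from mean optical-flow magnitude.
import cv2
import numpy as np

MAX_SPEEDUP = 8.0  # fastest playback when nothing moves (assumed limit)

def frame_motion(prev_gray, gray):
    """Mean optical-flow magnitude between two grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2).mean()

def playback_speeds(gray_frames, saliency_scale=1.0):
    speeds = [1.0]
    for prev, cur in zip(gray_frames, gray_frames[1:]):
        m = frame_motion(prev, cur) * saliency_scale
        # High motion -> detectable behaviour -> slow down to real time.
        speeds.append(max(1.0, MAX_SPEEDUP / (1.0 + m)))
    return speeds
```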
{"title":"Adaptive summarisation of surveillance video sequences","authors":"Jian Li, S. G. Nikolov, C. Benton, N. Scott-Samuel","doi":"10.1109/AVSS.2007.4425369","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425369","url":null,"abstract":"We describe our studies on summarising surveillance videos using optical flow information. The proposed method incorporates motion analysis into a video skimming scheme in which the playback speed is determined by the detectability of interesting motion behaviours according to prior information. A psycho-visual experiment was conducted to compare human performance and viewing strategy for summarised videos using standard video skimming techniques and a proposed motion-based adaptive summarisation technique.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129542810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single camera calibration for trajectory-based behavior analysis
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425301
N. Anjum, A. Cavallaro
Perspective deformations on the image plane make the analysis of object behaviors difficult in surveillance video. In this paper, we improve the results of trajectory-based scene analysis by using single camera calibration for perspective rectification. First, the ground-plane view is estimated from perspective images captured from a single camera. Next, unsupervised fuzzy clustering is applied on the transformed trajectories to group similar behaviors and to isolate outliers. We evaluate the proposed approach on real outdoor surveillance scenarios with standard datasets and show that perspective rectification improves the accuracy of the trajectory clustering results.
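Perspective rectification of trajectories can be sketched with a ground-plane homography; the four point correspondences below are invented for illustration.

```python
# Map image-plane trajectories to the ground plane via a homography.
import cv2
import numpy as np

# Four image points of known ground-plane positions (illustrative).
img_pts = np.float32([[100, 400], [540, 410], [420, 120], [180, 115]])
world_pts = np.float32([[0, 0], [10, 0], [10, 20], [0, 20]])  # metres

H, _ = cv2.findHomography(img_pts, world_pts)

def rectify_trajectory(traj_px):
    """Map an image-plane trajectory (N x 2, pixels) to ground-plane metres."""
    pts = np.float32(traj_px).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

ground_traj = rectify_trajectory([[120, 380], [150, 350], [200, 300]])
```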
{"title":"Single camera calibration for trajectory-based behavior analysis","authors":"N. Anjum, A. Cavallaro","doi":"10.1109/AVSS.2007.4425301","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425301","url":null,"abstract":"Perspective deformations on the image plane make the analysis of object behaviors difficult in surveillance video. In this paper, we improve the results of trajectory-based scene analysis by using single camera calibration for perspective rectification. First, the ground-plane view is estimated from perspective images captured from a single camera. Next, unsupervised fuzzy clustering is applied on the transformed trajectories to group similar behaviors and to isolate outliers. We evaluate the proposed approach on real outdoor surveillance scenarios with standard datasets and show that perspective rectification improves the accuracy of the trajectory clustering results.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128840919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
What are customers looking at?
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425345
Xiaoming Liu, N. Krahnstoever, Ting Yu, P. Tu
Computer vision approaches for retail applications can provide value far beyond the common domain of loss prevention. Gaining insight into the movement and behaviors of shoppers is of high interest for marketing, merchandising, store operations, and data mining. Of particular interest is the process of purchase decision making. What catches a customer's attention? What products go unnoticed? What does a customer look at before making a final decision? Towards this goal, we present a system that detects and tracks both the location and gaze of shoppers in retail environments. While networks of standard overhead store cameras are used for tracking the location of customers, small in-shelf cameras are used for estimating customer gaze. The presented system operates robustly in real time and can be deployed in a variety of retail applications.
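One way to combine the two camera types, sketched under the assumption that gaze events carry timestamps and tracks are ground-plane positions indexed by time; the matching rule and distance threshold are illustrative, not the paper's.

```python
# Attribute in-shelf gaze events to the nearest overhead track.
import numpy as np

def attribute_gazes(gaze_events, tracks, shelf_pos, max_dist=1.5):
    """gaze_events: [timestamp]; tracks: {id: {timestamp: (x, y)}};
    shelf_pos: ground-plane (x, y) of the instrumented shelf."""
    hits = []
    for t in gaze_events:
        best_id, best_d = None, max_dist
        for tid, path in tracks.items():
            if t in path:
                dx, dy = np.subtract(path[t], shelf_pos)
                d = np.hypot(dx, dy)
                if d < best_d:
                    best_id, best_d = tid, d
        hits.append((t, best_id))  # best_id is None if nobody was close
    return hits
```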
{"title":"What are customers looking at?","authors":"Xiaoming Liu, N. Krahnstoever, Ting Yu, P. Tu","doi":"10.1109/AVSS.2007.4425345","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425345","url":null,"abstract":"Computer vision approaches for retail applications can provide value far beyond the common domain of loss prevention. Gaining insight into the movement and behaviors of shoppers is of high interest for marketing, merchandizing, store operations and data mining. Of particular interest is the process of purchase decision making. What catches a customers attention? What products go unnoticed? What does a customer look at before making a final decision? Towards this goal we presents a system that detects and tracks both the location and gaze of shoppers in retail environments. While networks of standard overhead store cameras are used for tracking the location of customers, small in-shelf cameras are used for estimating customer gaze. The presented system operates robustly in real-time and can be deployed in a variety of retail applications.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121631708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video analytics for retail
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425348
A. Senior, L. Brown, A. Hampapur, Chiao-Fe Shu, Y. Zhai, R. Feris, Ying-li Tian, S. Borger, Christopher R. Carlson
We describe a set of tools for retail analytics based on a combination of video understanding and transaction-log analysis. Tools are provided for loss prevention (returns fraud and cashier fraud), store operations (customer counting), and merchandising (display effectiveness). Results are presented on returns fraud and customer counting.
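As an illustration of combining video understanding with the transaction log, the sketch below flags returns recorded while no customer was detected at the counter; record formats and the time window are assumptions for illustration.

```python
# Cross-check return transactions against people-detector presence intervals.
def suspicious_returns(transactions, presence_intervals, window_s=30):
    """transactions: [(timestamp, kind)]; presence_intervals: [(start, end)]
    from the people detector covering the customer side of the counter."""
    def customer_present(t):
        return any(s - window_s <= t <= e + window_s
                   for s, e in presence_intervals)
    return [t for t, kind in transactions
            if kind == "return" and not customer_present(t)]

flagged = suspicious_returns(
    transactions=[(100.0, "sale"), (250.0, "return")],
    presence_intervals=[(90.0, 140.0)],
)
print(flagged)  # [250.0] -> return with no customer detected
```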
{"title":"Video analytics for retail","authors":"A. Senior, L. Brown, A. Hampapur, Chiao-Fe Shu, Y. Zhai, R. Feris, Ying-li Tian, S. Borger, Christopher R. Carlson","doi":"10.1109/AVSS.2007.4425348","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425348","url":null,"abstract":"We describe a set of tools for retail analytics based on a combination of video understanding and transaction-log. Tools are provided for loss prevention (returns fraud and cashier fraud), store operations (customer counting) and merchandising (display effectiveness). Results are presented on returns fraud and customer counting.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126740364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}