Comparison of target detection algorithms using adaptive background models
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570905
Daniela Hall, J. Nascimento, P. Ribeiro, E. Andrade, Plinio Moreno, S. Pesnel, T. List, R. Emonet, R. Fisher, J. S. Victor, J. Crowley
This article compares the performance of target detectors based on adaptive background differencing on public benchmark data. Five state-of-the-art methods are described, and their performance is evaluated using state-of-the-art measures with respect to ground truth. The original contributions are the comparison to hand-labelled ground truth and the evaluation on a large database. The simpler methods LOTS and SGM are more appropriate to the particular task than MGM, which uses a more complex background model.
{"title":"Comparison of target detection algorithms using adaptive background models","authors":"Daniela Hall, J. Nascimento, P. Ribeiro, E. Andrade, Plinio Moreno, S. Pesnel, T. List, R. Emonet, R. Fisher, J. S. Victor, J. Crowley","doi":"10.1109/VSPETS.2005.1570905","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570905","url":null,"abstract":"This article compares the performance of target detectors based on adaptive background differencing on public benchmark data. Five state of the art methods are described. The performance is evaluated using state of the art measures with respect to ground truth. The original points are the comparison to hand labelled ground truth and the evaluation on a large database. The simpler methods LOTS and SGM are more appropriate to the particular task as MGM using a more complex background model.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125101262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using SVM for Efficient Detection of Human Motion
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570920
J. Grahn, H. Kjellström
This paper presents a method for detection of humans in video. Detection is formulated as the problem of classifying the image patterns in a range of windows of different sizes in a video frame as "human" or "non-human". Computational efficiency is of core importance, which leads us to use fast methods for image preprocessing and classification. Linear spatio-temporal difference filters are used to represent motion information in the image. Patterns of spatio-temporal pixel differences are classified using an SVM, a classification method proven efficient for problems with high dimensionality and highly non-linear feature spaces. Furthermore, a cascade architecture is employed to exploit the fact that most windows are easy to classify, while a few are difficult. The detection method shows promising results when tested on images from street scenes with humans of varying sizes and clothing.
{"title":"Using SVM for Efficient Detection of Human Motion","authors":"J. Grahn, H. Kjellstromg","doi":"10.1109/VSPETS.2005.1570920","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570920","url":null,"abstract":"This paper presents a method for detection of humans in video. Detection is here formulated as the problem of classifying the image patterns in a range of windows of different size in a video frame as \"human\" or \"non-human\". Computational efficiency is of core importance, which leads us to utilize fast methods for image preprocessing and classification. Linear spatio-temporal difference filters are used to represent motion information in the image. Patterns of spatio-temporal pixel difference is classified using SVM, a classification method proven efficient for problems with high dimensionality and highly non-linear feature spaces. Furthermore, a cascade architecture is employed, to make use of the fact that most windows are easy to classify, while a few are difficult. The detection method shows promising results when tested on images from street scenes with humans of varying sizes and clothing.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116702929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PETS Metrics: On-Line Performance Evaluation Service
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570931
D. P. Young, J. Ferryman
This paper presents the PETS Metrics On-line Evaluation Service for computational visual surveillance algorithms. The service allows researchers to submit their algorithm results for evaluation against a set of applicable metrics. The results of the evaluation processes are publicly displayed, allowing researchers to instantly see how their algorithm performs against previously submitted algorithms. The approach has been validated using seven motion segmentation algorithms.
{"title":"PETS Metrics: On-Line Performance Evaluation Service","authors":"D. P. Young, J. Ferryman","doi":"10.1109/VSPETS.2005.1570931","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570931","url":null,"abstract":"This paper presents the PETS Metrics On-line Evaluation Service for computational visual surveillance algorithms. The service allows researchers to submit their algorithm results for evaluation against a set of applicable metrics. The results of the evaluation processes are publicly displayed allowing researchers to instantly view how their algorithm performs against previously submitted algorithms. The approach has been validated using seven motion segmentation algorithms.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130301175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance evaluating the evaluator
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570907
T. List, J. Bins, J. Vazquez, R. Fisher
When evaluating the performance of a computer-based visual tracking system, one often wishes to compare results with a standard human observer. It is a natural assumption that humans fully understand the relatively simple scenes we subject our computers to, and that, because of this, two human observers would draw the same conclusions about object positions, tracks, sizes and even simple behaviour patterns. But is that actually the case? This paper provides a baseline for how computer-based tracking results can be compared to a standard human observer.
{"title":"Performance evaluating the evaluator","authors":"T. List, J. Bins, J. Vazquez, R. Fisher","doi":"10.1109/VSPETS.2005.1570907","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570907","url":null,"abstract":"When evaluating the performance of a computer-based visual tracking system one often wishes to compare results with a standard human observer. It is a natural assumption that humans fully understand the relatively simple scenes we subject our computers to and because of this, two human observers would draw the same conclusions about object positions, tracks, size and even simple behaviour patterns. But is that actually the case? This paper provides a baseline for how computer-based tracking results can be compared to a standard human observer.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130060361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Quantitative Evaluation of Video-based 3D Person Tracking
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570935
A. O. Balan, L. Sigal, Michael J. Black
The Bayesian estimation of 3D human motion from video sequences is quantitatively evaluated using synchronized, multi-camera, calibrated video and 3D ground truth poses acquired with a commercial motion capture system. While many methods for human pose estimation and tracking have been proposed, to date there has been no quantitative comparison. Our goal is to evaluate how different design choices influence tracking performance. Toward that end, we independently implemented two fairly standard Bayesian person trackers using two variants of particle filtering and propose an evaluation measure appropriate for assessing the quality of probabilistic tracking methods. In the Bayesian framework we compare various image likelihood functions and prior models of human motion that have been proposed in the literature. Our results suggest that in constrained laboratory environments, current methods perform quite well. Multiple cameras and background subtraction, however, are required to achieve reliable tracking, suggesting that many current methods may be inappropriate in more natural settings. We discuss the implications of the study and the directions for future research that it entails.
{"title":"A Quantitative Evaluation of Video-based 3D Person Tracking","authors":"A. O. Balan, L. Sigal, Michael J. Black","doi":"10.1109/VSPETS.2005.1570935","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570935","url":null,"abstract":"The Bayesian estimation of 3D human motion from video sequences is quantitatively evaluated using synchronized, multi-camera, calibrated video and 3D ground truth poses acquired with a commercial motion capture system. While many methods for human pose estimation and tracking have been proposed, to date there has been no quantitative comparison. Our goal is to evaluate how different design choices influence tracking performance. Toward that end, we independently implemented two fairly standard Bayesian person trackers using two variants of particle filtering and propose an evaluation measure appropriate for assessing the quality of probabilistic tracking methods. In the Bayesian framework we compare various image likelihood functions and prior models of human motion that have been proposed in the literature. Our results suggest that in constrained laboratory environments, current methods perform quite well. Multiple cameras and background subtraction, however, are required to achieve reliable tracking suggesting that many current methods may be inappropriate in more natural settings. We discuss the implications of the study and the directions for future research that it entails","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132294163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Hidden Semi-Markov Model Inference for Structured Video Sequences
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570922
David Tweed, Robert B. Fisher, J. Bins, T. List
The semantic interpretation of video sequences by computer is often formulated as probabilistically relating lower-level features to higher-level states, constrained by a transition graph. With hidden Markov models, inference is efficient but time-in-state data cannot be included; with hidden semi-Markov models, duration can be modelled but inference is inefficient. We present a new, efficient O(T) algorithm for inference in certain HSMMs and show experimental results on video sequence interpretation in television footage, demonstrating that explicitly modelling time-in-state improves interpretation performance.
{"title":"Efficient Hidden Semi-Markov Model Inference for Structured Video Sequences","authors":"David Tweed, Robert B. Fisher, J. Bins, T. List","doi":"10.1109/VSPETS.2005.1570922","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570922","url":null,"abstract":"The semantic interpretation of video sequences by computer is often formulated as probabilistically relating lower-level features to higher-level states, constrained by a transition graph. Using hidden Markov models inference is efficient but time-in-state data cannot be included, whereas using hidden semi-Markov models we can model duration but have inefficient inference. We present a new efficient O(T) algorithm for inference in certain HSMMs and show experimental results on video sequence interpretation in television footage to demonstrate that explicitly modelling time-in-state improves interpretation performance","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116267560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Performance via Post Track Analysis
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570934
L. Brown, M. Lu, Chiao-Fe Shu, Ying-li Tian, A. Hampapur
In this paper, we improve the effective performance of a surveillance system via post-track analysis. Our system performs object detection via background subtraction, followed by appearance-based tracking. The primary outputs of the system, however, are customized alarms that depend on the user's domain and needs. The ultimate performance therefore depends most critically on the Receiver Operating Characteristic (ROC) curve of these alarms. We show that by strategically designing post-tracking and alarm conditions, the effective performance of the system can be improved dramatically. This addresses the most significant error sources, namely errors due to shadows, ghosting, temporally or spatially missing fragments, and many of the false positives due to extreme lighting variations, specular reflections or irrelevant motion.
{"title":"Improving Performance via Post Track Analysis","authors":"L. Brown, M. Lu, Chiao-Fe Shu, Ying-li Tian, A. Hampapur","doi":"10.1109/VSPETS.2005.1570934","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570934","url":null,"abstract":"In this paper, we improve the effective performance of a surveillance system via post track analysis. Our system performs object detection via background subtraction followed by appearance based tracking. The primary outputs of the system however, are customized alarms which depend on the user's domain and needs. The ultimate performance therefore depends most critically on the Receiver Operating Characteristic curve of these alarms. We show that by strategically designing post tracking and alarm conditions, the effective performance of the system can be improved dramatically. This addresses the most significant error sources, namely, errors due to shadows, ghosting, temporally or spatially missing fragments and many of the false positives due to extreme lighting variations, specular reflections or irrelevant motion.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123637975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discriminant Locality Preserving Projections: A New Method to Face Representation and Recognition
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570916
Wei-wei Yu, Xiao-long Teng, Chong-qing Liu
Locality Preserving Projections (LPP) is a linear projective map that arises from solving a variational problem that optimally preserves the neighborhood structure of the data set. Though LPP has been applied in many domains, it has limitations when applied to recognition problems. Thus, Discriminant Locality Preserving Projections (DLPP) is presented in this paper. The improvement of DLPP over LPP comes mostly from two aspects. First, DLPP tries to find the subspace that best discriminates different face classes by maximizing the between-class distance while minimizing the within-class distance. Second, DLPP reduces the energy of noise and transformation differences as much as possible without sacrificing much of the intrinsic difference. In the experiments, DLPP achieves better face recognition performance than LPP.
{"title":"Discriminant Locality Preserving Projections: A New Method to Face Representation and Recognition","authors":"Wei-wei Yu, Xiao-long Teng, Chong-qing Liu","doi":"10.1109/VSPETS.2005.1570916","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570916","url":null,"abstract":"Locality Preserving Projections (LPP) is a linear projective map that arises by solving a variational problem that optimally preserves the neighborhood structure of the data set. Though LPP has been applied in many domains, it has limits to solve recognition problem. Thus, Discriminant Locality Preserving Projections (DLPP) is presented in this paper. The improvement of DLPP algorithm over LPP method benefits mostly from two aspects. One aspect is that DLPP tries to find the subspace that best discriminates different face classes by maximizing the between-class distance, while minimizing the within-class distance. The other aspect is that DLPP reduces the energy of noise and transformation difference as much as possible without sacrificing much of intrinsic difference. In the experiments, DLPP achieves the better face recognition performance than LPP.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130574397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Illumination and motion-based video enhancement for night surveillance
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570912
Jing Li, S.Z. Li, Q. Pan, Tao Yang
This work presents a context enhancement method for low-illumination video for night surveillance. A unique characteristic of the algorithm is its ability to extract and maintain meaningful information, such as highlighted areas or moving objects with low contrast, in the enhanced image, while recovering the surrounding scene information by fusing in the daytime background image. A main challenge comes from the extraction of the meaningful areas in the night video sequence. To address this problem, a novel bidirectional extraction approach is presented. In evaluation experiments with real data, the notable information in the night video is extracted successfully and the background scene is fused smoothly with the night images, yielding enhanced surveillance video for observers.
{"title":"Illumination and motion-based video enhancement for night surveillance","authors":"Jing Li, S.Z. Li, Q. Pan, Tao Yang","doi":"10.1109/VSPETS.2005.1570912","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570912","url":null,"abstract":"This work presents a context enhancement method of low illumination video for night surveillance. A unique characteristic of the algorithm is its ability to extract and maintenance the meaningful information like highlight area or moving objects with low contrast in the enhanced image, meanwhile recover the surrounding scene information by fusing the daytime background image. A main challenge comes from the extraction of meaningful area in the night video sequence. To address this problem, a novel bidirectional extraction approach is presented. In evaluation experiments with real data, the notable information of the night video is extracted successfully and the background scene is fused smoothly with the night images to show enhanced surveillance video for observers.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132559148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distant targets identification as an on-line dynamic vehicle routing problem using an active-zooming camera
Pub Date: 2005-10-15. DOI: 10.1109/VSPETS.2005.1570903
A. del Bimbo, F. Pernici
This paper considers the problem of modeling an active observer to plan a sequence of decisions regarding what target to look at, through a foveal-sensing action. The images gathered by the active observer provide meaningful identification imagery of distant targets that are not recognizable in a wide-angle view. We propose a framework in which a pan/tilt/zoom (PTZ) camera schedules saccades in order to acquire high-resolution images of as many moving targets as possible before they leave the scene. We cast the whole problem as a particular kind of dynamic discrete optimization, specifically as a novel on-line dynamic vehicle routing problem (DVRP) with deadlines. We show that using an optimal choice for the sensing order of targets, the total time spent visiting the targets by the active camera can be significantly reduced. To show the effectiveness of our approach, we apply congestion analysis to a dual-camera system in a master-slave configuration. We report that our framework gives good results in monitoring wide areas with little extra cost compared to approaches using a large number of cameras.
{"title":"Distant targets identification as an on-line dynamic vehicle routing problem using an active-zooming camera","authors":"A. del Bimbo, F. Pernici","doi":"10.1109/VSPETS.2005.1570903","DOIUrl":"https://doi.org/10.1109/VSPETS.2005.1570903","url":null,"abstract":"This paper considers the problem of modeling an active observer to plan a sequence of decisions regarding what target to look at, through a foveal-sensing action. The gathered images by the active observer provides meaningful identification imagery of distant targets which are not recognizable in a wide angle view. We propose a framework in which a pan/tilt/zoom (PTZ) camera schedules saccades in order to acquire high resolution images of as many moving targets as possible before they leave the scene. We cast the whole problem as a particular kind of dynamic discrete optimization, specially as a novel on-line dynamic vehicle routing problem (DVRP) with deadlines. We show that using an optimal choice for the sensing order of targets the total time spent in visiting the targets by the active camera can be significantly reduced. To show the effectiveness of our approach we apply congestion analysis to a dual camera system in a master-slave configuration. We report that our framework gives good results in monitoring wide areas with little extra costs with respect to approaches using a large number of cameras.","PeriodicalId":435841,"journal":{"name":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130270455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}