Detecting shopper groups in video sequences
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425347
A. Leykin, M. Tuceryan
We present a generalized, extensible framework for the automated recognition of swarming activities in video sequences. The trajectory of each individual is produced by the visual tracking sub-system and is further analyzed to detect certain types of high-level grouping behavior. We utilize recent findings in swarming behavior analysis to formulate the problem in terms of a specific distance function, which we then apply as part of a two-stage agglomerative clustering method to create a set of swarming events and, from them, a set of swarming activities. In this paper we present results for one particular type of swarming: shopper grouping. As part of this work, events detected within relatively short time intervals are further integrated into activities, the manifestation of prolonged high-level swarming behavior. The results demonstrate the ability of our method to detect such activities in congested surveillance videos. In particular, in three hours of indoor retail store video, our method correctly identified over 85% of valid "shopper groups" with a very low rate of false positives, validated against human-coded ground truth.
{"title":"Detecting shopper groups in video sequences","authors":"A. Leykin, M. Tuceryan","doi":"10.1109/AVSS.2007.4425347","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425347","url":null,"abstract":"We present a generalized extensible framework for automated recognition of swarming activities in video sequences. The trajectory of each individual is produced by the visual tracking sub-system and is further analyzed to detect certain types of high-level grouping behavior. We utilize recent findings in swarming behavior analysis to formulate a problem in terms of the specific distance function that we subsequently apply as part of the two-stage agglomerative clustering method to create a set of swarming events followed by a set of swarming activities. In this paper we present results for one particular type of swarming: shopper grouping. As part of this work the events detected in a relatively short time interval are further integrated into activities, the manifestation of prolonged high-level swarming behavior. The results demonstrate the ability of our method to detect such activities in congested surveillance videos. In particular in three hours of indoor retail store video, our method has correctly identified over85% of valid '\"shopper-groups'\" with a very low level of false positives, validated against human coded ground truth.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126879031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A framework for track matching across disjoint cameras using robust shape and appearance features
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425308
Christopher S. Madden, M. Piccardi
This paper presents a framework based on robust shape and appearance features for matching the various tracks generated by a single individual moving within a surveillance system. Each track is first automatically analysed to detect and remove frames affected by large segmentation errors or drastic changes in illumination. The object features computed over the remaining frames prove more robust and can support correct matching of tracks even across significantly disjoint camera views. The shape and appearance features used include a height estimate as well as an illumination-tolerant colour representation of the individual's global colours and of the colours of the upper and lower portions of the clothing. Results of a test on a real surveillance system show that the combination of these four features can provide a matching probability as high as 91 percent, with a 5 percent probability of false alarms, across views that have significantly different illumination levels and suffer significant segmentation errors in as many as 1 in 4 frames.
{"title":"A framework for track matching across disjoint cameras using robust shape and appearance features","authors":"Christopher S. Madden, M. Piccardi","doi":"10.1109/AVSS.2007.4425308","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425308","url":null,"abstract":"This paper presents a framework based on robust shape and appearance features for matching the various tracks generated by a single individual moving within a surveillance system. Each track is first automatically analysed in order to detect and remove the frames affected by large segmentation errors and drastic changes in illumination. The object's features computed over the remaining frames prove more robust and capable of supporting correct matching of tracks even in the case of significantly disjointed camera views. The shape and appearance features used include a height estimate as well as illumination-tolerant colour representation of the individual's global colours and the colours of the upper and lower portions of clothing. The results of a test from a real surveillance system show that the combination of these four features can provide a probability of matching as high as 91 percent with 5 percent probability of false alarms under views which have significantly differing illumination levels and suffer from significant segmentation errors in as many as 1 in 4 frames.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127391501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated 3D Face authentication & recognition
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425284
M. Bae, A. Razdan, G. Farin
This paper presents a fully automated 3D face authentication (verification) and recognition (identification) method and recent results from our work in this area. The major contributions of our paper are: (a) the method can handle data with different facial expressions and with extraneous regions such as hair, upper body, and clothing; and (b) the development of weighted features for discrimination. The input to our system is a triangular mesh, and the output is a match percentage against a gallery. Our method includes both surface- and curve-based features that are automatically extracted from the given face data. The test set for authentication consisted of 117 different people with 421 scans, including different facial expressions. Our study shows an equal error rate (EER) of 0.065% for normal faces and 1.13% for faces with expressions. We report verification rates of 100% on normal faces and 93.12% on faces with expressions at 0.1% FAR. For identification, our experiments show a 100% rate on normal faces and 95.6% on faces with expressions. From our experiments we conclude that combining feature points, profile curves, and partial face surface matching gives better authentication and recognition rates than any single matching method.
{"title":"Automated 3D Face authentication & recognition","authors":"M. Bae, A. Razdan, G. Farin","doi":"10.1109/AVSS.2007.4425284","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425284","url":null,"abstract":"This paper presents a fully automated 3D face authentication (verification) and recognition (identification) method and recent results from our work in this area. The major contributions of our paper are: (a) the method can handle data with different facial expressions including hair, upper body, clothing, etc. and (b) development of weighted features for discrimination. The input to our system is a triangular mesh and it outputs a matching % against a gallery. Our method includes both surface and curve based features that are automatically extracted from a given face data. The test set for authentication consisted of 117 different people with 421 scans including different facial expressions. Our study shows equal error rate (EER) at 0.065% for normal faces and 1.13% in faces with expressions. We report verification rates of 100% in normal faces and 93.12% in faces with expressions at 0.1% FAR. For identification, our experiment shows 100% rate in normal faces and 95.6% in faces with expressions. From our experiment we conclude that combining feature points, profile curve, and partial face surface matching gives better authentication and recognition rate than any single matching method.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131266051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combination of self-organization map and kernel mutual subspace method for video surveillance
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425297
Bailing Zhang, Junbum Park, Hanseok Ko
This paper addresses the video surveillance problem of automatically identifying moving vehicles and people from continuous observation of image sequences. With a single far-field surveillance camera, moving objects are first segmented by simple background subtraction. To reduce redundancy and select representative prototypes from the input video streams, a self-organizing feature map (SOM) is applied to both the training and testing sequences. The recognition scheme is designed around the recently proposed kernel mutual subspace (KMS) model. As an alternative to some probability-based models, KMS makes no assumptions about the data sampling process and offers an efficient and robust classifier. Experiments demonstrated highly accurate recognition results, showing the model's applicability in real-world surveillance systems.
{"title":"Combination of self-organization map and kernel mutual subspace method for video surveillance","authors":"Bailing Zhang, Junbum Park, Hanseok Ko","doi":"10.1109/AVSS.2007.4425297","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425297","url":null,"abstract":"This paper addresses the video surveillance issue of automatically identifying moving vehicles and people from continuous observation of image sequences. With a single far-field surveillance camera, moving objects are first segmented by simple background subtraction. To reduce the redundancy and select the representative prototypes from input video streams, the self-organizing feature map (SOM) is applied for both training and testing sequences. The recognition scheme is designed based on the recently proposed kernel mutual subspace (KMS) model. As an alternative to some probability-based models, KMS does not make assumptions about the data sampling processing and offers an efficient and robust classifier. Experiments demonstrated a highly accurate recognition result, showing the model's applicability in real-world surveillance system.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134340106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Technology, applications and innovations in physical security - A Home Office perspective
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425275
A. Coleman
Summary form only given. This overview talk will first introduce the Home Office Scientific Development Branch (HOSDB) as an organisation and will then offer a summary of our programmes in the physical security sector. The talk will explain how HOSDB contributes to protection and law enforcement, using a series of examples to cover this area. In the second part, the talk will focus on vision-based systems and on HOSDB initiatives around this technology. I will provide a strategic view of initiatives aimed at driving innovation in industry and academic research. I will then cover our initiatives in benchmarking and in video evidence analysis. Finally, I will provide an overview of future technology trends from the HOSDB perspective.
{"title":"Technology, applications and innovations in physical security - A home office perspective","authors":"A. Coleman","doi":"10.1109/AVSS.2007.4425275","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425275","url":null,"abstract":"Summary form only given. This overview talk will first introduce the Home Office Scientific Development Branch (HOSDB) as organisation and then will offer a summary of our programmes in the area of the physical security sector. The talk will explain how HOSDB is contributing to protection and law enforcement. I will use a series of examples to cover this area. In the second part, the talk shall focus on vision based systems and on HOSDB initiatives on this technology. I will provide a strategic view of initiatives aimed to cause innovation in the industry and academic research. I will then cover our initiatives in bench marking and in video evidence analysis. Finally, I will provide an overview of future technology trends from the HOSDB perspective.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123845732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video verification of point of sale transactions
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425346
P. L. Venetianer, Zhong Zhang, Andrew W. Scanlon, Yongtong Hu, A. Lipton
Loss prevention is a significant challenge in retail enterprises, and a significant percentage of this loss occurs at point of sale (POS) terminals. POS data mining tools, known collectively as exception based reporting (EBR), are helping retailers, but they are limited to statistical analysis of trends and anomalies in digital POS data. By applying video analytics techniques to POS transactions, it is possible to detect fraudulent or anomalous activity at the level of individual transactions. Very specific fraudulent behaviors that cannot be detected from POS data alone become clear when it is combined with video-derived data. ObjectVideo, a provider of intelligent video software, has produced a system called RetailWatch that combines POS information with video data to create a unique loss prevention tool. This paper describes the system architecture, algorithmic approach, and capabilities of the system, together with a customer case study illustrating its results and effectiveness.
{"title":"Video verification of point of sale transactions","authors":"P. L. Venetianer, Zhong Zhang, Andrew W. Scanlon, Yongtong Hu, A. Lipton","doi":"10.1109/AVSS.2007.4425346","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425346","url":null,"abstract":"Loss prevention is a significant challenge in retail enterprises. A significant percentage of this loss occurs at point of sale (POS) terminals. POS data mining tools known collectively as exception based reporting (EBR) are helping retailers, but they have limitations as they can only work statistically on trends and anomalies in digital POS data. By applying video analytics techniques to POS transactions, it is possible to detect fraudulent or anomalous activity at the level of individual transactions. Very specific fraudulent behaviors that cannot be detected via POS data alone become clear when combined with video-derived data. ObjectVideo, a provider of intelligent video software, has produced a system called RetailWatch that combines POS information with video data to create a unique loss prevention tool. This paper describes the system architecture, algorithmic approach, and capabilities of the system, together with a customer case-study illustrating the results and effectiveness of the system.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115844018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recognition through constructing the Eigenface classifiers using conjugation indices
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425355
V. Fursov, Nikita Kozin
Principal component analysis (PCA), also called eigenface analysis, is one of the most extensively used face image recognition techniques. The idea of the method is to decompose image vectors over a system of eigenvectors corresponding to the largest eigenvalues. The proximity measure used to compare vectors of principal components strongly influences recognition quality. In this paper, the use of different conjugation indices with the subspace spanned by the training vectors is considered as a proximity measure. It is shown that this approach is very effective when the number of training examples is small. Results of experiments on the standard ORL face database are presented.
{"title":"Recognition through constructing the Eigenface classifiers using conjugation indices","authors":"V. Fursov, Nikita Kozin","doi":"10.1109/AVSS.2007.4425355","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425355","url":null,"abstract":"The principal component analysis (PCA), also called the eigenfaces analysis, is one of the most extensively used face image recognition techniques. The idea of the method is decomposition of image vectors into a system of eigenvectors matched to the maximum eigenvalues. The method of proximity assessment of vectors composed of principal components essentially influences the recognition quality. In the paper the use of different indices of conjugation with subspace stretched on training vectors is considered as a proximity measure. It is shown that this approach is very effective in the case of a small number of training examples. The results of experiments for a standard ORL-face database are presented.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116885296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovering the linguistic components of the manual signs in American Sign Language
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425352
Liya Ding, Aleix M. Martinez
Manual signs in American Sign Language (ASL) are constructed from three building blocks: handshape, motion, and place of articulation. Only when all three are successfully estimated can a sign be uniquely identified. Hence, pattern recognition techniques that use only a subset of these components are inappropriate. To achieve accurate classification, the motion, the handshape, and their three-dimensional positions need to be recovered. In this paper, we define an algorithm that determines these three components from a single video sequence of two-dimensional pictures of a sign. We demonstrate the use of our algorithm in describing and recognizing a set of manual signs in ASL.
{"title":"Recovering the linguistic components of the manual signs in American Sign Language","authors":"Liya Ding, Aleix M. Martinez","doi":"10.1109/AVSS.2007.4425352","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425352","url":null,"abstract":"Manual signs in American sign language (ASL) are constructed using three building blocks -handshape, motion, and place of articulations. Only when these three are successfully estimated, can a sign by uniquely identified. Hence, the use of pattern recognition techniques that use only a subset of these is inappropriate. To achieve accurate classifications, the motion, the handshape and their three-dimensional position need to be recovered. In this paper, we define an algorithm to determine these three components form a single video sequence of two-dimensional pictures of a sign. We demonstrated the use of our algorithm in describing and recognizing a set of manual signs in ASL.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115734475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Searching surveillance video
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425289
A. Hampapur, L. Brown, R. Feris, A. Senior, Chiao-Fe Shu, Ying-li Tian, Y. Zhai, M. Lu
Surveillance video is used in two key modes: watching for known threats in real time, and searching for events of interest after the fact. Typically, real-time alerting is a localized function, e.g., an airport security center receives and reacts to a "perimeter breach alert", while investigations often encompass a large number of geographically distributed cameras, as in the London bombing or Washington sniper incidents. Enabling effective search of surveillance video for investigation and preemption involves indexing the video along multiple dimensions. This paper presents a framework for surveillance search that includes video parsing, indexing, and query mechanisms. It explores video parsing techniques that automatically extract index data from video, indexing that stores the data in relational tables, retrieval that uses SQL queries to return events of interest, and the software architecture that integrates these technologies.
{"title":"Searching surveillance video","authors":"A. Hampapur, L. Brown, R. Feris, A. Senior, Chiao-Fe Shu, Ying-li Tian, Y. Zhai, M. Lu","doi":"10.1109/AVSS.2007.4425289","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425289","url":null,"abstract":"Surveillance video is used in two key modes, watching for known threats in real-time and searching for events of interest after the fact. Typically, real-time alerting is a localized function, e.g. airport security center receives and reacts to a \"perimeter breach alert\", while investigations often tend to encompass a large number of geographically distributed cameras like the London bombing, or Washington sniper incidents. Enabling effective search of surveillance video for investigation & preemption, involves indexing the video along multiple dimensions. This paper presents a framework for surveillance search which includes, video parsing, indexing and query mechanisms. It explores video parsing techniques which automatically extract index data from video, indexing which stores data in relational tables, retrieval which uses SQL queries to retrieve events of interest and the software architecture that integrates these technologies.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114556775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stationary target detection using the ObjectVideo surveillance system
Pub Date : 2007-09-05 | DOI: 10.1109/AVSS.2007.4425317
P. L. Venetianer, Zhong Zhang, Weihong Yin, A. Lipton
Detecting stationary objects, such as abandoned baggage or a parked vehicle, is crucial in a wide range of video surveillance and monitoring applications. ObjectVideo, the leader in intelligent video software, has been deploying commercial products that address these problems for the last five years. The ObjectVideo VEW and OnBoard systems address them using an array of algorithms that are optimized for various scenario types and can be selected dynamically. This paper describes the key challenges and algorithms, and presents results on the standard i-LIDS dataset.
{"title":"Stationary target detection using the objectvideo surveillance system","authors":"P. L. Venetianer, Zhong Zhang, Weihong Yin, A. Lipton","doi":"10.1109/AVSS.2007.4425317","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425317","url":null,"abstract":"Detecting stationary objects, such as an abandoned baggage or a parked vehicle is crucial in a wide range of video surveillance and monitoring applications. ObjectVideo, the leader in intelligent video software has been deploying commercial products to address these problems for the last 5 years. The ObjectVideo VEW and OnBoard system addresses these problems using an array of algorithms optimized for various scenario types and can be selected dynamically. This paper describes the key challenges and algorithms, and presents results on the standard i-LIDS dataset.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128109491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}