Proceedings Second IEEE Workshop on Visual Surveillance (VS'99)

Monitoring dynamically changing environments by ubiquitous vision system
Kim C. Ng, Hiroshi Ishiguro, Mohan M. Trivedi, T. Sogo
DOI: 10.1109/VS.1999.780270

Accurate and efficient monitoring of dynamically changing environments is one of the most important requirements for visual surveillance systems. This paper describes the development of a ubiquitous vision system for this monitoring purpose. The system, consisting of multiple omni-directional vision sensors, is developed to address two specific surveillance tasks: (1) robust and accurate tracking and profiling of human activities, and (2) dynamic synthesis of virtual views for observing the environment from arbitrary vantage points.
A Bayesian approach to human activity recognition
A. Madabhushi, J. Aggarwal
DOI: 10.1109/VS.1999.780265

This paper presents a methodology for automatically identifying human actions, using a new approach to activity recognition that incorporates a Bayesian framework. By tracking the movement of the subject's head over consecutive frames of monocular grayscale image sequences, we recognize actions in the frontal or lateral view. Input sequences captured from a CCD camera are matched against stored models of actions, and the action closest to the input sequence is identified. In the present implementation, these actions include sitting down, standing up, bending down, getting up, hugging, squatting, rising from a squatting position, bending sideways, falling backward, and walking. This methodology finds application in environments where constant monitoring of human activity is required, such as department stores and airports.
Using models to recognise man-made objects
A. L. Reno, D. Booth
DOI: 10.1109/VS.1999.780266

Objects in aerial images are extracted using a method based on a two-dimensional viewer-centred model. This approach has advantages over existing methods because the models are easily created and the method is efficient. Experiments illustrate the extraction and tracking of man-made objects in reconnaissance images. In future work we intend to extend the method to allow selection between different object types, or between different views of the same object.
Multi-view calibration from planar motion for video surveillance
C. Jaynes
DOI: 10.1109/VS.1999.780269

We present a technique for the registration of multiple surveillance cameras through the automatic alignment of image trajectories. The algorithm addresses the problem of recovering the relative pose of several stationary cameras that observe one or more objects in motion. Each camera tracks several objects to produce a set of trajectories in the image. Using a simple calibration procedure, we recover the relative orientation of each camera to the local ground plane in order to projectively unwarp image trajectories onto a nominal plane of correct orientation. Unwarped trajectory curves are then matched by solving for the 3D-to-3D rotation, translation, and scale that bring them into alignment. The relative transform between a pair of cameras is derived from the independent camera-to-ground-plane rotations and the plane-to-plane transform computed from matched trajectories. Registration aligns n cameras with respect to each other in a single camera frame (that of the reference camera). The approach recovers both the epipolar geometry between all cameras and the camera-to-ground rotation for each camera. After calibration, points that are known to lie on a world ground plane can be directly backprojected into each of the camera frames. The algorithm is demonstrated for two-camera and three-camera scenarios by tracking pedestrians as they move through a surveillance area and matching the resulting trajectories.
Frame-rate omnidirectional surveillance and tracking of camouflaged and occluded targets
Terrance E. Boult, R. Micheals, X. Gao, P. Lewis, C. Power, Weihong Yin, A. Erkan
DOI: 10.1109/VS.1999.780268

Video surveillance involves watching an area for significant events. Perimeter security generally requires watching areas that afford trespassers reasonable cover and concealment. By definition, such "interesting" areas have limited visibility. Furthermore, targets of interest generally attempt to conceal themselves within the cover, sometimes adding camouflage to further reduce their visibility. Such targets are visible only "while in motion". The combination of limited viewing distance and low target visibility severely reduces the usefulness of any panning-based approach. As a result, these situations call for a wide field of view and are a natural application for omni-directional VSAM (video surveillance and monitoring). This paper describes an omni-directional tracking system. After motivating its use, we discuss some domain application constraints and provide background on the paracamera. We then go through the basic components of the frame-rate Lehigh Omni-directional Tracking System (LOTS) and describe some of its unique features. In particular, the system's combined performance depends on novel adaptive multi-background modeling and a novel quasi-connected-components technique. These key components are described in some detail, while other components are summarized. We end with a summary of an external evaluation of the system.
"Looking at People" is currently one of the most active application area in computer vision. This contribution provides a short overview of existing work on human motion as far as whole-body motion and gestures are concerned. The overview is based on a more extensive survey article (Gavria (1991)); here, the emphasis lies on surveillance scenarios.
{"title":"The analysis of human motion and its application for visual surveillance","authors":"D.M. Gavrilla","doi":"10.1109/VS.1999.780260","DOIUrl":"https://doi.org/10.1109/VS.1999.780260","url":null,"abstract":"\"Looking at People\" is currently one of the most active application area in computer vision. This contribution provides a short overview of existing work on human motion as far as whole-body motion and gestures are concerned. The overview is based on a more extensive survey article (Gavria (1991)); here, the emphasis lies on surveillance scenarios.","PeriodicalId":371192,"journal":{"name":"Proceedings Second IEEE Workshop on Visual Surveillance (VS'99) (Cat. No.98-89223)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126982280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video surveillance of interactions
Y. Ivanov, C. Stauffer, A. Bobick, W. Grimson
DOI: 10.1109/VS.1999.780272

This paper describes an automatic surveillance system that labels events and interactions in an outdoor environment. The system is designed to monitor activities in an open parking lot. It consists of three components: an adaptive tracker; an event generator, which maps object tracks onto a set of pre-determined discrete events; and a stochastic parser. The system performs segmentation and labeling of surveillance video of a parking lot and identifies person-vehicle interactions such as pick-up and drop-off. The system presented in this paper was developed jointly by the MIT Media Lab and the MIT Artificial Intelligence Lab.
Robust person tracking in real scenarios with non-stationary background using a statistical computer vision approach
G. Rigoll, B. Winterstein, S. Muller
DOI: 10.1109/VS.1999.780267

This paper presents a novel approach to robust and flexible person tracking using an algorithm that combines two powerful stochastic modeling techniques. The first is the so-called Pseudo-2D Hidden Markov Model (P2DHMM), used to capture the shape of a person within an image frame; the second is the well-known Kalman filter, which uses the output of the P2DHMM to track the person by estimating a bounding-box trajectory that indicates the person's location throughout the video sequence. The two algorithms cooperate in an optimal way, and with this cooperative feedback the proposed approach makes person tracking possible even in the presence of background motion, caused for instance by moving objects such as cars, or by camera operations such as panning or zooming. We consider this a major advantage over most other tracking algorithms, which are generally not capable of dealing with background motion. Furthermore, the person to be tracked is not required to wear special equipment (e.g. sensors) or special clothing. We therefore believe that our proposed algorithm is among the first approaches capable of handling such a complex tracking problem. Our results are confirmed by several tracking examples in real scenarios, shown at the end of the paper and provided on our institute's web server.
A real-time system for monitoring of cyclists and pedestrians
J. Heikkilä, O. Silvén
DOI: 10.1109/VS.1999.780271

Fixed systems are routinely used for monitoring highway traffic; for this purpose, mainly inductive loops and microwave sensors are used. Both techniques achieve very good counting accuracy and are capable of discriminating between trucks and cars. Pedestrians and cyclists, however, are mostly counted manually. In this paper, we describe a new camera-based automatic system that uses Kalman filtering for tracking and Learning Vector Quantization (LVQ) for classifying the observations as pedestrians or cyclists. Both the requirements for such systems and the algorithms used are described. The tests performed show that the system achieves around 80-90% accuracy in counting and classification.
Multi-camera colour tracking
J. Orwell, Paolo Remagnino, Graeme A. Jones
DOI: 10.1109/VS.1999.780264

We propose a colour tracker for use in visual surveillance. The tracker is part of a framework designed to monitor a dynamic scene with more than one camera. Colour tracking complements spatial tracking: it can also be used over large temporal intervals and between spatially uncalibrated cameras. The colour distributions of objects are modelled, and measures of difference between them are discussed. A context is required for assessing the significance of any difference; it is provided by an analysis of the noise processes, first in camera capture and then in the underlying variability of the signal. We present results comparing parametric and explicit representations, the inclusion and omission of intensity data, and single and multiple cameras.