An efficient method for detecting ghost and left objects in surveillance video
Pub Date: 2007-09-05 | DOI: 10.1142/S021800140900765X
Sijun Lu, Jian Zhang, D. Feng
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computation in background modeling and object tracking in surveillance systems. The method comprises two main steps: the first detects stationary objects, narrowing the evaluation targets down to a small number of foreground blobs; the second discriminates between ghost and left objects among those candidates. For the first step, we introduce a novel stationary object detection method based on continuous object tracking and shape matching. For the second step, we propose a fast and robust inpainting method that differentiates between ghost and left objects by reconstructing the real background from the candidate's corresponding regions in the input and background images. The effectiveness of our method has been validated by experiments over a variety of video sequences.
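The inpainting-based test can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes the background model and current frame are 8-bit images, uses OpenCV's generic inpainting (cv2.INPAINT_TELEA) to reconstruct the background inside a stationary candidate blob, and classifies by which image agrees with the reconstruction; classify_candidate and the mask convention are hypothetical names.

```python
# Hypothetical sketch of a ghost vs. left-object test via inpainting.
import cv2
import numpy as np

def classify_candidate(frame, background, blob_mask):
    """blob_mask: uint8 mask, 255 inside the stationary candidate blob."""
    # Inpaint the candidate region of the background model from its
    # surroundings, estimating what the true background looks like there.
    reconstructed = cv2.inpaint(background, blob_mask, 3, cv2.INPAINT_TELEA)

    region = blob_mask > 0
    err_frame = np.mean(np.abs(frame[region].astype(np.float32) -
                               reconstructed[region].astype(np.float32)))
    err_bg = np.mean(np.abs(background[region].astype(np.float32) -
                            reconstructed[region].astype(np.float32)))

    # If the current frame matches the reconstructed background, the blob is a
    # ghost (the object is gone and the background model is stale); if the
    # background model matches instead, a new object has been left behind.
    return "ghost" if err_frame < err_bg else "left object"
```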
{"title":"An efficient method for detecting ghost and left objects in surveillance video","authors":"Sijun Lu, Jian Zhang, D. Feng","doi":"10.1142/S021800140900765X","DOIUrl":"https://doi.org/10.1142/S021800140900765X","url":null,"abstract":"This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computation in background modeling and object tracking in surveillance systems. This method contains two main steps: the first one is to detect stationary objects, which narrows down the evaluation targets to a very small number of foreground blobs; the second step is to discriminate the candidates between ghost and left objects. For the first step, we introduce a novel stationary object detection method based on continuous object tracking and shape matching. For the second step, we propose a fast and robust inpainting method to differentiate between ghost and left objects by constructing the real background using the candidate 's corresponding regions in the input and the background images. The effectiveness of our method has been validated by experiments over a variety of video sequences.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130846645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acoustic Doppler sonar for gait recognition
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425281
K. Kalgaonkar, B. Raj
A person's gait is a characteristic that might be employed to identify him or her automatically. Conventionally, automatic gait-based identification of subjects employs video and image processing to characterize gait. In this paper we present an Acoustic Doppler Sensor (ADS) based technique for the characterization of gait. The ADS is a very inexpensive sensor that can be built from off-the-shelf components for under US$20 at today's prices. We show that remarkably good gait recognition is possible with the ADS sensor.
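As a rough illustration of how such a sensor's output might be used, the sketch below assumes the ADS yields a demodulated echo signal sampled at fs; it computes a log-magnitude Doppler spectrogram as the feature and scores it against per-subject Gaussian mixture models. The feature and classifier choices here are assumptions for illustration, not necessarily those of the paper.

```python
# Sketch: Doppler-spectrogram features scored by per-subject GMMs.
import numpy as np
from scipy.signal import stft
from sklearn.mixture import GaussianMixture

def doppler_features(signal, fs, nperseg=256):
    _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
    return np.log(np.abs(Z) + 1e-8).T   # one log-spectrum per time frame

def train_subject_model(signal, fs, n_components=8):
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(doppler_features(signal, fs))
    return gmm

def identify(signal, fs, models):
    # Pick the enrolled subject whose GMM gives the highest mean log-likelihood.
    feats = doppler_features(signal, fs)
    return max(models, key=lambda name: models[name].score(feats))
```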
{"title":"Acoustic Doppler sonar for gait recogination","authors":"K. Kalgaonkar, B. Raj","doi":"10.1109/AVSS.2007.4425281","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425281","url":null,"abstract":"A person's gait is a characteristic that might be employed to identify him/her automatically. Conventionally, automatic for gait-based identification of subjects employ video and image processing to characterize gait. In this paper we present an Acoustic Doppler Sensor(ADS) based technique for the characterization of gait. The ADS is very inexpensive sensor that can be built using off-the-shelf components, for under $20 USD at today's prices. We show that remarkably good gait recognition is possible with the ADS sensor.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116838366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classifying and tracking multiple persons for proactive surveillance of mass transport systems
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425303
Suyu Kong, Conrad Sanderson, B. Lovell
We describe a pedestrian classification and tracking system that is able to track and label multiple people in an outdoor environment such as a railway station. The features selected for appearance modelling are circular colour histograms for the hue and conventional colour histograms for the saturation and value components. We combine blob matching with a particle filter for tracking and augment these algorithms with colour appearance models to track multiple people in the presence of occlusion. In the object classification stage, hierarchical chamfer matching combined with particle filtering is applied to classify commuters in the railway station into several classes. Classes of interest include normal commuters, commuters with backpacks, commuters with suitcases, and mothers with their children.
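A minimal sketch of such an appearance model, assuming OpenCV's HSV convention (hue in [0, 180)): the hue histogram is treated circularly by smoothing across its wrap-around boundary, saturation and value use plain histograms, and histograms are compared with the Bhattacharyya coefficient. Bin counts and the comparison measure are illustrative assumptions.

```python
# Sketch: circular hue histogram plus plain S/V histograms as an appearance model.
import numpy as np

def hsv_appearance(hsv_pixels, h_bins=18, sv_bins=16):
    h, s, v = hsv_pixels[:, 0], hsv_pixels[:, 1], hsv_pixels[:, 2]
    h_hist, _ = np.histogram(h, bins=h_bins, range=(0, 180))
    # Circular smoothing: each hue bin is averaged with its wrap-around neighbours.
    h_hist = (h_hist + np.roll(h_hist, 1) + np.roll(h_hist, -1)) / 3.0
    s_hist, _ = np.histogram(s, bins=sv_bins, range=(0, 256))
    v_hist, _ = np.histogram(v, bins=sv_bins, range=(0, 256))
    hist = np.concatenate([h_hist, s_hist, v_hist]).astype(np.float64)
    return hist / hist.sum()

def bhattacharyya(p, q):
    # Similarity in [0, 1]; 1 means identical appearance histograms.
    return np.sum(np.sqrt(p * q))
```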
{"title":"Classifying and tracking multiple persons for proactive surveillance of mass transport systems","authors":"Suyu Kong, Conrad Sanderson, B. Lovell","doi":"10.1109/AVSS.2007.4425303","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425303","url":null,"abstract":"We describe a pedestrian classification and tracking system that is able to track and label multiple people in an outdoor environment such as a railway station. The features selected for appearance modelling are circular colour histograms for the hue and conventional colour histograms for the saturation and value components. We combine blob matching with a particle filter for tracking and augment these algorithms with colour appearance models to track multiple people in the presence of occlusion. In the object classification stage, hierarchical chamfer matching combined with particle filtering is applied to classify commuters in the railway station into several classes. Classes of interest include normal commuters, commuters with backpacks, commuters with suitcases, and mothers with their children.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121802190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scream and gunshot detection and localization for audio-surveillance systems
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425280
G. Valenzise, L. Gerosa, M. Tagliasacchi, F. Antonacci, A. Sarti
This paper describes an audio-based video surveillance system which automatically detects anomalous audio events in a public square, such as screams or gunshots, and localizes the position of the acoustic source so that a video camera can be steered toward it. The system employs two parallel GMM classifiers for discriminating screams from noise and gunshots from noise, respectively. Each classifier is trained using different features, chosen from a set of both conventional and novel audio features. The location of the acoustic source that produced the sound event is estimated by computing the time differences of arrival (TDOA) of the signal at a microphone array and applying a linear-correction least-squares localization algorithm. Experimental results show that our system can detect events with a precision of 93% at a false rejection rate of 5% when the SNR is 10 dB, while the source direction can be estimated with a precision of one degree. A real-time implementation of the system will be installed in a public square in Milan.
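One common way to obtain the pairwise delays that feed such a localization step is GCC-PHAT; the sketch below estimates a single time difference of arrival between two microphone signals. The PHAT weighting is a standard choice and an assumption here; the paper's exact correlation processing and the linear-correction least-squares solver are not reproduced.

```python
# Sketch: one TDOA estimate via GCC-PHAT cross-correlation.
import numpy as np

def gcc_phat_delay(x1, x2, fs, max_tau=None):
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n=n), np.fft.rfft(x2, n=n)
    R = X1 * np.conj(X2)
    R /= np.abs(R) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # Re-center the correlation so lag 0 sits in the middle of the window.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # estimated delay, seconds
```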
{"title":"Scream and gunshot detection and localization for audio-surveillance systems","authors":"G. Valenzise, L. Gerosa, M. Tagliasacchi, F. Antonacci, A. Sarti","doi":"10.1109/AVSS.2007.4425280","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425280","url":null,"abstract":"This paper describes an audio-based video surveillance system which automatically detects anomalous audio events in a public square, such as screams or gunshots, and localizes the position of the acoustic source, in such a way that a video-camera is steered consequently. The system employs two parallel GMM classifiers for discriminating screams from noise and gunshots from noise, respectively. Each classifier is trained using different features, chosen from a set of both conventional and innovative audio features. The location of the acoustic source which has produced the sound event is estimated by computing the time difference of arrivals of the signal at a microphone array and using linear-correction least square localization algorithm. Experimental results show that our system can detect events with a precision of 93% at a false rejection rate of 5% when the SNR is 10dB, while the source direction can be estimated with a precision of one degree. A real-time implementation of the system is going to be installed in a public square of Milan.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127864247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Intelligent vision sensor: Turning video into information
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425287
A. Lipton, John I. W. Clark, B. Thompson, Gary Myers, S. Titus, Zhong Zhang, P. L. Venetianer
Video analytics for security and surveillance applications is becoming commonplace. Advances in algorithm robustness and low-cost video platforms have allowed analytics to become an ingredient for many different devices ranging from cameras to encoders to routers to storage. As algorithms become more refined, the analytics paradigm shifts from a human-support model to an automation model. In this context, ObjectVideo, the leader in intelligent video, has created a new concept in video analytics devices - the intelligent vision sensor (IVS). This device consists of a video imager and lens combined with an onboard processor and communication channel. This low-cost device turns video imagery into actionable information that can be used in building automation and business intelligence applications. This paper describes the technical and market drivers that facilitate the creation and adoption of the IVS device as well as a specific case study involving an application for heating, ventilation, and air conditioning (HVAC) and lighting control.
{"title":"The Intelligent vision sensor: Turning video into information","authors":"A. Lipton, John I. W. Clark, B. Thompson, Gary Myers, S. Titus, Zhong Zhang, P. L. Venetianer","doi":"10.1109/AVSS.2007.4425287","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425287","url":null,"abstract":"Video analytics for security and surveillance applications is becoming commonplace. Advances in algorithm robustness and low-cost video platforms have allowed analytics to become an ingredient for many different devices ranging from cameras to encoders to routers to storage. As algorithms become more refined, the analytics paradigm shifts from a human-support model to an automation model. In this context, ObjectVideo, the leader in intelligent video, has created a new concept in video analytics devices - the intelligent vision sensor (IVS). This device consists of a video imager and lens combined with an onboard processor and communication channel. This low-cost device turns video imagery into actionable information that can be used in building automation and business intelligence applications. This paper describes the technical and market drivers that facilitate the creation and adoption of the IVS device as well as a specific case study involving an application for heating, ventilation, and air conditioning (HVAC) and lighting control.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117027590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ETISEO, performance evaluation for video surveillance systems
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425357
Anh-Tuan Nghiem, F. Brémond, M. Thonnat, V. Valentin
This paper presents the results of ETISEO, a performance evaluation project for video surveillance systems. Many other projects have already evaluated the performance of video surveillance systems, but mostly from an end-user point of view. ETISEO aims at studying the dependency between algorithms and video characteristics. First, we describe the ETISEO methodology, which consists of addressing each video processing problem separately. Second, we present the main evaluation metrics of ETISEO as well as their benefits, limitations and conditions of use. Finally, we discuss the contributions of ETISEO to the evaluation community.
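For a flavor of what a detection-oriented evaluation metric looks like, here is an illustrative frame-level precision/recall computation over bounding-box overlap; the actual ETISEO metric definitions differ in detail and should be taken from the project's documentation.

```python
# Sketch: frame-level detection precision/recall with an IoU match threshold.
def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(detections, ground_truth, iou_thresh=0.5):
    matched = sum(any(iou(d, g) >= iou_thresh for g in ground_truth)
                  for d in detections)
    precision = matched / len(detections) if detections else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```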
{"title":"ETISEO, performance evaluation for video surveillance systems","authors":"Anh-Tuan Nghiem, F. Brémond, M. Thonnat, V. Valentin","doi":"10.1109/AVSS.2007.4425357","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425357","url":null,"abstract":"This paper presents the results of ETISEO, a performance evaluation project for video surveillance systems. Many other projects have already evaluated the performance of video surveillance systems, but more on an end-user point of view. ETISEO aims at studying the dependency between algorithms and the video characteristics. Firstly we describe ETISEO methodology which consists in addressing each video processing problem separately. Secondly, we present the main evaluation metrics of ETISEO as well as their benefits, limitations and conditions of use. Finally, we discuss about the contributions of ETISEO to the evaluation community.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122384935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A LQR spatiotemporal fusion technique for face profile collection in smart camera surveillance
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425338
Chung-Ching Chang, H. Aghajan
In this paper, we propose a joint face orientation estimation technique for face profile collection in smart camera networks. The system is composed of in-node coarse estimation and joint refined estimation between cameras. In-node signal processing algorithms are designed to be lightweight to reduce computation load, yielding coarse estimates which may be erroneous. The proposed model-based technique determines the orientation and the angular motion of the face using two features, namely the hair-face ratio and the head optical flow. These features yield an estimate of the face orientation and the angular velocity through least squares (LS) analysis. In the joint refined estimation step, a discrete-time linear dynamical model is defined. Spatiotemporal consistency between cameras is measured by a cost function, which is minimized through linear quadratic regulation (LQR) to yield a robust closed-loop feedback system that estimates the face orientation, angular motion, and relative angular difference to the face between cameras. Based on the face orientation estimates, a collection of face profile are accumulated over time as the human subject moves around. The proposed technique does not require camera locations to be known in prior, and hence is applicable to vision networks deployed casually without localization.
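The LQR step itself is standard; below is a minimal sketch, assuming a discrete-time linear model x[k+1] = A x[k] + B u[k] with quadratic weights Q and R. The matrices are placeholders, not the paper's actual state definition.

```python
# Sketch: discrete-time LQR gain from the algebraic Riccati equation.
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_gain(A, B, Q, R):
    # Solve the discrete algebraic Riccati equation and form the feedback
    # gain K, so that u[k] = -K x[k] minimizes sum(x'Qx + u'Ru).
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```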
{"title":"A LQR spatiotemporal fusion technique for face profile collection in smart camera surveillance","authors":"Chung-Ching Chang, H. Aghajan","doi":"10.1109/AVSS.2007.4425338","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425338","url":null,"abstract":"In this paper, we propose a joint face orientation estimation technique for face profile collection in smart camera networks. The system is composed of in-node coarse estimation and joint refined estimation between cameras. In-node signal processing algorithms are designed to be lightweight to reduce computation load, yielding coarse estimates which may be erroneous. The proposed model-based technique determines the orientation and the angular motion of the face using two features, namely the hair-face ratio and the head optical flow. These features yield an estimate of the face orientation and the angular velocity through least squares (LS) analysis. In the joint refined estimation step, a discrete-time linear dynamical model is defined. Spatiotemporal consistency between cameras is measured by a cost function, which is minimized through linear quadratic regulation (LQR) to yield a robust closed-loop feedback system that estimates the face orientation, angular motion, and relative angular difference to the face between cameras. Based on the face orientation estimates, a collection of face profile are accumulated over time as the human subject moves around. The proposed technique does not require camera locations to be known in prior, and hence is applicable to vision networks deployed casually without localization.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133663562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Midground object detection in real world video scenes
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425364
B. Valentine, S. Apewokin, L. Wills, D. S. Wills, A. Gentile
Traditional video scene analysis depends on accurate background modeling to identify salient foreground objects. However, in many important surveillance applications, saliency is defined by the appearance of a new non-ephemeral object that is between the foreground and background. This midground realm is defined by a temporal window following the object's appearance; but it also depends on adaptive background modeling to allow detection with scene variations (e.g., occlusion, small illumination changes). The human visual system is ill-suited for midground detection. For example, when surveying a busy airline terminal, it is difficult (but important) to detect an unattended bag which appears in the scene. This paper introduces a midground detection technique which emphasizes computational and storage efficiency. The approach uses a new adaptive, pixel-level modeling technique derived from existing backgrounding methods. Experimental results demonstrate that this technique can accurately and efficiently identify midground objects in real-world scenes, including PETS2006 and AVSS2007 challenge datasets.
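One simple way to realize a pixel-level midground test, offered here as an assumption rather than the paper's exact model, is to maintain two running-average backgrounds with different adaptation rates: a pixel that agrees with the fast (short-term) model but disagrees with the slow (long-term) one has been stationary for a while without yet being absorbed into the background.

```python
# Sketch: midground mask from fast/slow running-average background models.
import numpy as np

class MidgroundDetector:
    def __init__(self, first_frame, alpha_fast=0.10, alpha_slow=0.001, thresh=25.0):
        f = first_frame.astype(np.float32)
        self.fast, self.slow = f.copy(), f.copy()
        self.alpha_fast, self.alpha_slow, self.thresh = alpha_fast, alpha_slow, thresh

    def update(self, frame):
        f = frame.astype(np.float32)
        self.fast += self.alpha_fast * (f - self.fast)   # adapts within seconds
        self.slow += self.alpha_slow * (f - self.slow)   # adapts over minutes
        stable_now = np.abs(f - self.fast) < self.thresh
        not_background = np.abs(f - self.slow) >= self.thresh
        return stable_now & not_background               # boolean midground mask
```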
{"title":"Midground object detection in real world video scenes","authors":"B. Valentine, S. Apewokin, L. Wills, D. S. Wills, A. Gentile","doi":"10.1109/AVSS.2007.4425364","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425364","url":null,"abstract":"Traditional video scene analysis depends on accurate background modeling to identify salient foreground objects. However, in many important surveillance applications, saliency is defined by the appearance of a new non-ephemeral object that is between the foreground and background. This midground realm is defined by a temporal window following the object's appearance; but it also depends on adaptive background modeling to allow detection with scene variations (e.g., occlusion, small illumination changes). The human visual system is ill-suited for midground detection. For example, when surveying a busy airline terminal, it is difficult (but important) to detect an unattended bag which appears in the scene. This paper introduces a midground detection technique which emphasizes computational and storage efficiency. The approach uses a new adaptive, pixel-level modeling technique derived from existing backgrounding methods. Experimental results demonstrate that this technique can accurately and efficiently identify midground objects in real-world scenes, including PETS2006 and AVSS2007 challenge datasets.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131209997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multitarget association and tracking in 3-D space based on particle filter with joint multitarget probability density
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425374
Jinseok Lee, Byung Guk Kim, S. Cho, Sangjin Hong, W. Cho
This paper addresses the problem of three-dimensional (3D) multitarget tracking using a particle filter with the joint multitarget probability density (JMPD) technique. The estimation accommodates nonlinear target motion and unlabeled measurement association as well as non-Gaussian target state densities. In addition, we decompose the 3D formulation into multiple 2D particle filters that operate on 2D planes. Both the selection and the combining of the 2D particle filters for 3D tracking are presented and discussed. Finally, we analyze the tracking and association performance of the proposed approach, especially in cases of multitarget crossing and overlapping.
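For readers unfamiliar with the machinery, a minimal single-target bootstrap particle filter step on a 2D plane is sketched below; the paper's JMPD filter maintains a joint density over all targets and resolves measurement association, which this sketch deliberately omits.

```python
# Sketch: one predict/update/resample step of a bootstrap particle filter.
import numpy as np

def particle_filter_step(particles, weights, z, motion_std=1.0, meas_std=2.0):
    rng = np.random.default_rng()
    # Predict: propagate particles through a (here: random-walk) motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: reweight by the Gaussian likelihood of measurement z.
    d2 = np.sum((particles - z) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std**2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights**2) < 0.5 * len(weights):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```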
{"title":"Multitarget association and tracking in 3-D space based on particle filter with joint multitarget probability density","authors":"Jinseok Lee, Byung Guk Kim, S. Cho, Sangjin Hong, W. Cho","doi":"10.1109/AVSS.2007.4425374","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425374","url":null,"abstract":"This paper addresses the problem of 3-dimensional (3D) multitarget tracking using particle filter with the joint multitarget probability density (JMPD) technique. The estimation allows the nonlinear target motion with unlabeled measurement association as well as non-Gaussian target state densities. In addition, we decompose the 3D formulation into multiple 2D particle filters that operate on the 2D planes. Both selection and combining of the 2D particle filters for 3D tracking are presented and discussed. Finally, we analyze the tracking and association performance of the proposed approach especially in the cases of multitarget crossing and overlapping.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124291888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sphere detection and tracking for a space capturing operation
Pub Date: 2007-09-05 | DOI: 10.1109/AVSS.2007.4425307
M. Kharbat, N. Aouf, A. Tsourdos, B. White
Capture mechanisms are used to transfer objects between two vehicles in space with no physical contact. A sphere (canister) detection and tracking method using an enhanced Hough transform technique and an H-infinity filter is proposed. The presented system aims to assist in the capture operation, currently being investigated by the European Space Agency and other partners, and to be used in space missions as an alternative to docking or berthing operations. Test results show the robustness and reliability of the proposed method. They also demonstrate its low computational and memory requirements.
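The circle-detection stage can be illustrated with OpenCV's gradient-based Hough transform; the authors use an enhanced Hough variant and an H-infinity tracker, neither of which is reproduced in this sketch, and the parameters below are placeholders.

```python
# Sketch: detect the projected sphere as a circle with the Hough transform.
import cv2

def detect_sphere(gray_frame):
    blurred = cv2.GaussianBlur(gray_frame, (9, 9), 2)
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.5, minDist=50,
                               param1=100, param2=40, minRadius=5, maxRadius=200)
    if circles is None:
        return None
    x, y, r = circles[0, 0]        # strongest circle: centre (x, y) and radius r
    return (x, y), r
```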
{"title":"Sphere detection and tracking for a space capturing operation","authors":"M. Kharbat, N. Aouf, A. Tsourdos, B. White","doi":"10.1109/AVSS.2007.4425307","DOIUrl":"https://doi.org/10.1109/AVSS.2007.4425307","url":null,"abstract":"Capture mechanisms are used to transfer objects between two vehicles in the space with no physical contact. A sphere (canister) detection and tracking method using an enhanced Hough transform technique and Hinfin filter is proposed. The presented system aims to assist in the capture operation, currently investigated the European Space Agency and other partners, and to be used in space missions as an alternative to docking or berthing operations. Test results show the robustness and reliability of the proposed method. They also demonstrate the low computational and memory complexities needed.","PeriodicalId":371050,"journal":{"name":"2007 IEEE Conference on Advanced Video and Signal Based Surveillance","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121437844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}