Two thresholds are better than one
Zhang Tao, T. Boult, R. C. Johnson
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383500
The Bayesian optimal single threshold is a well-established and widely used classification technique. In this paper, we prove that when spatial cohesion is assumed for targets, a better classification result than the "optimal" single-threshold classification can be achieved. Under the assumption of spatial cohesion and certain prior knowledge about the target and background, the method can be further simplified to dual-threshold classification. In core-dual-threshold classification, spatial cohesion within the target core allows "continuation", linking values that fall between the two thresholds to the target core; classical Bayesian classification is employed beyond the dual thresholds. The core-dual-threshold algorithm can be built into a Markov random field (MRF) model. From this MRF model, the dual thresholds can be obtained and optimal classification can be achieved. In some practical applications, a simple method called symmetric subtraction may be employed to determine effective dual thresholds in real time. Given dual thresholds, the quasi-connected-component algorithm is shown to be a deterministic implementation of the MRF core-dual-threshold model, combining the dual thresholds, extended neighborhoods, and efficient connected-component computation.
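
As an illustration of the core-dual-threshold idea described above, here is a minimal sketch that implements dual thresholds with spatial linking as hysteresis thresholding over connected components. The thresholds are assumed given (e.g., from the symmetric-subtraction rule), and this is only a stand-in for, not the authors' exact, quasi-connected-component algorithm.

```python
# Hedged sketch: dual-threshold classification with spatial "continuation".
# Pixels above t_high form target cores; pixels between t_low and t_high are
# kept only if their connected component touches a core. t_low/t_high are
# assumed inputs (e.g., chosen by symmetric subtraction).
import numpy as np
from scipy import ndimage

def dual_threshold(image, t_low, t_high):
    core = image >= t_high                      # confident target pixels
    candidate = image >= t_low                  # core plus ambiguous pixels
    # 8-connected components of the candidate mask
    labels, n = ndimage.label(candidate, structure=np.ones((3, 3)))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[core])] = True        # components containing a core
    keep[0] = False                             # 0 is the background label
    return keep[labels]                         # final target mask
```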
{"title":"Two thresholds are better than one","authors":"Zhang Tao, T. Boult, R. C. Johnson","doi":"10.1109/CVPR.2007.383500","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383500","url":null,"abstract":"The concept of the Bayesian optimal single threshold is a well established and widely used classification technique. In this paper, we prove that when spatial cohesion is assumed for targets, a better classification result than the \"optimal\" single threshold classification can be achieved. Under the assumption of spatial cohesion and certain prior knowledge about the target and background, the method can be further simplified as dual threshold classification. In core-dual threshold classification, spatial cohesion within the target core allows \"continuation\" linking values to fall between the two thresholds to the target core; classical Bayesian classification is employed beyond the dual thresholds. The core-dual threshold algorithm can be built into a Markov random field model (MRF). From this MRF model, the dual thresholds can be obtained and optimal classification can be achieved. In some practical applications, a simple method called symmetric subtraction may be employed to determine effective dual thresholds in real time. Given dual thresholds, the quasi-connected component algorithm is shown to be a deterministic implementation of the MRF core-dual threshold model combining the dual thresholds, extended neighborhoods and efficient connected component computation.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123830289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Detection and segmentation of moving objects in highly dynamic scenes
Aurélie Bugeau, P. Pérez
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383244
Detecting and segmenting moving objects in dynamic scenes is a hard but essential task in a number of applications, such as surveillance. Most existing methods give good results only when the background is persistent or slowly changing, or when both the objects and the background are rigid. In this paper, we propose a new method for direct detection and segmentation of foreground moving objects in the absence of such constraints. First, groups of pixels having similar motion and photometric features are extracted. For this first step, only a sub-grid of image pixels is used, to reduce computational cost and improve robustness to noise. We introduce the use of p-values to validate optical flow estimates and of automatic bandwidth selection in the mean shift clustering algorithm. In a second stage, segmentation of the object associated with a given cluster is performed in a MAP/MRF framework. Our method is able to handle a moving camera and several different motions in the background. Experiments on challenging sequences show the performance of the proposed method and its utility for video analysis in complex scenes.
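
A minimal sketch of the first stage under stated assumptions: scikit-learn's MeanShift with its built-in bandwidth estimator stands in for the paper's automatic bandwidth selection, and the feature layout (position, flow, RGB, left unscaled) and sub-grid step are illustrative choices, not the paper's.

```python
# Sketch: cluster a sub-grid of pixels by motion + photometric features with
# mean shift and an automatically estimated bandwidth. `frame` is an H x W x 3
# image and `flow` an H x W x 2 optical-flow field (both assumed given).
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def cluster_moving_pixels(frame, flow, step=8):
    ys, xs = np.mgrid[0:frame.shape[0]:step, 0:frame.shape[1]:step]
    ys, xs = ys.ravel(), xs.ravel()
    feats = np.column_stack([
        xs, ys,                              # position
        flow[ys, xs, 0], flow[ys, xs, 1],    # motion
        frame[ys, xs],                       # photometry (RGB)
    ])  # in practice each feature group would be rescaled
    bw = estimate_bandwidth(feats, quantile=0.1)   # automatic bandwidth
    ms = MeanShift(bandwidth=bw, bin_seeding=True).fit(feats)
    return xs, ys, ms.labels_                # cluster label per sub-grid pixel
```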
{"title":"Detection and segmentation of moving objects in highly dynamic scenes","authors":"Aurélie Bugeau, P. Pérez","doi":"10.1109/CVPR.2007.383244","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383244","url":null,"abstract":"Detecting and segmenting moving objects in dynamic scenes is a hard but essential task in a number of applications such as surveillance. Most existing methods only give good results in the case of persistent or slowly changing background, or if both the objects and the background are rigid. In this paper, we propose a new method for direct detection and segmentation of foreground moving objects in the absence of such constraints. First, groups of pixels having similar motion and photometric features are extracted. For this first step only a sub-grid of image pixels is used to reduce computational cost and improve robustness to noise. We introduce the use of p-value to validate optical flow estimates and of automatic bandwidth selection in the mean shift clustering algorithm. In a second stage, segmentation of the object associated to a given cluster is performed in a MAP/MRF framework. Our method is able to handle moving camera and several different motions in the background. Experiments on challenging sequences show the performance of the proposed method and its utility for video analysis in complex scenes.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123145775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Efficient Indexing For Articulation Invariant Shape Matching And Retrieval
S. Biswas, G. Aggarwal, R. Chellappa
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383227
Most shape matching methods are either fast but too simplistic to give the desired performance, or promising in performance but computationally demanding. In this paper, we present a very simple and efficient approach that not only performs almost as well as many state-of-the-art techniques but also scales up to large databases. In the proposed approach, each shape is indexed by a variety of simple and easily computed features that are invariant to articulations and rigid transformations. The features characterize pairwise geometric relationships between interest points on the shape, thereby providing robustness to the approach. Shapes are retrieved using an efficient scheme that does not involve costly operations like shape-wise alignment or establishing correspondences. Even for a moderate-size database of 1,000 shapes, the retrieval process is several times faster than most techniques with similar performance. Extensive experimental results are presented to illustrate the advantages of our approach compared to the best in the field.
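
The sketch below illustrates only the indexing pipeline: each shape becomes a histogram of pairwise distances between sampled boundary points, and retrieval is a nearest-histogram lookup with no alignment or correspondence step. Plain Euclidean distances are used for brevity; unlike the paper's features, they are rigid-invariant but not articulation-invariant.

```python
# Pipeline sketch (hypothetical features, not the paper's): index shapes by a
# normalized histogram of pairwise boundary-point distances, retrieve by
# comparing histograms directly.
import numpy as np
from scipy.spatial.distance import pdist

def shape_signature(points, bins=32):
    d = pdist(points)                # all pairwise distances, O(n^2) of them
    d = d / d.max()                  # normalize for scale invariance
    hist, _ = np.histogram(d, bins=bins, range=(0.0, 1.0), density=True)
    return hist

def retrieve(query_points, signatures, k=5):
    q = shape_signature(query_points)
    dists = np.array([np.linalg.norm(q - s) for s in signatures])
    return np.argsort(dists)[:k]     # indices of the k closest shapes
```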
{"title":"Efficient Indexing For Articulation Invariant Shape Matching And Retrieval","authors":"S. Biswas, G. Aggarwal, R. Chellappa","doi":"10.1109/CVPR.2007.383227","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383227","url":null,"abstract":"Most shape matching methods are either fast but too simplistic to give the desired performance or promising as far as performance is concerned but computationally demanding. In this paper, we present a very simple and efficient approach that not only performs almost as good as many state-of-the-art techniques but also scales up to large databases. In the proposed approach, each shape is indexed based on a variety of simple and easily computable features which are invariant to articulations and rigid transformations. The features characterize pairwise geometric relationships between interest points on the shape, thereby providing robustness to the approach. Shapes are retrieved using an efficient scheme which does not involve costly operations like shape-wise alignment or establishing correspondences. Even for a moderate size database of 1000 shapes, the retrieval process is several times faster than most techniques with similar performance. Extensive experimental results are presented to illustrate the advantages of our approach as compared to the best in the field.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123642540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Automatic Person Detection and Tracking using Fuzzy Controlled Active Cameras
Keni Bernardin, F. V. D. Camp, R. Stiefelhagen
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383502
This paper presents an automatic system for monitoring indoor environments using pan-tilt-zoom cameras. A combination of Haar-feature classifier-based detection and color histogram filtering is used to achieve reliable initialization of person tracks, even in the presence of camera movement. A combination of adaptive color and KLT feature trackers for the face and upper body allows for robust tracking and track recovery in the presence of occlusion or interference. The continuous recomputation of camera parameters, coupled with a fuzzy control scheme, allows for smooth tracking of moving targets as well as acquisition of stable facial close-ups, similar to the natural behavior of a human cameraman. The system is tested on a series of natural indoor monitoring scenarios and shows a high degree of naturalness, flexibility, and robustness.
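
A minimal single-axis sketch of the fuzzy-control idea, assuming triangular membership functions and centroid defuzzification; the rule placement and speed values are hypothetical, not the paper's tuned controller.

```python
# Hedged sketch: map the normalized horizontal offset of a tracked face
# (offset in [-1, 1], 0 = image center) to a pan speed through five fuzzy
# rules. Memberships and outputs are illustrative.
def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    return max(0.0, min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)))

def pan_speed(offset):
    rules = [                                   # (firing degree, pan speed)
        (tri(offset, -1.5, -1.0, -0.3), -1.0),  # far left  -> fast left
        (tri(offset, -0.8, -0.4,  0.0), -0.4),  # left      -> slow left
        (tri(offset, -0.3,  0.0,  0.3),  0.0),  # centered  -> hold
        (tri(offset,  0.0,  0.4,  0.8),  0.4),  # right     -> slow right
        (tri(offset,  0.3,  1.0,  1.5),  1.0),  # far right -> fast right
    ]
    total = sum(m for m, _ in rules)
    return sum(m * v for m, v in rules) / (total + 1e-9)  # centroid defuzzify
```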
{"title":"Automatic Person Detection and Tracking using Fuzzy Controlled Active Cameras","authors":"Keni Bernardin, F. V. D. Camp, R. Stiefelhagen","doi":"10.1109/CVPR.2007.383502","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383502","url":null,"abstract":"This paper presents an automatic system for the monitoring of indoor environments using pan-tilt-zoomable cameras. A combination of Haar-feature classifier-based detection and color histogram filtering is used to achieve reliable initialization of person tracks even in the presence of camera movement. A combination of adaptive color and KLT feature trackers for face and upper body allows for robust tracking and track recovery in the presence of occlusion or interference. The continuous recomputation of camera parameters, coupled with a fuzzy controlling scheme allow for smooth tracking of moving targets as well as acquisition of stable facial close ups, similar to the natural behavior of a human cameraman. The system is tested on a series of natural indoor monitoring scenarios and shows a high degree of naturalness, flexibility and robustness.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120878596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A Linear Programming Approach for Multiple Object Tracking
Hao Jiang, S. Fels, J. Little
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383180
We propose a linear programming relaxation scheme for the class of multiple-object tracking problems in which the inter-object interaction metric is convex and the intra-object term quantifying object state continuity may use any metric. The proposed scheme models object tracking as a multi-path search problem. It explicitly models track interaction, such as object spatial-layout consistency or mutual occlusion, and optimizes multiple object tracks simultaneously. The proposed scheme does not rely on track initialization or complex heuristics. It has much lower average complexity than previous efficient exhaustive-search methods, such as extended dynamic programming, and is found to locate the global optimum with high probability. We have successfully applied the proposed method to multiple object tracking in video streams.
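
To show the "relax, then solve an LP" pattern on a toy instance, the sketch below handles two-frame data association with the 0/1 assignment variables relaxed to [0, 1]; for this assignment polytope the LP optimum happens to be integral, echoing the paper's observation that the relaxation usually recovers the global optimum. It is a stand-in, not the paper's multi-path formulation.

```python
# Toy LP relaxation of frame-to-frame assignment (equal object counts assumed).
# Variables x[i, j] in [0, 1]: object i in the previous frame matches
# detection j in the current frame; each row and column must sum to 1.
import numpy as np
from scipy.optimize import linprog

def associate(prev_pts, curr_pts):
    n = len(prev_pts)
    cost = np.linalg.norm(prev_pts[:, None] - curr_pts[None, :], axis=2)
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # row i sums to 1
        A_eq[n + i, i::n] = 1.0              # column i sums to 1
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.ones(2 * n),
                  bounds=(0.0, 1.0), method="highs")
    x = res.x.reshape(n, n)
    return x.argmax(axis=1)                  # integral in practice here
```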
{"title":"A Linear Programming Approach for Multiple Object Tracking","authors":"Hao Jiang, S. Fels, J. Little","doi":"10.1109/CVPR.2007.383180","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383180","url":null,"abstract":"We propose a linear programming relaxation scheme for the class of multiple object tracking problems where the inter-object interaction metric is convex and the intra-object term quantifying object state continuity may use any metric. The proposed scheme models object tracking as a multi-path searching problem. It explicitly models track interaction, such as object spatial layout consistency or mutual occlusion, and optimizes multiple object tracks simultaneously. The proposed scheme does not rely on track initialization and complex heuristics. It has much less average complexity than previous efficient exhaustive search methods such as extended dynamic programming and is found to be able to find the global optimum with high probability. We have successfully applied the proposed method to multiple object tracking in video streams.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121176023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Towards Robust Pedestrian Detection in Crowded Image Sequences
Edgar Seemann, Mario Fritz, B. Schiele
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383300
Object class detection in scenes of realistic complexity remains a challenging task in computer vision. Most recent approaches focus on a single, general model for object class detection. However, particularly in the context of image sequences, it may be advantageous to adapt the general model to a more object-instance-specific model in order to detect that particular object reliably within the image sequence. In this work, we present a generative object model that is capable of scaling from a general object class model to a more specific object-instance model. This makes it possible to detect class instances as well as to distinguish reliably between individual object instances. We experimentally evaluate the performance of the proposed system on both still images and image sequences.
{"title":"Towards Robust Pedestrian Detection in Crowded Image Sequences","authors":"Edgar Seemann, Mario Fritz, B. Schiele","doi":"10.1109/CVPR.2007.383300","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383300","url":null,"abstract":"Object class detection in scenes of realistic complexity remains a challenging task in computer vision. Most recent approaches focus on a single and general model for object class detection. However, in particular in the context of image sequences, it may be advantageous to adapt the general model to a more object-instance specific model in order to detect this particular object reliably within the image sequence. In this work we present a generative object model that is capable to scale from a general object class model to a more specific object-instance model. This allows to detect class instances as well as to distinguish between individual object instances reliably. We experimentally evaluate the performance of the proposed system on both still images and image sequences.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114200809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign Language Recognition
Ruiduo Yang, Sudeep Sarkar, B. Loeding
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383347
One of the hard problems in automated sign language recognition is the movement epenthesis (me) problem. Movement epenthesis is the gesture movement that bridges two consecutive signs. This effect can extend over a long duration and involve variations in hand shape, position, and movement, making it hard to explicitly model these intervening segments. This creates a problem when trying to match individual signs to full sign sentences, since for many chunks of the sentence, corresponding to these mes, we do not have models. We present an approach based on a version of a dynamic programming framework, called Level Building, to simultaneously segment and match signs in continuous sign language sentences in the presence of movement epenthesis. We enhance the classical Level Building framework so that it can accommodate me labels for which we have no explicit models. This enhanced Level Building algorithm is then coupled with a trigram grammar model to optimally segment and label sign language sentences. We demonstrate the efficiency of the algorithm on a single-view video dataset of continuous sign language sentences. We obtain an 83% word-level recognition rate with the enhanced Level Building approach, as opposed to a 20% recognition rate using the classical Level Building framework on the same dataset. The proposed approach is novel in that it does not need explicit models for movement epenthesis.
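
A much-simplified sketch of the enhanced Level Building idea: dynamic programming over (level, end frame), where each segment is explained either by a sign model with a segment cost or by a flat-penalty "me" filler that needs no explicit model. The trigram grammar coupling is omitted, and the cost interface is an assumption for illustration.

```python
# Hedged sketch of Level Building with "me" labels. sign_costs is a list of
# callables; sign_costs[s](t, e) returns the cost of sign s spanning frames
# [t, e). Unmodeled movement epenthesis costs me_penalty per frame.
import numpy as np

def level_building(T, sign_costs, max_levels, me_penalty=1.0):
    INF = float("inf")
    D = np.full((max_levels + 1, T + 1), INF)   # D[l, e]: best cost of
    D[0, 0] = 0.0                                # explaining [0, e) with l labels
    back = {}
    for l in range(1, max_levels + 1):
        for e in range(1, T + 1):
            for t in range(e):
                if D[l - 1, t] == INF:
                    continue
                for s, cost_fn in enumerate(sign_costs):   # modeled sign
                    c = D[l - 1, t] + cost_fn(t, e)
                    if c < D[l, e]:
                        D[l, e], back[(l, e)] = c, (t, s)
                c = D[l - 1, t] + me_penalty * (e - t)     # "me" filler
                if c < D[l, e]:
                    D[l, e], back[(l, e)] = c, (t, "me")
    best_l = int(np.argmin(D[:, T]))
    labels, l, e = [], best_l, T                 # backtrack the segmentation
    while l > 0:
        t, s = back[(l, e)]
        labels.append((t, e, s))                 # sign index or "me" on [t, e)
        l, e = l - 1, t
    return labels[::-1], D[best_l, T]
```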
{"title":"Enhanced Level Building Algorithm for the Movement Epenthesis Problem in Sign Language Recognition","authors":"Ruiduo Yang, Sudeep Sarkar, B. Loeding","doi":"10.1109/CVPR.2007.383347","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383347","url":null,"abstract":"One of the hard problems in automated sign language recognition is the movement epenthesis (me) problem. Movement epenthesis is the gesture movement that bridges two consecutive signs. This effect can be over a long duration and involve variations in hand shape, position, and movement, making it hard to explicitly model these intervening segments. This creates a problem when trying to match individual signs to full sign sentences since for many chunks of the sentence, corresponding to these mes, we do not have models. We present an approach based on version of a dynamic programming framework, called Level Building, to simultaneously segment and match signs to continuous sign language sentences in the presence of movement epenthesis (me). We enhance the classical Level Building framework so that it can accommodate me labels for which we do not have explicit models. This enhanced Level Building algorithm is then coupled with a trigram grammar model to optimally segment and label sign language sentences. We demonstrate the efficiency of the algorithm using a single view video dataset of continuous sign language sentences. We obtain 83% word level recognition rate with the enhanced Level Building approach, as opposed to a 20% recognition rate using a classical Level Building framework on the same dataset. The proposed approach is novel since it does not need explicit models for movement epenthesis.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":" 9","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113949543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Linear and Quadratic Subsets for Template-Based Tracking
Selim Benhimane, A. Ladikos, V. Lepetit, Nassir Navab
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383179
We propose a method that dramatically improves the performance of template-based matching in terms of the size of the convergence region and computation time. This is done by selecting a subset of the template that satisfies the assumption, made during optimization, of linearity or quadraticity with respect to the motion parameters. We call these subsets linear or quadratic subsets. While subset selection approaches have already been proposed, they generally do not attempt to provide linear or quadratic subsets and instead rely on heuristics such as texturedness. Because a naive search for the optimal subset would result in a combinatorial explosion for large templates, we propose a simple algorithm that does not aim for the optimal subset but provides a very good linear or quadratic subset at low cost, even for large templates. Simulation results and experiments on real sequences show the superiority of the proposed method over existing subset selection approaches.
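
A sketch of one plausible way to pick a linear subset, assuming a single translation parameter: sample small horizontal shifts, fit a line per pixel relating intensity to the shift, and keep the pixels with the smallest fitting residual, i.e., the pixels that really are nearly linear in the motion parameter. The criterion and its parameters are illustrative, not the paper's algorithm.

```python
# Hedged sketch: select template pixels whose intensity is most linear in a
# horizontal shift. np.roll wraps at borders, which is acceptable for a demo.
import numpy as np

def linear_subset(template, shifts=(-2, -1, 0, 1, 2), keep_frac=0.3):
    x = np.asarray(shifts, dtype=float)
    stack = np.stack([np.roll(template, s, axis=1) for s in shifts])
    # per-pixel least-squares line fit: I(s) ~= a * s + b
    a = (((x - x.mean())[:, None, None] * (stack - stack.mean(0))).sum(0)
         / ((x - x.mean()) ** 2).sum())
    b = stack.mean(0) - a * x.mean()
    resid = ((stack - (a * x[:, None, None] + b)) ** 2).sum(0)
    k = int(keep_frac * template.size)           # keep the most linear pixels
    mask = np.zeros(template.shape, dtype=bool)
    flat = np.argsort(resid, axis=None)[:k]
    mask[np.unravel_index(flat, resid.shape)] = True
    return mask          # subset to use inside the tracking optimization
```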
{"title":"Linear and Quadratic Subsets for Template-Based Tracking","authors":"Selim Benhimane, A. Ladikos, V. Lepetit, Nassir Navab","doi":"10.1109/CVPR.2007.383179","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383179","url":null,"abstract":"We propose a method that dramatically improves the performance of template-based matching in terms of size of convergence region and computation time. This is done by selecting a subset of the template that verifies the assumption (made during optimization) of linearity or quadraticity with respect to the motion parameters. We call these subsets linear or quadratic subsets. While subset selection approaches have already been proposed, they generally do not attempt to provide linear or quadratic subsets and rely on heuristics such as textured-ness. Because a naive search for the optimal subset would result in a combinatorial explosion for large templates, we propose a simple algorithm that does not aim for the optimal subset but provides a very good linear or quadratic subset at low cost, even for large templates. Simulation results and experiments with real sequences show the superiority of the proposed method compared to existing subset selection approaches.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126363470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Real-time Object Classification in Video Surveillance Based on Appearance Learning
Lun Zhang, S. Li, Xiao-Tong Yuan, Shiming Xiang
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383503
Classifying moving objects into semantically meaningful categories is important for automatic visual surveillance. However, this is a challenging problem, owing to the limited size of objects, the large intra-class variations within a class caused by different viewing angles and lighting, and the real-time performance requirements of real-world applications. This paper describes an appearance-based method to achieve real-time and robust object classification across diverse camera viewing angles. A new descriptor, the multi-block local binary pattern (MB-LBP), is proposed to capture large-scale structures in object appearance. Based on MB-LBP features, an AdaBoost algorithm is introduced to select a subset of discriminative features and to construct a strong two-class classifier. To deal with the non-metric feature values of MB-LBP, a multi-branch regression tree is developed as the weak classifier for boosting. Finally, error-correcting output codes (ECOC) are introduced to achieve robust multi-class classification. Experimental results show that our approach achieves real-time and robust object classification in diverse scenes.
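
Since the MB-LBP descriptor has a simple block-wise definition, here is a minimal sketch of computing one code: average the nine s-by-s blocks of a 3s-by-3s patch and threshold the eight neighbour-block means against the centre block's mean to form an 8-bit code, a block-wise generalization of plain LBP.

```python
# Compute a single MB-LBP code for a 3s x 3s grayscale patch.
import numpy as np

def mb_lbp(patch, s):
    assert patch.shape == (3 * s, 3 * s)
    means = patch.reshape(3, s, 3, s).mean(axis=(1, 3))   # 3x3 block means
    centre = means[1, 1]
    # neighbour blocks in clockwise order, starting at the top-left
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (i, j) in enumerate(order):
        code |= int(means[i, j] >= centre) << bit
    return code                                # 8-bit code in [0, 255]
```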
{"title":"Real-time Object Classification in Video Surveillance Based on Appearance Learning","authors":"Lun Zhang, S. Li, Xiao-Tong Yuan, Shiming Xiang","doi":"10.1109/CVPR.2007.383503","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383503","url":null,"abstract":"Classifying moving objects to semantically meaningful categories is important for automatic visual surveillance. However, this is a challenging problem due to the factors related to the limited object size, large intra-class variations of objects in a same class owing to different viewing angles and lighting, and real-time performance requirement in real-world applications. This paper describes an appearance-based method to achieve real-time and robust objects classification in diverse camera viewing angles. A new descriptor, i.e., the multi-block local binary pattern (MB-LBP), is proposed to capture the large-scale structures in object appearances. Based on MB-LBP features, an adaBoost algorithm is introduced to select a subset of discriminative features as well as construct the strong two-class classifier. To deal with the non-metric feature value of MB-LBP features, a multi-branch regression tree is developed as the weak classifiers of the boosting. Finally, the error correcting output code (ECOC) is introduced to achieve robust multi-class classification performance. Experimental results show that our approach can achieve real-time and robust object classification in diverse scenes.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128130862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Closed-Loop Tracking and Change Detection in Multi-Activity Sequences
Bi Song, Namrata Vaswani, A. Roy-Chowdhury
Pub Date: 2007-06-17 · DOI: 10.1109/CVPR.2007.383243
We present a novel framework for tracking a long sequence of human activities, including the time instants of change from one activity to the next, using a closed-loop, non-linear dynamical feedback system. A composite feature vector describing the shape, color, and motion of the objects, and a non-linear, piecewise-stationary, stochastic dynamical model describing its spatio-temporal evolution, are used for tracking. The tracking error, or expected log-likelihood, serves as a feedback signal and is used to automatically detect changes and switch between activities happening one after another in a long video sequence. Whenever a change is detected, the tracker is re-initialized automatically by comparing the input image with learned models of the activities. Unlike some other approaches that can track a sequence of activities, we do not need to know the transition probabilities between activities, which can be difficult to estimate in many application scenarios. We demonstrate the effectiveness of the method on multiple indoor and outdoor real-life videos and analyze its performance.
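
As a stand-in for the feedback signal, the sketch below tracks a 1-D observation sequence with a constant-velocity Kalman filter and monitors the per-step observation negative log-likelihood, re-initializing when it spikes. The paper's dynamical models are richer and non-linear, so the model, threshold, and re-initialization rule here are all illustrative assumptions.

```python
# Hedged sketch: change detection from tracking error (observation NLL).
import numpy as np

def nll_step(x, P, z, F, H, Q, R):
    """One Kalman predict/update; also returns the observation's NLL."""
    x, P = F @ x, F @ P @ F.T + Q                 # predict
    nu = z - H @ x                                # innovation
    S = H @ P @ H.T + R
    nll = 0.5 * (nu @ np.linalg.solve(S, nu)
                 + np.log(np.linalg.det(S)) + len(nu) * np.log(2 * np.pi))
    K = P @ H.T @ np.linalg.inv(S)                # update
    x, P = x + K @ nu, (np.eye(len(x)) - K @ H) @ P
    return x, P, nll

def detect_changes(observations, threshold=8.0):
    F = np.array([[1.0, 1.0], [0.0, 1.0]])        # constant-velocity model
    H = np.array([[1.0, 0.0]])
    Q, R = 0.01 * np.eye(2), np.array([[0.1]])
    x, P = np.array([observations[0], 0.0]), np.eye(2)
    changes = []
    for t, z in enumerate(observations[1:], start=1):
        x, P, nll = nll_step(x, P, np.atleast_1d(z), F, H, Q, R)
        if nll > threshold:                       # tracking error too high
            changes.append(t)                     # flag an activity change
            x, P = np.array([z, 0.0]), np.eye(2)  # re-initialize the tracker
    return changes
```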
{"title":"Closed-Loop Tracking and Change Detection in Multi-Activity Sequences","authors":"Bi Song, Namrata Vaswani, A. Roy-Chowdhury","doi":"10.1109/CVPR.2007.383243","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383243","url":null,"abstract":"We present a novel framework for tracking of a long sequence of human activities, including the time instances of change from one activity to the next, using a closed-loop, non-linear dynamical feedback system. A composite feature vector describing the shape, color and motion of the objects, and a non-linear, piecewise stationary, stochastic dynamical model describing its spatio-temporal evolution, are used for tracking. The tracking error or expected log likelihood, which serves as a feedback signal, is used to automatically detect changes and switch between activities happening one after another in a long video sequence. Whenever a change is detected, the tracker is re initialized automatically by comparing the input image with learned models of the activities. Unlike some other approaches that can track a sequence of activities, we do not need to know the transition probabilities between the activities, which can be difficult to estimate in many application scenarios. We demonstrate the effectiveness of the method on multiple indoor and outdoor real-life videos and analyze its performance.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134535112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}