Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206641
Ramin Mehran, Alexis Oyama, M. Shah
In this paper we introduce a novel method to detect and localize abnormal behaviors in crowd videos using the social force model. For this purpose, a grid of particles is placed over the image and advected with the space-time average of the optical flow. By treating the moving particles as individuals, their interaction forces are estimated using the social force model. The interaction force is then mapped into the image plane to obtain a Force Flow for every pixel in every frame. Randomly selected spatio-temporal volumes of Force Flow are used to model the normal behavior of the crowd. We classify frames as normal or abnormal using a bag-of-words approach. The regions of anomalies in the abnormal frames are localized using the interaction forces. The experiments are conducted on a publicly available dataset from the University of Minnesota for escape panic scenarios and a challenging dataset of crowd videos taken from the web. The experiments show that the proposed method successfully captures the dynamics of crowd behavior. In addition, we show that the social force approach outperforms similar approaches based on pure optical flow.
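The per-pixel interaction force described above can be sketched numerically. This is a rough illustration under stated assumptions, not the authors' implementation: the space-time average flow is replaced by a global mean, particle mass is taken as 1, and `tau` and `panic_weight` are hypothetical parameters.

```python
import numpy as np

def force_flow(flow_prev, flow, tau=0.5, panic_weight=0.2):
    """Per-pixel interaction-force magnitude, a crude sketch of the
    social-force "Force Flow" idea for unit-mass particles on a grid.

    flow_prev, flow: (H, W, 2) optical-flow fields at t-1 and t.
    The desired velocity blends each particle's own flow with the
    average flow; the interaction force is the driving force toward
    the desired velocity minus the observed acceleration.
    """
    # global mean flow: a crude stand-in for the space-time average
    # flow used for particle advection in the paper
    avg = flow.mean(axis=(0, 1), keepdims=True)
    v_des = (1.0 - panic_weight) * flow + panic_weight * avg
    accel = flow - flow_prev                 # dv/dt with unit time step
    f_int = (v_des - flow) / tau - accel     # interaction force
    return np.linalg.norm(f_int, axis=-1)    # force magnitude per pixel

# A uniform crowd moving at constant velocity exerts no interaction force.
uniform = np.tile(np.array([1.0, 0.0]), (8, 8, 1))
calm = force_flow(uniform, uniform)
```

As expected, a perfectly coherent flow field yields zero force everywhere; anomalies would show up as large `force_flow` values.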
Title: Abnormal crowd behavior detection using social force model
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206782
M. Pechaud, R. Keriven, G. Peyré
This paper presents a new method to extract tubular structures from two-dimensional images. The core of the proposed algorithm is the computation of geodesic curves over a four-dimensional space that includes local orientation and scale. These shortest paths closely follow the centerline of tubular structures, provide an estimate of the radius, and deal robustly with crossings in the image plane. Numerical experiments on a database of synthetic and natural images show the superiority of the proposed approach over several methods based on shortest-path extraction.
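A toy version of the lifted shortest-path idea: Dijkstra over a discretized (x, y, orientation) graph where moves aligned with the current orientation are cheap and orientation changes pay a curvature penalty. The paper additionally lifts scale and uses image-derived orientation costs; the alignment and turning penalties here are hand-made assumptions.

```python
import heapq
import numpy as np

def geodesic_xy_theta(cost, n_theta, start, goal):
    """Shortest path over an (x, y, orientation) grid.

    cost: (H, W) local image cost (low inside the tube).
    Nodes are (row, col, orientation); a step goes to a 4-neighbour
    (penalised when misaligned with the orientation) or rotates to an
    adjacent orientation channel (fixed curvature penalty).
    """
    H, W = cost.shape
    dirs = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 0,1 vertical; 2,3 horizontal
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue                            # stale heap entry
        y, x, t = node
        for dt in (-1, 1):                      # rotate: curvature penalty
            nxt = (y, x, (t + dt) % n_theta)
            nd = d + 0.5
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(pq, (nd, nxt))
        for i, (dy, dx) in enumerate(dirs):     # move: cheap when aligned
            ny, nx = y + dy, x + dx
            if not (0 <= ny < H and 0 <= nx < W):
                continue
            align = 0.0 if (i // 2) == t % 2 else 1.0
            nxt = (ny, nx, t)
            nd = d + cost[ny, nx] + align
            if nd < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(pq, (nd, nxt))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]

# a cheap horizontal corridor along row 2 acts as the "tube"
cost = np.ones((5, 5))
cost[2, :] = 0.01
path, d_total = geodesic_xy_theta(cost, 2, (2, 0, 1), (2, 4, 1))
```

The recovered path stays on the corridor's centerline, mirroring how the lifted geodesic tracks tubular structures.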
Title: Extraction of tubular structures over an orientation domain
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206857
Feng Liu, Michael Gleicher
This paper presents an algorithm for automatically detecting and segmenting a moving object from a monocular video. Detecting and segmenting a moving object from a video with limited object motion is challenging. Since existing automatic algorithms rely on motion to detect the moving object, they cannot work well when the object motion is sparse and insufficient. In this paper, we present an unsupervised algorithm to learn object color and locality cues from the sparse motion information. We first detect key frames with reliable motion cues and then estimate moving sub-objects based on these motion cues using a Markov Random Field (MRF) framework. From these sub-objects, we learn an appearance model as a color Gaussian Mixture Model. To avoid the false classification of background pixels whose color is similar to the moving objects, the locations of these sub-objects are propagated to neighboring frames as locality cues. Finally, robust moving object segmentation is achieved by combining these learned color and locality cues with motion cues in an MRF framework. Experiments on videos with a variety of object and camera motion demonstrate the effectiveness of this algorithm.
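A minimal sketch of fusing per-pixel cues in a binary MRF, using ICM as a stand-in for the paper's inference and treating the color, locality and motion cues as given foreground-probability maps. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def segment_icm(color_cue, locality_cue, motion_cue, beta=0.8, iters=5):
    """Binary MRF segmentation sketch.

    Unary terms come from averaging the three (hypothetical) cue maps
    in [0, 1]; an Ising pairwise term with weight beta enforces
    smoothness; ICM (iterated conditional modes) minimises the energy.
    """
    fg = (color_cue + locality_cue + motion_cue) / 3.0  # fused P(foreground)
    unary_fg = -np.log(np.clip(fg, 1e-6, 1.0))
    unary_bg = -np.log(np.clip(1.0 - fg, 1e-6, 1.0))
    labels = (fg > 0.5).astype(int)
    H, W = labels.shape
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                nb = []
                if y > 0:     nb.append(labels[y - 1, x])
                if y < H - 1: nb.append(labels[y + 1, x])
                if x > 0:     nb.append(labels[y, x - 1])
                if x < W - 1: nb.append(labels[y, x + 1])
                e = [unary_bg[y, x], unary_fg[y, x]]
                for lab in (0, 1):  # pairwise cost: disagreeing neighbours
                    e[lab] += beta * sum(1 for n in nb if n != lab)
                labels[y, x] = int(e[1] < e[0])
    return labels

# a 3x3 object block with one noisy interior pixel and one noisy
# background pixel: the smoothness term should clean both up
cues = np.full((6, 6), 0.1)
cues[1:4, 1:4] = 0.9
cues[2, 2] = 0.4   # weak evidence inside the object
cues[5, 5] = 0.6   # spurious evidence in the background
labels = segment_icm(cues, cues, cues)
```

The pairwise term flips both noisy pixels to agree with their neighbourhoods, which is the role the MRF plays in the paper's pipeline.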
Title: Learning color and locality cues for moving object detection and segmentation
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206590
Daniel Munoz, Andrew Bagnell
We address the problem of label assignment in computer vision: given a novel 3D or 2D scene, we wish to assign a unique label to every site (voxel, pixel, superpixel, etc.). To this end, the Markov Random Field framework has proven to be a model of choice as it uses contextual information to yield improved classification results over locally independent classifiers. In this work we adapt a functional gradient approach for learning high-dimensional parameters of random fields in order to perform discrete, multi-label classification. With this approach we can learn robust models involving high-order interactions better than the previously used learning method. We validate the approach in the context of point cloud classification and improve the state of the art. In addition, we successfully demonstrate the generality of the approach on the challenging vision problem of recovering 3-D geometric surfaces from images.
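The functional-gradient idea — growing the model as a weighted sum of weak functions, each fitted to the negative gradient of the loss — can be sketched on plain logistic classification with stump steps. The paper applies the same principle to random-field potentials with a max-margin structured loss, which this toy deliberately does not reproduce.

```python
import numpy as np

def fit_stump(x, residual):
    """Least-squares single-feature threshold stump on the residuals."""
    best = None
    n, d = x.shape
    for j in range(d):
        for thr in np.unique(x[:, j]):
            left = x[:, j] <= thr
            if left.all() or (~left).all():
                continue
            lval, rval = residual[left].mean(), residual[~left].mean()
            pred = np.where(left, lval, rval)
            err = ((residual - pred) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, thr, lval, rval)
    return best[1:]

def boost(x, y, rounds=20, lr=0.5):
    """Functional gradient descent on the logistic loss: each round
    fits a stump to the negative functional gradient (y - p) and adds
    it to the current score function f."""
    f = np.zeros(len(y))
    model = []
    for _ in range(rounds):
        p = 1.0 / (1.0 + np.exp(-f))
        residual = y - p                    # negative functional gradient
        j, thr, lval, rval = fit_stump(x, residual)
        f += lr * np.where(x[:, j] <= thr, lval, rval)
        model.append((j, thr, lval, rval))
    return f, model

# toy data: the label is 1 exactly when the first feature is positive
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
Y = (X[:, 0] > 0).astype(float)
scores, model = boost(X, Y)
acc = ((scores > 0) == (Y == 1)).mean()
```

On this separable toy problem the boosted score function classifies the training set perfectly after a few rounds.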
Title: Contextual classification with functional Max-Margin Markov Networks
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206616
P. Roth, Sabine Sternig, H. Grabner, H. Bischof
In this paper we present an adaptive yet robust object detector for static cameras by introducing classifier grids. Instead of using a sliding window for object detection, we propose to train a separate classifier for each image location, obtaining a very specific object detector with a low false alarm rate. For each classifier corresponding to a grid element, we estimate two generative representations in parallel, one describing the object class and one describing the background. These are combined to obtain a discriminative model. To adapt to changing environments, these classifiers are learned on-line (via boosting). Continuous learning (24 hours a day, 7 days a week) requires a stable system. In our method this is ensured by a fixed object representation, while only the representation of the background is updated. We demonstrate the stability in a long-term experiment by running the system for a whole week, which shows stable performance over time. In addition, we compare the proposed approach to state-of-the-art methods in the field of person and car detection. In both cases we obtain competitive results.
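The fixed-object / adaptive-background recipe can be sketched per grid cell. The running-mean background and Euclidean scoring below are simplifications standing in for the paper's generative models; the class and parameter names are illustrative.

```python
import numpy as np

class GridCell:
    """One grid location: a fixed object model plus an adaptive
    background model, echoing the paper's stability recipe of updating
    only the background during continuous operation.
    """
    def __init__(self, object_template, alpha=0.05):
        self.obj = object_template.astype(float)  # fixed, never updated
        self.bg = None                            # adapted online
        self.alpha = alpha                        # background learning rate

    def update_background(self, patch):
        patch = patch.astype(float)
        self.bg = patch if self.bg is None else \
            (1.0 - self.alpha) * self.bg + self.alpha * patch

    def score(self, patch):
        """Positive when the patch resembles the object more than the
        current background estimate."""
        patch = patch.astype(float)
        return (np.linalg.norm(patch - self.bg)
                - np.linalg.norm(patch - self.obj))

cell = GridCell(np.ones((4, 4)))         # object model for this location
for _ in range(50):                      # background frames stream in
    cell.update_background(np.zeros((4, 4)))
obj_score = cell.score(np.ones((4, 4)))  # object-like patch
bg_score = cell.score(np.zeros((4, 4)))  # background-like patch
```

Because only `bg` adapts, a drifting scene cannot erode the object model — the property the long-term experiment in the paper relies on.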
Title: Classifier grids for robust adaptive object detection
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206692
Peng Wang, Terrence Chen, Ying Zhu, Wei Zhang, S. Zhou, D. Comaniciu
A guidewire is a medical device inserted into vessels during image-guided interventions for balloon inflation. During interventions, the guidewire undergoes non-rigid deformation due to the patient's breathing and cardiac motion, and these 3D motions become complicated when projected onto 2D fluoroscopy. Furthermore, fluoroscopic images contain severe artifacts and other wire-like structures. All of this makes robust guidewire tracking challenging. To address these challenges, this paper presents a probabilistic framework for robust guidewire tracking. We first introduce a semantic guidewire model with three parts: a catheter tip, a guidewire tip and a guidewire body. Measurements of the different parts are integrated into a Bayesian framework as measurements of the whole guidewire. Moreover, for each part, two types of measurements, one from learning-based detectors and the other from online appearance models, are applied and combined. A hierarchical, multi-resolution tracking scheme is then developed based on kernel-based measurement smoothing to track guidewires effectively and efficiently in a coarse-to-fine manner. The presented framework has been validated on a test set of 47 sequences and achieves a mean tracking error of less than 2 pixels. This demonstrates the great potential of our method for clinical applications.
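The fusion of the two measurement types can be illustrated as a Bayesian update over a discrete candidate set, assuming the detector and appearance scores act as independent likelihoods. This is a toy stand-in for the paper's framework; the candidate set and score values are invented.

```python
import numpy as np

def fuse_measurements(prior, detector, appearance):
    """Posterior over candidate part positions when a learned-detector
    score and an online-appearance score are treated as independent
    likelihoods in a Bayesian update. All inputs are 1-D arrays over
    the same candidate set.
    """
    post = prior * detector * appearance
    return post / post.sum()

prior = np.array([0.25, 0.25, 0.25, 0.25])   # uniform motion prior
detector = np.array([0.1, 0.7, 0.1, 0.1])    # detector favours candidate 1
appearance = np.array([0.1, 0.6, 0.2, 0.1])  # appearance model agrees
posterior = fuse_measurements(prior, detector, appearance)
```

Multiplying the likelihoods lets each cue veto candidates the other merely tolerates, which is why combining detector and appearance measurements is more robust than either alone.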
Title: Robust guidewire tracking in fluoroscopy
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206568
Tingting Jiang, F. Jurie, C. Schmid
The aim of this work is to learn a shape prior model for an object class and to improve shape matching with the learned shape prior. Given images of example instances, we learn a mean shape of the object class as well as the variations of non-affine and affine transformations separately, based on the thin plate spline (TPS) parameterization. Unlike previous methods, we represent shapes for learning by vector fields instead of features, which makes our learning approach general. During shape matching, we inject the shape prior knowledge and make the matching result consistent with the training examples. This is achieved by an extension of the TPS-RPM algorithm which finds a closed-form solution for the TPS transformation coherent with the learned transformations. We test our approach by using it to learn shape prior models for all five object classes in the ETHZ Shape Classes. The results show that the learning accuracy is better than previous work and that the learned shape prior models are helpful for object matching in real applications such as object classification.
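For reference, a standard TPS interpolation fit (not the paper's learned prior or the TPS-RPM extension) can be written compactly. The kernel is used in the U(r) = r² log r² form, which differs from the classic r² log r only by a constant factor absorbed into the weights.

```python
import numpy as np

def tps_kernel(r2):
    """U evaluated on squared distances: r^2 log r^2, with U(0) = 0."""
    out = np.zeros_like(r2)
    nz = r2 > 0
    out[nz] = r2[nz] * np.log(r2[nz])
    return out

def fit_tps(src, dst):
    """Solve the standard TPS interpolation system mapping 2-D control
    points src exactly onto dst; returns a callable warp for arbitrary
    points. coef stacks the kernel weights w and the affine part a."""
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    coef = np.linalg.solve(A, b)
    w, a = coef[:n], coef[n:]

    def warp(pts):
        d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
        return tps_kernel(d2) @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a
    return warp

# a purely affine deformation of the unit square: the non-affine
# (kernel) part of the solution should vanish
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src * 2.0 + 0.5
warp = fit_tps(src, dst)
```

The separation of the solution into kernel weights and an affine part is exactly what lets the paper model non-affine and affine variations separately.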
Title: Learning shape prior models for object matching
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206805
Tom Yeh, John J. Lee, Trevor Darrell
Object localization and recognition are important problems in computer vision. However, in many applications, exhaustive search over all object models and image locations is computationally prohibitive. While several methods have been proposed to make either recognition or localization more efficient, few have dealt with both tasks simultaneously. This paper proposes an efficient method for concurrent object localization and recognition based on a data-dependent multi-class branch-and-bound formalism. Existing bag-of-features recognition techniques which can be expressed as weighted combinations of feature counts can be readily adapted to our method. We present experimental results that demonstrate the merit of our algorithm in terms of recognition accuracy, localization accuracy, and speed, compared to baseline approaches including exhaustive search, the implicit shape model (ISM), and efficient sub-window search (ESS). Moreover, we develop two extensions to consider non-rectangular bounding regions (composite boxes and polygons) and demonstrate their ability to achieve higher recognition scores than traditional rectangular bounding boxes.
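The branch-and-bound search over rectangles can be sketched in the style of efficient sub-window search: a set of boxes is four index intervals, and its upper bound combines the positive score mass of the largest box in the set with the negative mass of the smallest. This sketch handles single rectangles only, not the multi-class formalism or the composite-box and polygon extensions.

```python
import heapq
import numpy as np

def best_window(score):
    """Best-first branch-and-bound for the axis-aligned box maximising
    the sum of per-pixel scores. Box sets are ((t1,t2),(b1,b2),(l1,l2),
    (r1,r2)); integral images give O(1) box sums.
    """
    H, W = score.shape
    pos, neg = np.maximum(score, 0), np.minimum(score, 0)
    ipos = np.zeros((H + 1, W + 1)); ipos[1:, 1:] = pos.cumsum(0).cumsum(1)
    ineg = np.zeros((H + 1, W + 1)); ineg[1:, 1:] = neg.cumsum(0).cumsum(1)

    def box_sum(ii, t, b, l, r):            # rows t..b, cols l..r inclusive
        if b < t or r < l:
            return 0.0
        return ii[b + 1, r + 1] - ii[t, r + 1] - ii[b + 1, l] + ii[t, l]

    def bound(s):                           # admissible upper bound
        (t1, t2), (b1, b2), (l1, l2), (r1, r2) = s
        return (box_sum(ipos, t1, b2, l1, r2)     # largest box, positives
                + box_sum(ineg, t2, b1, l2, r1))  # smallest box, negatives

    start = ((0, H - 1), (0, H - 1), (0, W - 1), (0, W - 1))
    pq = [(-bound(start), start)]
    while pq:
        ub, s = heapq.heappop(pq)
        widths = [iv[1] - iv[0] for iv in s]
        if max(widths) == 0:                # a single box: it is optimal
            return (s[0][0], s[1][0], s[2][0], s[3][0]), -ub
        i = int(np.argmax(widths))          # split the widest interval
        lo, hi = s[i]
        mid = (lo + hi) // 2
        for half in ((lo, mid), (mid + 1, hi)):
            ns = list(s); ns[i] = half; ns = tuple(ns)
            heapq.heappush(pq, (-bound(ns), ns))

# a block of +2 scores in a sea of -1: the optimum is exactly the block
score = np.full((6, 6), -1.0)
score[1:4, 2:5] = 2.0
box, val = best_window(score)
```

Because the bound is admissible, the first singleton popped from the queue is guaranteed optimal without enumerating all O(H²W²) windows.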
Title: Fast concurrent object localization and recognition
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206648
Yang Cong, Haifeng Gong, Song-Chun Zhu, Yandong Tang
In this paper, we present a novel algorithm based on flow velocity field estimation to count the number of pedestrians crossing a detection line or inside a specified region. We regard pedestrians crossing the line as fluid flow, and design a novel model to estimate the flow velocity field. By integrating over time, dynamic mosaics are constructed to count the number of pixels and edges that passed through the line. Consequently, the number of pedestrians can be estimated by quadratic regression, with the number of weighted pixels and edges as input. The regressors are learned offline for several camera tilt angles and take the calibration information into account. We use tilt-angle-specific learning to ensure direct deployment and avoid overfitting, whereas the commonly used scene-specific learning scheme needs on-site annotation and tends to overfit. Experiments on a variety of videos verified that the proposed method gives accurate estimates under different camera setups in real time.
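The final regression step can be sketched directly: fit a quadratic map from (weighted pixel count, edge count) to the pedestrian count by least squares. The feature construction and the synthetic training relation below are illustrative assumptions, not the paper's learned coefficients.

```python
import numpy as np

def fit_quadratic_counter(pixels, edges, counts):
    """Least-squares fit of a full quadratic in (pixel count p, edge
    count e) predicting the pedestrian count; the paper learns one such
    regressor offline per camera tilt angle."""
    p, e = np.asarray(pixels, float), np.asarray(edges, float)
    X = np.column_stack([p**2, e**2, p * e, p, e, np.ones_like(p)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(counts, float), rcond=None)

    def predict(p, e):
        return coef @ np.array([p**2, e**2, p * e, p, e, 1.0])
    return predict

# synthetic training data drawn from a known relation inside the
# quadratic model family, so the fit should recover it exactly
rng = np.random.default_rng(1)
p = rng.uniform(100, 1000, 40)
e = rng.uniform(10, 100, 40)
true_counts = 0.001 * p + 0.02 * e + 1e-6 * p * e
predict = fit_quadratic_counter(p, e, true_counts)
```

At (p, e) = (500, 50) the underlying relation gives 0.5 + 1.0 + 0.025 = 1.525 pedestrians, and the fitted regressor reproduces it to numerical precision.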
Title: Flow mosaicking: Real-time pedestrian counting without scene-specific learning
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206538
Richard Roberts, C. Potthast, F. Dellaert
This paper deals with estimation of dense optical flow and ego-motion in a generalized imaging system by exploiting probabilistic linear subspace constraints on the flow. We deal with the extended motion of the imaging system through an environment that we assume to have some degree of statistical regularity. For example, in autonomous ground vehicles the structure of the environment around the vehicle is far from arbitrary, and the depth at each pixel is often approximately constant. The subspace constraints hold not only for perspective cameras, but in fact for a very general class of imaging systems, including catadioptric and multiple-view systems. Using minimal assumptions about the imaging system, we learn a probabilistic subspace constraint that captures the statistical regularity of the scene geometry relative to an imaging system. We propose an extension to probabilistic PCA (Tipping and Bishop, 1999) as a way to robustly learn this subspace from recorded imagery, and demonstrate its use in conjunction with a sparse optical flow algorithm. To deal with the sparseness of the input flow, we use a generative model to estimate the subspace using only the observed flow measurements. Additionally, to identify and cope with image regions that violate subspace constraints, such as moving objects, objects that violate the depth regularity, or gross flow estimation errors, we employ a per-pixel Gaussian mixture outlier process. We demonstrate results of finding the optical flow subspaces and employing them to estimate dense flow and to recover camera motion for a variety of imaging systems in several different environments.
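The probabilistic PCA model the method builds on has a closed-form maximum-likelihood fit (Tipping and Bishop, 1999), sketched here on synthetic "flow" vectors. The paper's extension for sparse, outlier-contaminated flow is not reproduced; the data-generating direction below is an arbitrary choice.

```python
import numpy as np

def ppca_fit(X, q):
    """Closed-form probabilistic PCA: the ML subspace basis W and the
    isotropic noise variance sigma^2 from the eigendecomposition of the
    sample covariance. Rows of X are observations (e.g. flow vectors).
    """
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]            # eigenvalues, descending
    vals, vecs = vals[order], vecs[:, order]
    sigma2 = vals[q:].mean()                  # mean discarded variance
    W = vecs[:, :q] * np.sqrt(np.maximum(vals[:q] - sigma2, 0.0))
    return mu, W, sigma2

# synthetic flow-like data: a 1-D latent subspace in 3-D plus noise
rng = np.random.default_rng(2)
z = rng.normal(size=(500, 1))
w_true = np.array([[1.0, 2.0, -1.0]])        # hypothetical subspace
X = z @ w_true + 0.01 * rng.normal(size=(500, 3))
mu, W, sigma2 = ppca_fit(X, 1)
```

The generative form (latent z, basis W, noise sigma²) is what lets the paper estimate the subspace from sparse flow measurements alone.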
Title: Learning general optical flow subspaces for egomotion estimation and detection of motion anomalies