Wormholes in shape space: tracking through discontinuous changes in shape
T. Heap, David C. Hogg
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710741
Existing object tracking algorithms generally use some form of local optimisation, assuming that an object's position and shape change smoothly over time. In some situations this assumption is not valid: the trackable shape of an object may change discontinuously, for example if it is the 2D silhouette of a 3D object. In this paper we propose a novel method for modelling temporal shape discontinuities explicitly. Allowable shapes are represented as a union of (learned) bounded regions within a shape space. Discontinuous shape changes are described in terms of transitions between these regions. Transition probabilities are learned from training sequences and stored in a Markov model. In this way we can create 'wormholes' in shape space. Tracking with such models is via an adaptation of the CONDENSATION algorithm.
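The region-transition idea can be sketched as the prediction step of a particle filter. Everything below is illustrative: the 1-D shape space, the two region centres, and the transition matrix are made-up stand-ins for the quantities the paper learns from training sequences.

```python
import random

# Hypothetical learned quantities: two bounded regions in a 1-D shape
# space, and a Markov transition matrix between them (the "wormholes").
REGION_CENTRES = [0.0, 5.0]
TRANSITIONS = [[0.9, 0.1],   # P(next region | currently in region 0)
               [0.2, 0.8]]   # P(next region | currently in region 1)

def predict(particle, noise=0.1):
    """CONDENSATION-style prediction with discontinuous region jumps."""
    region, shape = particle
    new_region = random.choices([0, 1], weights=TRANSITIONS[region])[0]
    if new_region != region:
        # A "wormhole": the shape jumps to the other bounded region.
        shape = REGION_CENTRES[new_region]
    # Smooth within-region diffusion.
    return (new_region, shape + random.gauss(0.0, noise))

random.seed(0)
particles = [(0, 0.0)] * 500
for _ in range(50):
    particles = [predict(p) for p in particles]

# Occupancy after 50 steps; the chain's stationary distribution is
# (2/3, 1/3), so region 0 should hold roughly twice as many particles.
counts = [sum(1 for r, _ in particles if r == i) for i in (0, 1)]
print(counts)
```

In the full tracker each prediction would be followed by a measurement-weighting and resampling step; the sketch only shows how discontinuous shape changes enter the dynamics.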
Understanding the motions of tools and vehicles
Zoran Duric, E. Rivlin, A. Rosenfeld
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710827
Many types of common objects, such as tools and vehicles, usually move in simple ways when they are wielded or driven. The natural axes of the object tend to remain aligned with the local trihedron defined by the object's trajectory. Based on this observation, we use a model called Frenet-Serret motion, which corresponds to the motion of a moving trihedron along a space curve. Knowing how the Frenet-Serret frame is changing relative to the observer gives us essential information for understanding the object's motion. This is illustrated here for four examples, involving tools (a wrench and a saw) and vehicles (an accelerating van, a turning taxi).
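The local trihedron the abstract refers to is the Frenet-Serret frame (tangent, normal, binormal). A minimal numerical sketch, using central differences on three consecutive trajectory samples; the circular-arc test data is ours, not the paper's:

```python
import numpy as np

def frenet_frame(p_prev, p, p_next, dt=1.0):
    """Approximate the Frenet-Serret frame (T, N, B) at the middle of
    three consecutive trajectory samples taken at spacing dt."""
    v = (p_next - p_prev) / (2 * dt)        # central-difference velocity
    a = (p_next - 2 * p + p_prev) / dt**2   # acceleration
    T = v / np.linalg.norm(v)               # unit tangent
    a_perp = a - a.dot(T) * T               # component of a normal to T
    N = a_perp / np.linalg.norm(a_perp)     # unit principal normal
    B = np.cross(T, N)                      # unit binormal
    return T, N, B

# Three samples from a unit-radius circular arc in the xy-plane.
ts = [0.0, 0.1, 0.2]
pts = [np.array([np.cos(t), np.sin(t), 0.0]) for t in ts]
T, N, B = frenet_frame(*pts, dt=0.1)
print(B)  # for a planar curve the binormal is the plane normal, (0, 0, 1)
```

Tracking how (T, N, B) rotates relative to the observer over successive samples is what yields the motion cues the paper exploits.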
3D shape and motion analysis from image blur and smear: a unified approach
Yuan-fang Wang, P. Liang
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710843
This paper addresses 3D shape recovery and motion estimation using a realistic camera model with an aperture and a shutter. The spatial blur and temporal smear effects induced by the camera's finite aperture and shutter speed are used for inferring both the shape and motion of the imaged objects.
A theory of catadioptric image formation
Simon Baker, S. Nayar
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710698
Conventional video cameras have limited fields of view, which make them restrictive for certain applications in computational vision. A catadioptric sensor uses a combination of lenses and mirrors placed in a carefully arranged configuration to capture a much wider field of view. When designing a catadioptric sensor, the shape of the mirror(s) should ideally be selected to ensure that the complete catadioptric system has a single effective viewpoint. In this paper, we derive the complete class of single-lens, single-mirror catadioptric sensors that have a single viewpoint, as well as an expression for the spatial resolution of a catadioptric sensor in terms of the resolution of the camera used to construct it. We also include a preliminary analysis of the defocus blur caused by the use of a curved mirror.
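One member of the single-viewpoint class is the paraboloid paired with an orthographic camera: every ray parallel to the mirror axis reflects through the parabola's focus, so the focus acts as the single effective viewpoint. A quick numerical check of that property (the focal length and sample ray are arbitrary choices, not values from the paper):

```python
import math

def reflect_through_parabola(r, f=1.0):
    """Reflect a vertical ray (parallel to the mirror axis, as seen by an
    orthographic camera) off the parabolic mirror z = r^2/(4f) - f, whose
    focus sits at the origin. Returns the reflected direction and the
    unit vector from the hit point toward the focus."""
    z = r * r / (4 * f) - f
    # Normal of the surface F(r, z) = z - r^2/(4f) + f = 0.
    nr, nz = -r / (2 * f), 1.0
    norm = math.hypot(nr, nz)
    nr, nz = nr / norm, nz / norm
    # Reflect the downward ray d = (0, -1): d' = d - 2 (d.n) n.
    dot = -nz
    dr, dz = -2 * dot * nr, -1.0 - 2 * dot * nz
    # Unit vector from the hit point toward the focus at the origin.
    fr, fz = -r, -z
    fn = math.hypot(fr, fz)
    return (dr, dz), (fr / fn, fz / fn)

d, to_focus = reflect_through_parabola(1.0)
print(d, to_focus)  # the two directions coincide: single viewpoint
```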
A mixed-state condensation tracker with automatic model-switching
M. Isard, A. Blake
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710707
There is considerable interest in the computer vision community in representing and modelling motion. Motion models are used as predictors to increase the robustness and accuracy of visual trackers, and as classifiers for gesture recognition. This paper presents a significant development of random sampling methods to allow automatic switching between multiple motion models as a natural extension of the tracking process. The Bayesian mixed-state framework is described in its generality, and the example of a bouncing ball is used to demonstrate that a mixed-state model can significantly improve tracking performance in heavy clutter. The relevance of the approach to the problem of gesture recognition is then investigated using a tracker which is able to follow the natural drawing action of a hand holding a pen, and switches state according to the hand's motion.
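A mixed-state tracker augments each particle with a discrete model label that evolves under a Markov chain, while the continuous state evolves under the dynamics of the selected model. The sketch below is a toy 1-D version with two hypothetical motion models ("drift right" and "rest") and a made-up trajectory, not the paper's experimental setup:

```python
import math
import random

random.seed(1)
T_SWITCH = [[0.9, 0.1], [0.1, 0.9]]  # assumed model-transition matrix
VEL = [1.0, 0.0]                     # model 0: drift right; model 1: rest
OBS_SIGMA = 0.2

def step(particles, z):
    """One mixed-state CONDENSATION iteration: sample the discrete model
    label from its Markov chain, propagate the continuous state under the
    selected model's dynamics, weight by the observation likelihood, and
    resample."""
    predicted, weights = [], []
    for m, x in particles:
        m = random.choices([0, 1], weights=T_SWITCH[m])[0]
        x = x + VEL[m] + random.gauss(0.0, 0.05)
        predicted.append((m, x))
        r = (z - x) / OBS_SIGMA
        weights.append(math.exp(-0.5 * r * r))
    return random.choices(predicted, weights=weights, k=len(particles))

# Truth: the target moves right for 15 frames, then stops for 15 frames.
truth = [float(t) for t in range(1, 16)] + [15.0] * 15
particles = [(0, 0.0)] * 300
for x_true in truth:
    particles = step(particles, x_true + random.gauss(0.0, OBS_SIGMA))

models = [m for m, _ in particles]
est = sum(x for _, x in particles) / len(particles)
print(models.count(1), round(est, 2))
```

After the stop phase, the dominant model label doubles as a classification of the current motion, which is the property the paper exploits for gesture recognition.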
Robust computation and parametrization of multiple view relations
P. Torr, Andrew Zisserman
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710798
A new method is presented for robustly estimating multiple view relations from image point correspondences. There are three new contributions. The first is a general-purpose method of parametrizing these relations using point correspondences. The second is the formulation of a common Maximum Likelihood Estimate (MLE) for each of the multiple view relations; the parametrization facilitates a constrained optimization to obtain this MLE. The third is a new robust algorithm, MLESAC, for obtaining the point correspondences. The method is general and its use is illustrated for the estimation of fundamental matrices, image-to-image homographies and quadratic transformations. Results are given for both synthetic and real images. It is demonstrated that the method gives results equal or superior to previous approaches.
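The key difference from RANSAC is the hypothesis score: MLESAC evaluates the negative log-likelihood of all residuals under a Gaussian-inlier/uniform-outlier mixture, instead of counting inliers against a threshold. A line-fitting sketch of that scoring rule (the mixing proportion, noise scale and outlier range are assumed values; the paper applies the idea to multiple view relations, not lines):

```python
import math
import random

def mlesac_line(points, iters=200, sigma=1.0, nu=20.0, seed=0):
    """Fit y = a*x + b by MLESAC: hypotheses come from random minimal
    samples, as in RANSAC, but each is scored by the negative
    log-likelihood of ALL residuals under a mixture of a Gaussian
    (inliers) and a uniform density (outliers)."""
    rng = random.Random(seed)
    gamma = 0.5                        # assumed inlier mixing proportion
    best, best_cost = None, float("inf")
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                   # degenerate minimal sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        cost = 0.0
        for x, y in points:
            r = y - (a * x + b)
            p_in = gamma * math.exp(-r * r / (2 * sigma ** 2)) / (
                math.sqrt(2 * math.pi) * sigma)
            p_out = (1 - gamma) / nu   # uniform over a residual range nu
            cost += -math.log(p_in + p_out)
        if cost < best_cost:
            best, best_cost = (a, b), cost
    return best

# Inliers on y = 2x + 1 plus a few gross outliers.
pts = [(x, 2 * x + 1) for x in range(20)] + [(5, 40), (12, -30), (3, 25)]
a, b = mlesac_line(pts)
print(round(a, 2), round(b, 2))
```

Because every residual contributes smoothly to the score, two hypotheses with the same inlier count are still ranked by how well they fit, which is what makes MLESAC less sensitive to the inlier threshold than plain RANSAC.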
Fish-scales: representing fuzzy manifolds
R. Sára, R. Bajcsy
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710811
We address the problem of automatically reconstructing m-manifolds of unknown topology from unorganized points in metric p-spaces obtained from a noisy measurement process. The point set is first approximated by a collection of oriented primitive fuzzy sets over a range of resolutions. A hierarchical multiresolution representation is then computed based on the relation of relative containment defined on the collection. Finally, manifold structure is recovered by establishing connectivity between these primitives based on proximity, compatibility of position and orientation, and local topological constraints. The method has been successfully applied to the problem of surface reconstruction from polynocular-stereo data with many outliers.
What can projections of flow fields tell us about the visual motion
S. Fejes, L. Davis
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710835
The dimensionality of visual motion analysis can be reduced by analyzing projections of flow vector fields. In contrast to motion vector fields, these projections exhibit simple geometric properties which are invariant to the scene structure and depend only on the camera motion. Using these properties, structure and motion can be either completely or partially decoupled. We estimate motion parameters from projections of flow fields using robust techniques, implemented in a recursive observer model. The model is applicable to general camera motion and to a large field of view, and requires no point correspondences. We demonstrate our projection method on the problem of detecting independently moving objects from a moving camera. Using the projection approach, the problem reduces to a one-dimensional optimization process involving robust line fitting and outlier detection. Instantaneous detection measurements are integrated temporally using tracking and spatially by grouping coherently moving points.
A PDE-based level-set approach for detection and tracking of moving objects
N. Paragios, R. Deriche
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710859
This paper presents a framework for detecting and tracking moving objects in a sequence of images. Combining a statistical approach, in which the inter-frame difference is modeled by a mixture of two Laplacian or Gaussian distributions, with an energy-minimization approach, we reformulate the motion detection and tracking problem as a front propagation problem. The Euler-Lagrange equation of the designed energy functional is first derived, and the flow minimizing the energy is then obtained. Following the work of Caselles et al. (1995) and Malladi et al. (1995), the contours to be detected and tracked are modeled as geodesic active contours evolving toward the minimum of the designed energy, under the influence of internal and external image-dependent forces. Using the level-set formulation scheme of Osher and Sethian (1988), complex curves can be detected and tracked, and topological changes of the evolving curves are handled naturally. To reduce the computational cost of a direct implementation of this formulation, a new approach exploiting aspects of the classical narrow-band and fast-marching methods is proposed and compared favorably with them. To further reduce the CPU time, a multi-scale approach has also been considered. Very promising experimental results are provided using real video sequences.
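The level-set machinery the paper builds on can be illustrated with the simplest possible front propagation: a circle shrinking at unit speed, evolved implicitly via phi_t = |grad phi| on a signed-distance function (the paper's speed function additionally depends on image forces; the grid size and time step below are arbitrary):

```python
import numpy as np

# Signed distance to a circle of radius 1 (negative inside) on a grid.
n, L = 201, 2.0
xs = np.linspace(-L, L, n)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs)
phi = np.sqrt(X ** 2 + Y ** 2) - 1.0

# Propagate the front inward at unit speed: phi_t = |grad phi|.
dt, steps = 0.01, 30
for _ in range(steps):
    gy, gx = np.gradient(phi, dx)
    phi = phi + dt * np.sqrt(gx ** 2 + gy ** 2)

# The zero level set should now be a circle of radius ~ 1 - 0.3 = 0.7.
area = np.count_nonzero(phi < 0) * dx * dx
radius = float(np.sqrt(area / np.pi))
print(round(radius, 3))
```

Because the contour is the zero level set of a function rather than an explicit curve, splits and merges of the front require no special handling, which is the topological flexibility the abstract refers to.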
Linear N ≥ 4-point pose determination
Long Quan, Zhong-Dan Lan
Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271)
Pub Date: 1998-01-04 | DOI: 10.1109/ICCV.1998.710806
The determination of the position and orientation of the camera from known correspondences between reference points and image points is known as the problem of pose estimation in computer vision, or space resection in photogrammetry. It is well known that the 3-point problem has at most four solutions. Less appears to be known about the cases of 4 and 5 points. In this paper, we describe linear solutions that always give the unique solution to 4-point and 5-point pose determination for reference points not lying on the critical configurations. The same linear method can also be extended to any n ≥ 5 points. The robustness and accuracy of the method are evaluated on both simulated and real images.
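The paper's linear 4- and 5-point constructions are specific to those point counts and are not reproduced here. As a baseline illustration of the broader family of linear pose methods, the classic Direct Linear Transform recovers the projection matrix from n ≥ 6 correspondences (the synthetic camera and points below are our own test data):

```python
import numpy as np

def dlt_pose(X, x):
    """Estimate the 3x4 projection matrix P (up to scale) from n >= 6
    world-image correspondences by the Direct Linear Transform: stack the
    linear constraints from x cross (P X) = 0 and take the null vector."""
    rows = []
    for Xw, u in zip(X, x):
        Xh = np.append(Xw, 1.0)
        rows.append(np.concatenate([np.zeros(4), -Xh, u[1] * Xh]))
        rows.append(np.concatenate([Xh, np.zeros(4), -u[0] * Xh]))
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)          # null vector = last row of Vt
    return Vt[-1].reshape(3, 4)

# Synthetic ground truth: identity rotation, translation along z.
P_true = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
rng = np.random.default_rng(0)
Xw = rng.uniform(-1, 1, (6, 3))
proj = (P_true @ np.hstack([Xw, np.ones((6, 1))]).T).T
uv = proj[:, :2] / proj[:, 2:]

P = dlt_pose(Xw, uv)
# Reproject to verify (P is only defined up to scale, so compare images).
re = (P @ np.hstack([Xw, np.ones((6, 1))]).T).T
uv2 = re[:, :2] / re[:, 2:]
print(np.max(np.abs(uv2 - uv)))
```

The advantage of the paper's 4- and 5-point methods over such a baseline is precisely that they stay linear below the 6-point minimum the DLT requires.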