Phase-Based Binocular Vergence Control and Depth Reconstruction Using Active Vision
Theimer W.M., Mallot H.A.
CVGIP: Image Understanding 60(3), pp. 343-358, November 1994. DOI: 10.1006/ciun.1994.1061

We present a technique for guiding the vergence movements of an active stereo camera system and for calculating dense disparity maps. Both processes are described in the same theoretical framework, based on phase differences in complex Gabor filter responses, which models receptive field properties of the visual cortex. The camera movements are computed from input images at coarse spatial resolution, while the disparity map calculation uses a finer resolution in scale space. The correspondence problem is solved implicitly by restricting the disparity range around zero disparity (Panum's area in the human visual system). The vergence process is interpreted as a mechanism that minimizes global disparity, thereby setting a 3D region of interest for subsequent disparity detection; the disparity map then captures the smaller local disparities that serve as an important cue for depth perception. Experimental data are presented for the integrated performance of vergence in natural scenes followed by disparity map calculation.
Default Shape Theory: With Application to the Computation of the Direction of the Light Source
Vega O.E., Yang Y.H.
CVGIP: Image Understanding 60(3), pp. 285-299, November 1994. DOI: 10.1006/ciun.1994.1058

Humans appear to be able to infer the three-dimensional shape of an object from its outline, or occluding contour. In computer vision, the occluding contour is an important cue in the shape recovery process. So far, only a small family of shapes has been investigated, in particular shapes generated by generalized cylinders and shapes of polyhedra. There are many instances, however, where these types of surfaces are not applicable, e.g., a solid with holes or a solid with no planar surfaces. This paper proposes a new concept, default shape theory, which includes the solid of revolution as a special case. A default shape is a function whose domain is a closed contour and whose range is a vector representation of a three-dimensional solid. The concept is applied to recovering the direction of the light source. Experimental results show that the default-shape-based method performs comparably to the Pentland light source determination algorithm. The paper also proves the relationship between one particular type of default shape and the solid of revolution, and discusses other possible definitions of default shapes.
Finding Planes and Clusters of Objects from 3D Line Segments with Application to 3D Motion Determination
Zhang Z.Y., Faugeras O.D.
CVGIP: Image Understanding 60(3), pp. 267-284, November 1994. DOI: 10.1006/ciun.1994.1057

We address the problems of finding clusters, based on proximity, and planar facets, based on coplanarity, from 3D line segments obtained from stereo. The proposed methods are efficient and have been tested on many real stereo data sets. Such procedures are indispensable in many applications, including scene interpretation, object modeling, and object recognition. We show their application to 3D motion determination: we have developed an algorithm based on the hypothesize-and-verify paradigm that registers two consecutive 3D frames obtained from stereo and estimates the transformation/motion between them. Grouping the 3D line segments of each frame into clusters and planes effectively reduces the complexity of the hypothesis generation phase.
Robust Methods for Estimating Pose and a Sensitivity Analysis
Kumar R., Hanson A.R.
CVGIP: Image Understanding 60(3), pp. 313-342, November 1994. DOI: 10.1006/ciun.1994.1060

This paper mathematically analyzes, and proposes new solutions for, the problem of estimating the 3D location and orientation of a camera (pose determination) from a matched set of 3D model and 2D image landmark features. Least-squares techniques for line tokens that minimize rotation and translation simultaneously are developed and shown to be far superior to earlier techniques that solve for rotation first and then translation. Least-squares techniques fail catastrophically, however, when outliers (gross errors) are present in the match data; outliers arise frequently from incorrect correspondences or gross errors in the 3D model. Robust techniques for pose determination are therefore developed that handle data contaminated by fewer than 50% outliers. Finally, the sensitivity of pose determination to incorrect estimates of the camera parameters is analyzed. It is shown that for small-field-of-view systems, offsets in the image center do not significantly affect the computed location of the camera in the world coordinate system, and errors in the focal length significantly affect only the component of translation along the optical axis.
Refining 3D Reconstructions: A Theoretical and Experimental Study of the Effect of Cross-Correlations
Thomas J.I., Hanson A., Oliensis J.
CVGIP: Image Understanding 60(3), pp. 359-370, November 1994. DOI: 10.1006/ciun.1994.1062

In robot navigation, a model of the environment must be reconstructed for various applications, including path planning, obstacle avoidance, and localizing the robot. Traditionally, the model was acquired from two images (two-frame structure from motion), but the resulting models were unreliable and inaccurate. Research has since shifted to using several frames (multiframe structure from motion). However, almost none of the reported multiframe algorithms produce accurate and stable reconstructions for general robot motion. The main reason appears to be that the primary source of error in the reconstruction, the error in the underlying motion estimate, has been mostly ignored. Intuitively, if the reconstruction consists of points, this motion error affects every reconstructed point in a systematic way; for example, if the estimated translation of the robot is erroneous in a certain direction, all the reconstructed points are shifted along that same direction. The contributions of this paper are to isolate mathematically the effect of the motion error, as correlations in the structure error, and to show theoretically that these correlations can improve existing multiframe structure-from-motion techniques. Finally, new experimental results and previously reported work are shown to confirm the theoretical predictions.
{"title":"Author Index for Volume 60","authors":"","doi":"10.1006/ciun.1994.1066","DOIUrl":"https://doi.org/10.1006/ciun.1994.1066","url":null,"abstract":"","PeriodicalId":100350,"journal":{"name":"CVGIP: Image Understanding","volume":"60 3","pages":"Page 399"},"PeriodicalIF":0.0,"publicationDate":"1994-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1006/ciun.1994.1066","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92027842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D Structure Reconstruction from Point Correspondences between Two Perspective Projections
Kara A., Wilkes D.M., Kawamura K.
CVGIP: Image Understanding 60(3), pp. 392-397, November 1994. DOI: 10.1006/ciun.1994.1065

An iterative algorithm for 3D structure reconstruction from two perspective projections is proposed. The basis of the method is the eight-point algorithm (Longuet-Higgins, Nature 293, 1981, 133-135; Tsai and Huang, IEEE Trans. PAMI 6, 1984, 13-27). A drawback of the eight-point algorithm is that it requires at least eight point correspondences; furthermore, there are certain point configurations for which it fails. For example, the eight corners of a cube on a quadratic surface passing through the focal points of the cameras form such a degenerate configuration. By combining the eight-point algorithm with an SVD (singular value decomposition) characterization of the so-called E-matrix (Faugeras and Maybank, Internat. J. Comput. Vision 4, 1990, 225-246; Huang and Faugeras, IEEE Trans. PAMI 11, 1989, 1310-1312), the proposed iterative algorithm solves the 3D reconstruction problem even with fewer than eight points. The algorithm is also free of the artificial degeneracy problem inherent in the eight-point algorithm. The iteration takes place only when the configuration is degenerate or violates the SVD characterization because of measurement error; otherwise the computation is O(N), as in the eight-point algorithm.
A new "computational" formulation of cross ratio is presented with a view to applications to computer vision problems by extending the framework of "computational projective geometry" of Kanatani (Image Understand. 54, 1991, 333-348). As typical examples, we construct procedures for computing the 3-D orientation of a planar shape from its 2-D projection image and the focus of expansion from an image trajectory of a single point by taking advantage of the perspective invariance of cross ratio and "projective coordinates," and the resulting 3-D interpretation of "harmonic range."
{"title":"Computational Cross Ratio for Computer Vision","authors":"Kanatani K.","doi":"10.1006/ciun.1994.1063","DOIUrl":"10.1006/ciun.1994.1063","url":null,"abstract":"<div><p>A new \"computational\" formulation of cross ratio is presented with a view to applications to computer vision problems by extending the framework of \"computational projective geometry\" of Kanatani (<em>Image Understand.</em> 54, 1991, 333-348). As typical examples, we construct procedures for computing the 3-D orientation of a planar shape from its 2-D projection image and the focus of expansion from an image trajectory of a single point by taking advantage of the perspective invariance of cross ratio and \"projective coordinates,\" and the resulting 3-D interpretation of \"harmonic range.\"</p></div>","PeriodicalId":100350,"journal":{"name":"CVGIP: Image Understanding","volume":"60 3","pages":"Pages 371-381"},"PeriodicalIF":0.0,"publicationDate":"1994-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1006/ciun.1994.1063","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82060116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Statistical Foundation for Hypothesis Testing of Image Data
Kanatani K.
CVGIP: Image Understanding 60(3), pp. 382-391, November 1994. DOI: 10.1006/ciun.1994.1064

A statistical foundation is given for the problem of hypothesizing and testing geometric properties of image data, treated heuristically by Kanatani (CVGIP: Image Understanding 54, 1991, 333-348). Points and lines in the image are represented by "N-vectors," and their reliability is evaluated by their "covariance matrices." Under a Gaussian approximation of the distribution, the test takes the form of a χ² test. Test criteria are stated explicitly for model matching and for testing edge groupings, vanishing points, foci of expansion, and vanishing lines.