Personalized video summarization with human in the loop
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711483
Bohyung Han, Jihun Hamm, Jack Sim
In automatic video summarization, the visual summary is typically constructed from an analysis of low-level features, with little consideration of video semantics. In practice, however, the contextual and semantic information of a video is only marginally related to low-level features, even though such features are useful for computing visual similarity between frames. We therefore propose a novel video summarization technique in which semantically important information is extracted from a set of keyframes provided by a human, and the summary is constructed through automatic temporal segmentation based on inter-frame similarity to those keyframes. Toward this goal, we model a video sequence with a dissimilarity matrix based on a bidirectional similarity measure between every pair of frames, and then characterize the structure of the video by a nonlinear manifold embedding. We formulate video summarization as a variant of the 0–1 knapsack problem, which is solved efficiently by dynamic programming. The effectiveness of our algorithm is illustrated quantitatively and qualitatively on realistic videos collected from YouTube.
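As a concrete illustration of the knapsack formulation, the following minimal sketch selects video segments under a summary-length budget with the standard 0–1 knapsack dynamic program. The segment durations, importance scores, and budget are placeholders; in the paper the scores come from similarity to the user-supplied keyframes, which is not modeled here.

```python
# Hedged sketch: pick segments maximizing total importance within a duration budget.
def knapsack_summary(durations, scores, budget):
    """durations: list of non-negative ints (seconds per segment)
    scores:    list of floats (assumed importance of each segment)
    budget:    int, maximum total duration of the summary"""
    n = len(durations)
    best = [0.0] * (budget + 1)                    # best[w] = best score with duration <= w
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        for w in range(budget, durations[i] - 1, -1):
            cand = best[w - durations[i]] + scores[i]
            if cand > best[w]:
                best[w] = cand
                keep[i][w] = True
    chosen, w = [], budget                          # backtrack to recover the summary
    for i in range(n - 1, -1, -1):
        if keep[i][w]:
            chosen.append(i)
            w -= durations[i]
    return sorted(chosen)

print(knapsack_summary([10, 20, 15, 5], [0.9, 0.4, 0.7, 0.3], 30))  # -> [0, 2, 3]
```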
{"title":"Personalized video summarization with human in the loop","authors":"Bohyung Han, Jihun Hamm, Jack Sim","doi":"10.1109/WACV.2011.5711483","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711483","url":null,"abstract":"In automatic video summarization, visual summary is constructed typically based on the analysis of low-level features with little consideration of video semantics. However, the contextual and semantic information of a video is marginally related to low-level features in practice although they are useful to compute visual similarity between frames. Therefore, we propose a novel video summarization technique, where the semantically important information is extracted from a set of keyframes given by human and the summary of a video is constructed based on the automatic temporal segmentation using the analysis of inter-frame similarity to the keyframes. Toward this goal, we model a video sequence with a dissimilarity matrix based on bidirectional similarity measure between every pair of frames, and subsequently characterize the structure of the video by a nonlinear manifold embedding. Then, we formulate video summarization as a variant of the 0–1 knapsack problem, which is solved by dynamic programming efficiently. The effectiveness of our algorithm is illustrated quantitatively and qualitatively using realistic videos collected from YouTube.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124903179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the reliability of eye color as a soft biometric trait
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711507
A. Dantcheva, N. Erdogmus, J. Dugelay
This work studies eye color as a soft biometric trait and provides novel insight into the influence of pertinent factors such as color space, illumination, and the presence of glasses. A motivation for the paper is that human iris color is an essential facial trait for Caucasians, which can be employed in iris pattern recognition systems to prune the search, or in soft biometrics systems for person re-identification. To study iris color as a soft biometric trait, we consider a system for automatic eye color detection from standard facial images. The system entails automatic iris localization, followed by classification based on Gaussian Mixture Models trained with Expectation Maximization. Finally, we provide detection results on the UBIRIS2 database that are applicable to a real-time eye color detection system.
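A minimal sketch of the classification stage, assuming iris pixels have already been localized and grouped per training class; the color space, number of mixture components, and the helper names (train_color_models, classify_iris) are illustrative rather than the paper's exact configuration.

```python
# Hedged sketch: per-class Gaussian mixtures over iris pixel colors,
# classification by maximum average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_color_models(samples_by_class, n_components=3):
    """samples_by_class: dict mapping class name -> (N, 3) array of iris pixel colors."""
    models = {}
    for label, pixels in samples_by_class.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type='full',
                              random_state=0)
        gmm.fit(pixels)                       # EM fitting per eye-color class
        models[label] = gmm
    return models

def classify_iris(models, iris_pixels):
    """Pick the class whose GMM gives the highest mean log-likelihood on the iris pixels."""
    scores = {label: gmm.score_samples(iris_pixels).mean()
              for label, gmm in models.items()}
    return max(scores, key=scores.get)
```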
{"title":"On the reliability of eye color as a soft biometric trait","authors":"A. Dantcheva, N. Erdogmus, J. Dugelay","doi":"10.1109/WACV.2011.5711507","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711507","url":null,"abstract":"This work studies eye color as a soft biometric trait and provides a novel insight about the influence of pertinent factors in this context, like color spaces, illumination and presence of glasses. A motivation for the paper is the fact that the human iris color is an essential facial trait for Caucasians, which can be employed in iris pattern recognition systems for pruning the search or in soft biometrics systems for person re-identification. Towards studying iris color as a soft biometric trait, we consider a system for automatic detection of eye color, based on standard facial images. The system entails automatic iris localization, followed by classification based on Gaussian Mixture Models with Expectation Maximization. We finally provide related detection results on the UBIRIS2 database employable in a real time eye color detection system.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125310425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simultaneous motion segmentation and Structure from Motion
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711570
L. Zappella, A. D. Bue, X. Lladó, J. Salvi
This paper presents a novel approach to simultaneously compute the motion segmentation and the 3D reconstruction of a set of 2D points extracted from an image sequence. Starting from an initial segmentation, our method applies an iterative procedure that corrects misclassified points while reconstructing the 3D scene, which is composed of independently moving objects. The optimization relies on two well-known principles: first, in multi-body Structure from Motion the matrix describing the 3D shape is sparse; second, the segmented 2D points must yield a valid 3D reconstruction under the rotational metric constraints. Our formulation results in a bilinear optimization in which sparsity and metric constraints are enforced at each iteration of the algorithm. The final result is the corrected segmentation, the 3D structure of the moving objects, and an orthographic camera matrix for each motion and each frame. Results are shown on synthetic sequences, and a preliminary application to real sequences from the Hopkins 155 database is presented.
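The bilinear structure can be sketched, in very reduced form, as an alternating factorization of the measurement matrix with a soft-threshold promoting sparsity of the shape matrix; the paper's rotational metric constraints and the segmentation-correction step are deliberately omitted, so this is only an outline of the optimization skeleton, not the authors' method.

```python
# Hedged sketch: W (2F x P tracked points) ~ M (motion) * S (sparse shape),
# solved by alternating least squares with soft-thresholding on S.
import numpy as np

def bilinear_sparse_factor(W, rank, lam=0.1, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    F2, P = W.shape
    S = rng.standard_normal((rank, P))
    for _ in range(iters):
        # Fix S, solve M S = W for M in least squares.
        M = np.linalg.lstsq(S.T, W.T, rcond=None)[0].T
        # Fix M, solve for S, then soft-threshold to promote the multi-body sparsity.
        S = np.linalg.lstsq(M, W, rcond=None)[0]
        S = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    return M, S
```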
{"title":"Simultaneous motion segmentation and Structure from Motion","authors":"L. Zappella, A. D. Bue, X. Lladó, J. Salvi","doi":"10.1109/WACV.2011.5711570","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711570","url":null,"abstract":"This paper presents a novel approach to simultaneously compute the motion segmentation and the 3D reconstruction of a set of 2D points extracted from an image sequence. Starting from an initial segmentation, our method proposes an iterative procedure that corrects the misclassified points while reconstructing the 3D scene, which is composed of objects that move independently. This optimization procedure is made by considering two well-known principles: firstly, in multi-body Structure from Motion the matrix describing the 3D shape is sparse, secondly, the segmented 2D points must give a valid 3D reconstruction given the rotational metric constraints. Our formulation results in a bilinear optimization where sparsity and metric constraints are enforced at each iteration of the algorithm. The final result is the corrected segmentation, the 3D structure of the moving objects and an orthographic camera matrix for each motion and each frame. Results are shown on synthetic sequences and a preliminary application on real sequences of the Hopkins 155 database is presented.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"509 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115889128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object matching using feature aggregation over a frame sequence
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711489
Mahmoud Bassiouny, M. El-Saban
Object instance matching is a cornerstone component of many computer vision applications such as image search, augmented reality, and unsupervised tagging. The common flow in these applications is to take an input image and match it against a database of previously enrolled images of objects of interest. This is often difficult, as one needs to capture an image corresponding to an object view already present in the database, especially for 3D objects with high curvature, where light reflection, viewpoint change, and partial occlusion can significantly alter the appearance of the captured image. Rather than relying on numerous views of each object in the database, we propose an alternative: capture a short video sequence scanning the object and use information from multiple frames to improve the chance of a successful match. The matching step combines local features from a number of frames and incrementally forms a point cloud describing the object. We conduct experiments on a database of different object types, showing promising matching results on both a privately collected set of videos and videos freely available on the Web, such as on YouTube. An increase in accuracy of up to 20% over the single-frame matching baseline is shown to be possible.
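A rough sketch of the aggregation step: local descriptors pooled from several frames of the scanning clip are matched against one database object. ORB and the ratio-test threshold are stand-ins chosen because they ship with OpenCV; the paper does not prescribe these choices, and the incremental point-cloud construction is not shown.

```python
# Hedged sketch: pool descriptors across frames, then count ratio-test matches.
import cv2
import numpy as np

def aggregate_descriptors(frames, max_per_frame=500):
    """Pool ORB descriptors from all frames of a short object-scanning clip (BGR frames)."""
    orb = cv2.ORB_create(nfeatures=max_per_frame)
    pooled = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, des = orb.detectAndCompute(gray, None)
        if des is not None:
            pooled.append(des)
    return np.vstack(pooled) if pooled else None

def match_score(pooled_des, db_des, ratio=0.75):
    """Count ratio-test matches between the aggregated query and one database object."""
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = 0
    for pair in bf.knnMatch(pooled_des, db_des, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good
```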
{"title":"Object matching using feature aggregation over a frame sequence","authors":"Mahmoud Bassiouny, M. El-Saban","doi":"10.1109/WACV.2011.5711489","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711489","url":null,"abstract":"Object instance matching is a cornerstone component in many computer vision applications such as image search, augmented reality and unsupervised tagging. The common flow in these applications is to take an input image and match it against a database of previously enrolled images of objects of interest. This is usually difficult as one needs to capture an image corresponding to an object view already present in the database, especially in the case of 3D objects with high curvature where light reflection, viewpoint change and partial occlusion can significantly alter the appearance of the captured image. Rather than relying on having numerous views of each object in the database, we propose an alternative method of capturing a short video sequence scanning a certain object and utilize information from multiple frames to improve the chance of a successful match in the database. The matching step combines local features from a number of frames and incrementally forms a point cloud describing the object. We conduct experiments on a database of different object types showing promising matching results on both a privately collected set of videos and those freely available on the Web such that on YouTube. Increase in accuracy of up to 20% over the baseline of using a single frame matching is shown to be possible.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115628406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Augmented distinctive features for efficient image matching
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711478
Quan Wang, Wei Guan, Suya You
Finding corresponding image points is a challenging computer vision problem, especially for confusing scenes with weakly textured surfaces or repeated patterns. Despite the well-known challenges of extracting conceptually meaningful high-level matching primitives, many recent works rely on high-level image features such as edge groups, lines, and regions, which are more distinctive than traditional local appearance-based features, to tackle such difficult scenes. In this paper, we propose a different and more general approach that treats image matching as a recognition problem over spatially related sets of image patches. We construct augmented semi-global descriptors (ordinal codes) from subsets of scale- and orientation-invariant local keypoint descriptors. The tied-ranking problem of ordinal codes is handled by incrementally sampling additional keypoints around the image patch sets. Finally, similarities of augmented features are measured with the Spearman correlation coefficient. Our method is compatible with a wide range of existing local image descriptors. Experimental results on standard benchmark datasets with SURF descriptors demonstrate its distinctiveness and effectiveness.
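A small sketch of ordinal coding and rank-correlation matching, with the patch-set augmentation reduced to simply concatenating descriptors; the function names and the use of dense ranks are assumptions, and the incremental tie-breaking sampling from the paper is only noted in a comment.

```python
# Hedged sketch: descriptor values are replaced by their ranks (ordinal code),
# and two codes are compared with Spearman's rho.
import numpy as np
from scipy.stats import spearmanr

def ordinal_code(descriptors):
    """Concatenate the descriptors of a patch set and replace values by their ranks."""
    flat = np.concatenate([np.ravel(d) for d in descriptors])
    # Dense ranks; ties are broken by position here, whereas the paper samples
    # additional keypoints to resolve them.
    return np.argsort(np.argsort(flat))

def code_similarity(code_a, code_b):
    """Spearman correlation between two ordinal codes of equal length."""
    rho, _ = spearmanr(code_a, code_b)
    return rho
```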
{"title":"Augmented distinctive features for efficient image matching","authors":"Quan Wang, Wei Guan, Suya You","doi":"10.1109/WACV.2011.5711478","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711478","url":null,"abstract":"Finding corresponding image points is a challenging computer vision problem, especially for confusing scenes with surfaces of low textures or repeated patterns. Despite the well-known challenges of extracting conceptually meaningful high-level matching primitives, many recent works describe high-level image features such as edge groups, lines and regions, which are more distinctive than traditional local appearance based features, to tackle such difficult scenes. In this paper, we propose a different and more general approach, which treats the image matching problem as a recognition problem of spatially related image patch sets. We construct augmented semi-global descriptors (ordinal codes) based on subsets of scale and orientation invariant local keypoint descriptors. Tied ranking problem of ordinal codes is handled by increasingly keypoint sampling around image patch sets. Finally, similarities of augmented features are measured using Spearman correlation coefficient. Our proposed method is compatible with a large range of existing local image descriptors. Experimental results based on standard benchmark datasets and SURF descriptors have demonstrated its distinctiveness and effectiveness.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117032881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
View context based 2D sketch-3D model alignment
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711482
Bo Li, H. Johan
2D sketch-3D model alignment is important for many applications such as sketch-based 3D model retrieval, sketch-based 3D modeling, and model-based vision and recognition. In this paper, we propose a 2D sketch-3D model alignment algorithm using view context and shape context matching. A sketch consists of a set of curves; a 3D model is typically a 3D triangle mesh. The algorithm includes two main steps: precomputation and actual alignment. In the precomputation step, we extract the view context features of a set of sample views of the 3D model to be aligned. To speed up the precomputation, two computationally efficient, rotation-invariant features, Zernike moments and Fourier descriptors, are used to represent a view. In the actual alignment, we quickly prune most sample views that are dissimilar to the sketch based on their view context similarities. Finally, to find an approximate pose, we compare the sketch with only a very small portion (e.g., 5% in our experiments) of the sample views using shape context matching. Experiments on two types of datasets show that the algorithm can approximately align 2D sketches with 3D models.
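The two-stage structure can be sketched as a cheap pruning pass followed by an expensive matcher on the survivors; the Euclidean distance on precomputed view features stands in for the view-context comparison, and the expensive_match callback stands in for shape context matching. Both are assumptions, not the paper's exact measures.

```python
# Hedged sketch: prune sample views with a cheap feature distance, then run the
# expensive matcher on only the surviving fraction (~5%).
import numpy as np

def align_sketch(sketch_feat, view_feats, expensive_match, keep_frac=0.05):
    """view_feats: (V, D) precomputed view features of the sampled views.
    expensive_match(view_index) -> cost of the detailed comparison; lower is better."""
    dists = np.linalg.norm(view_feats - sketch_feat, axis=1)   # cheap pruning stage
    n_keep = max(1, int(len(view_feats) * keep_frac))
    candidates = np.argsort(dists)[:n_keep]
    costs = [expensive_match(int(v)) for v in candidates]      # detailed stage on survivors
    return int(candidates[int(np.argmin(costs))])              # best-matching sample view
```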
{"title":"View context based 2D sketch-3D model alignment","authors":"Bo Li, H. Johan","doi":"10.1109/WACV.2011.5711482","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711482","url":null,"abstract":"2D sketch-3D model alignment is important for many applications such as sketch-based 3D model retrieval, sketch-based 3D modeling as well as model-based vision and recognition. In this paper, we propose a 2D sketch-3D model alignment algorithm using view context and shape context matching. A sketch consists of a set of curves. A 3D model is typically a 3D triangle mesh. It includes two main steps: precomputation and actual alignment. In the precomputation, we extract the view context features of a set of sample views for a 3D model to be aligned. To speed up the precomputation, two computationally efficient and rotation-invariant features, Zernike moments and Fourier descriptors are used to represent a view. In the actual alignment, we prune most sample views which are dissimilar to the sketch very quickly based on their view context similarities. Finally, to find an approximate pose, we only compare the sketch with a very small portion (e.g. 5% in our experiments) of the sample views based on shape context matching. Experiments on two types of datasets show that the algorithm can align 2D sketches with 3D models approximately.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121788939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast and scalable keypoint recognition and image retrieval using binary codes
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711573
Jonathan Ventura, Tobias Höllerer
In this paper we report an evaluation of keypoint descriptor compression using as little as 16 bits to describe a single keypoint. We use spectral hashing to compress keypoint descriptors, and match them using the Hamming distance. By indexing the keypoints in a binary tree, we can quickly recognize keypoints with a very small database, and efficiently insert new keypoints. Our tests using image datasets with perspective distortion show the method to enable fast keypoint recognition and image retrieval with a small code size, and point towards potential applications for scalable visual SLAM on mobile phones.
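A hedged sketch of binary-code matching: random hyperplane projections stand in for spectral hashing (which learns its projections from the data distribution), and the binary-tree index is replaced by a linear Hamming scan.

```python
# Hedged sketch: compress descriptors to 16-bit binary codes and match by Hamming distance.
import numpy as np

def make_encoder(dim, n_bits=16, seed=0):
    """Return a function mapping (N, dim) descriptors to (N, n_bits) boolean codes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, dim))
    def encode(descriptors):
        # The sign of each random projection contributes one bit of the code.
        return np.asarray(descriptors) @ planes.T > 0
    return encode

def hamming_nn(query_code, db_codes):
    """Index and distance of the database code with the fewest differing bits."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    best = int(np.argmin(dists))
    return best, int(dists[best])
```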
{"title":"Fast and scalable keypoint recognition and image retrieval using binary codes","authors":"Jonathan Ventura, Tobias Höllerer","doi":"10.1109/WACV.2011.5711573","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711573","url":null,"abstract":"In this paper we report an evaluation of keypoint descriptor compression using as little as 16 bits to describe a single keypoint. We use spectral hashing to compress keypoint descriptors, and match them using the Hamming distance. By indexing the keypoints in a binary tree, we can quickly recognize keypoints with a very small database, and efficiently insert new keypoints. Our tests using image datasets with perspective distortion show the method to enable fast keypoint recognition and image retrieval with a small code size, and point towards potential applications for scalable visual SLAM on mobile phones.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122160219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalized autofocus
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711547
D. Vaquero, Natasha Gelfand, M. Tico, K. Pulli, M. Turk
All-in-focus imaging is a computational photography technique that produces images free of defocus blur by capturing a stack of images focused at different distances and merging them into a single sharp result. Current approaches assume that images have been captured offline, and that a reasonably powerful computer is available to process them. In contrast, we focus on the problem of how to capture such input stacks in an efficient and scene-adaptive fashion. Inspired by passive autofocus techniques, which select a single best plane of focus in the scene, we propose a method to automatically select a minimal set of images, focused at different depths, such that all objects in a given scene are in focus in at least one image. We aim to minimize both the amount of time spent metering the scene and capturing the images, and the total amount of high-resolution data that is captured. The algorithm first analyzes a set of low-resolution sharpness measurements of the scene while continuously varying the focus distance of the lens. From these measurements, we estimate the final lens positions required to capture all objects in the scene in acceptable focus. We demonstrate the use of our technique in a mobile computational photography scenario, where it is essential to minimize image capture time (as the camera is typically handheld) and processing time (as the computation and energy resources are limited).
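Interpreted as a set-cover problem, the selection step can be sketched with a greedy loop: each candidate lens position covers the scene regions it renders acceptably sharp, and positions are added until every region is covered. The per-region sharpness table and the threshold are assumed to come from the low-resolution focus sweep; the greedy strategy is an illustrative stand-in, not necessarily the paper's exact selection rule.

```python
# Hedged sketch: greedily choose lens positions so every region is sharp in some image.
import numpy as np

def select_focus_positions(sharpness, threshold):
    """sharpness: (n_positions, n_regions) array of per-region sharpness scores.
    Returns lens-position indices so each region exceeds the threshold in at least one image."""
    covers = sharpness >= threshold                    # boolean coverage table
    uncovered = np.ones(covers.shape[1], dtype=bool)
    chosen = []
    while uncovered.any():
        gains = (covers & uncovered).sum(axis=1)       # new regions each position would add
        best = int(np.argmax(gains))
        if gains[best] == 0:                           # some region is never acceptably sharp
            break
        chosen.append(best)
        uncovered &= ~covers[best]
    return chosen
```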
{"title":"Generalized autofocus","authors":"D. Vaquero, Natasha Gelfand, M. Tico, K. Pulli, M. Turk","doi":"10.1109/WACV.2011.5711547","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711547","url":null,"abstract":"All-in-focus imaging is a computational photography technique that produces images free of defocus blur by capturing a stack of images focused at different distances and merging them into a single sharp result. Current approaches assume that images have been captured offline, and that a reasonably powerful computer is available to process them. In contrast, we focus on the problem of how to capture such input stacks in an efficient and scene-adaptive fashion. Inspired by passive autofocus techniques, which select a single best plane of focus in the scene, we propose a method to automatically select a minimal set of images, focused at different depths, such that all objects in a given scene are in focus in at least one image. We aim to minimize both the amount of time spent metering the scene and capturing the images, and the total amount of high-resolution data that is captured. The algorithm first analyzes a set of low-resolution sharpness measurements of the scene while continuously varying the focus distance of the lens. From these measurements, we estimate the final lens positions required to capture all objects in the scene in acceptable focus. We demonstrate the use of our technique in a mobile computational photography scenario, where it is essential to minimize image capture time (as the camera is typically handheld) and processing time (as the computation and energy resources are limited).","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128163000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU accelerated one-pass algorithm for computing minimal rectangles of connected components
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711542
L. Ríha, M. Manohar
Connected component labeling is an essential task for detecting and tracking moving objects in video surveillance applications. Since tracking algorithms are designed for real-time use, the efficiency of the underlying algorithms is critical. In this paper we present a new one-pass algorithm for computing the minimal bounding rectangles of all connected components of background/foreground-segmented video frames (binary data) using a GPU accelerator. The image frame is scanned once in raster order, and the background-foreground transition information is stored in a directed graph in which each transition is represented by a node. This data structure contains the locations of object edges in every row and is used to detect connected components in the image and extract their main features, e.g., bounding box size and location, centroid location, and actual size. We use GPU acceleration to speed up the extraction of the directed graph from the image, from which the minimal bounding rectangles are subsequently computed. We also compare the performance of the GPU implementation (on a Tesla C2050 accelerator card) with a multi-core (up to 24 cores) general-purpose CPU implementation of the algorithm.
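A CPU reference sketch of the run-based, one-pass idea: each row is scanned once, foreground runs are linked to overlapping runs of the previous row with union-find, and a bounding rectangle is maintained per component root. The GPU mapping and the paper's directed-graph layout are not reproduced here; this is only an assumed serial equivalent.

```python
# Hedged sketch: one raster pass, run extraction per row, union-find merging,
# per-component bounding boxes (x0, y0, x1, y1), 8-connectivity.
import numpy as np

def bounding_rectangles(binary):
    parent, boxes = {}, {}                            # union-find parents, box per root

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:                                  # merge rb's box into ra's
            parent[rb] = ra
            x0, y0, x1, y1 = boxes.pop(rb)
            bx0, by0, bx1, by1 = boxes[ra]
            boxes[ra] = (min(x0, bx0), min(y0, by0), max(x1, bx1), max(y1, by1))

    next_label, prev_runs = 0, []
    for y, row in enumerate(np.asarray(binary, dtype=bool)):
        cur_runs, x = [], 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            x0 = x                                    # extract one foreground run
            while x < len(row) and row[x]:
                x += 1
            x1 = x - 1
            label = None
            for px0, px1, plabel in prev_runs:        # overlap with previous-row runs
                if px0 <= x1 + 1 and px1 >= x0 - 1:
                    if label is None:
                        label = find(plabel)
                    else:
                        union(label, plabel)
                        label = find(label)
            if label is None:                         # start a new component
                label = next_label
                next_label += 1
                parent[label] = label
                boxes[label] = (x0, y, x1, y)
            else:                                     # grow the component's rectangle
                bx0, by0, bx1, by1 = boxes[label]
                boxes[label] = (min(x0, bx0), min(y, by0), max(x1, bx1), max(y, by1))
            cur_runs.append((x0, x1, label))
        prev_runs = cur_runs
    return list(boxes.values())
```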
{"title":"GPU accelerated one-pass algorithm for computing minimal rectangles of connected components","authors":"L. Ríha, M. Manohar","doi":"10.1109/WACV.2011.5711542","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711542","url":null,"abstract":"The connected component labeling is an essential task for detecting moving objects and tracking them in video surveillance application. Since tracking algorithms are designed for real-time applications, efficiencies of the underlying algorithms become critical. In this paper we present a new one-pass algorithm for computing minimal binding rectangles of all the connected components of background foreground segmented video frames (binary data) using GPU accelerator. The given image frame is scanned once in raster scan mode and the background foreground transition information is stored in a directed-graph where each transition is represented by a node. This data structure contains the locations of object edges in every row, and it is used to detect connected components in the image and extract its main features, e.g. bounding box size and location, location of the centroid, real size, etc. Further we use GPU acceleration to speed up feature extraction from the image to a directed graph from which minimal bounding rectangles will be computed subsequently. Also we compare the performance of GPU acceleration (using Tesla C2050 accelerator card) with the performance of multi-core (up 24 cores) general purpose CPU implementation of the algorithm.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132043779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust multi-view camera calibration for wide-baseline camera networks
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711521
Jens Puwein, R. Ziegler, Julia Vogel, M. Pollefeys
Real-world camera networks are often characterized by very wide baselines covering a wide range of viewpoints. We describe a method that not only calibrates each camera sequence added to the system automatically, but also takes advantage of multi-view correspondences to make the entire calibration framework more robust. Novel camera sequences can be seamlessly integrated into the system at any time, adding to the robustness of future computations. One of the challenges is establishing correspondences between cameras. Initializing a bag of features from a calibrated frame, correspondences between cameras are established in a two-step procedure. First, affine-invariant features of the camera sequences are warped into a common coordinate frame, and a coarse matching is obtained between the collected features and the incrementally built and updated bag of features. This allows us to warp images to a common view. Second, scale-invariant features are extracted from the warped images, which yields both more numerous and more accurate correspondences. Finally, the parameters are optimized in a bundle adjustment. By adding the feature descriptors and the optimized 3D positions to the bag of features, we obtain a feature-based scene abstraction that allows the calibration of novel sequences and the correction of drift in single-view calibration tracking. We demonstrate that our approach can deal with wide baselines, and that novel sequences can be seamlessly integrated into the calibration framework.
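The coarse matching stage can be sketched as warping the keypoints of a new camera into the common coordinate frame with a known homography and testing only spatially close pairs for descriptor agreement. The homography, the distance thresholds, and the function name are assumptions for illustration; the bundle-adjustment refinement and the bag-of-features bookkeeping are not shown.

```python
# Hedged sketch: warp keypoints into the common frame, then match nearby descriptors.
import numpy as np
import cv2

def coarse_matches(kpts, descs, H, bag_pts, bag_descs, radius=20.0, max_desc_dist=0.7):
    """kpts: (N, 2) keypoint positions and descs: (N, D) descriptors from the new camera.
    bag_pts / bag_descs: positions and descriptors already stored in the bag of features.
    H: 3x3 homography mapping the new camera's image plane into the common frame."""
    pts = np.asarray(kpts, dtype=np.float32).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(pts, H).reshape(-1, 2)   # positions in the common frame
    matches = []
    for i, p in enumerate(warped):
        near = np.where(np.linalg.norm(bag_pts - p, axis=1) < radius)[0]
        if near.size == 0:
            continue
        d = np.linalg.norm(bag_descs[near] - descs[i], axis=1)
        if d.min() < max_desc_dist:                            # accept the closest descriptor
            matches.append((i, int(near[np.argmin(d)])))
    return matches
```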
{"title":"Robust multi-view camera calibration for wide-baseline camera networks","authors":"Jens Puwein, R. Ziegler, Julia Vogel, M. Pollefeys","doi":"10.1109/WACV.2011.5711521","DOIUrl":"https://doi.org/10.1109/WACV.2011.5711521","url":null,"abstract":"Real-world camera networks are often characterized by very wide baselines covering a wide range of viewpoints. We describe a method not only calibrating each camera sequence added to the system automatically, but also taking advantage of multi-view correspondences to make the entire calibration framework more robust. Novel camera sequences can be seamlessly integrated into the system at any time, adding to the robustness of future computations. One of the challenges consists in establishing correspondences between cameras. Initializing a bag of features from a calibrated frame, correspondences between cameras are established in a two-step procedure. First, affine invariant features of camera sequences are warped into a common coordinate frame and a coarse matching is obtained between the collected features and the incrementally built and updated bag of features. This allows us to warp images to a common view. Second, scale invariant features are extracted from the warped images. This leads to both more numerous and more accurate correspondences. Finally, the parameters are optimized in a bundle adjustment. Adding the feature descriptors and the optimized 3D positions to the bag of features, we obtain a feature-based scene abstraction, allowing for the calibration of novel sequences and the correction of drift in single-view calibration tracking. We demonstrate that our approach can deal with wide baselines. Novel sequences can seamlessly be integrated in the calibration framework.","PeriodicalId":424724,"journal":{"name":"2011 IEEE Workshop on Applications of Computer Vision (WACV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130305231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}