N. Campbell, George Vogiatzis, Carlos Hernández, R. Cipolla
This paper addresses the problem of automatically obtaining the object/background segmentation of a rigid 3D object observed in a set of images that have been calibrated for camera pose and intrinsics. Such segmentations can be used to obtain a shape representation of a potentially texture-less object by computing a visual hull. We propose an automatic approach where the object to be segmented is identified by the pose of the cameras instead of user input such as 2D bounding rectangles or brush-strokes. The key to our method is a pairwise MRF framework that combines (a) foreground/background appearance models, (b) epipolar constraints and (c) weak stereo correspondence into a single segmentation cost function that can be efficiently solved by graph cuts. The segmentation thus obtained is further improved using silhouette coherency and then used to update the foreground/background appearance models, which are fed into the next graph-cut computation. These two steps are iterated until the segmentation converges. Our method can automatically provide a 3D surface representation even in texture-less scenes where multi-view stereo (MVS) methods might fail. Furthermore, it confers improved performance in images where the object is not readily separable from the background in colour space, an area that previous segmentation approaches have found challenging.
{"title":"Automatic Object Segmentation from Calibrated Images","authors":"N. Campbell, George Vogiatzis, Carlos Hernández, R. Cipolla","doi":"10.1109/CVMP.2011.21","DOIUrl":"https://doi.org/10.1109/CVMP.2011.21","url":null,"abstract":"This paper addresses the problem of automatically obtaining the object/background segmentation of a rigid 3D object observed in a set of images that have been calibrated for camera pose and intrinsics. Such segmentations can be used to obtain a shape representation of a potentially texture-less object by computing a visual hull. We propose an automatic approach where the object to be segmented is identified by the pose of the cameras instead of user input such as 2D bounding rectangles or brush-strokes. The key behind our method is a pairwise MRF framework that combines (a) foreground/background appearance models, (b) epipolar constraints and (c) weak stereo correspondence into a single segmentation cost function that can be efficiently solved by Graph-cuts. The segmentation thus obtained is further improved using silhouette coherency and then used to update the foreground/background appearance models which are fed into the next Graph-cut computation. These two steps are iterated until segmentation convergences. Our method can automatically provide a 3D surface representation even in texture-less scenes where MVS methods might fail. Furthermore, it confers improved performance in images where the object is not readily separable from the background in colour space, an area that previous segmentation approaches have found challenging.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116279558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel dense motion capture technique which creates a temporally consistent mesh sequence from several calibrated and synchronised video sequences of a dynamic object. A surface patch model based on the topology of a user-specified reference mesh is employed to track the surface of the object over time. Multi-view 3D matching of surface patches using a novel cooperative minimisation approach provides initial motion estimates which are robust to large, rapid non-rigid changes of shape. A Laplacian deformation subsequently regularises the motion of the whole mesh using the weighted vertex displacements as soft constraints. An unregistered surface geometry independently reconstructed at each frame is incorporated as a shape prior to improve the quality of tracking. The method is evaluated in a challenging scenario of facial performance capture. Results demonstrate accurate tracking of fast, complex expressions over long sequences without use of markers or a pattern.
{"title":"Cooperative patch-based 3D surface tracking","authors":"M. Klaudiny, A. Hilton","doi":"10.1109/CVMP.2011.14","DOIUrl":"https://doi.org/10.1109/CVMP.2011.14","url":null,"abstract":"This paper presents a novel dense motion capture technique which creates a temporally consistent mesh sequence from several calibrated and synchronised video sequences of a dynamic object. A surface patch model based on the topology of a user-specified reference mesh is employed to track the surface of the object over time. Multi-view 3D matching of surface patches using a novel cooperative minimisation approach provides initial motion estimates which are robust to large, rapid non-rigid changes of shape. A Laplacian deformation subsequently regularises the motion of the whole mesh using the weighted vertex displacements as soft constraints. An unregistered surface geometry independently reconstructed at each frame is incorporated as a shape prior to improve the quality of tracking. The method is evaluated in a challenging scenario of facial performance capture. Results demonstrate accurate tracking of fast, complex expressions over long sequences without use of markers or a pattern.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129878895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel algorithm for automated video production based on content ranking. The proposed algorithm generates videos by performing camera selection while minimizing the number of inter-camera switches. We model the problem as a finite-horizon Partially Observable Markov Decision Process over temporal windows, and we use a multivariate Gaussian distribution to represent the content-quality score for each camera. The performance of the proposed approach is demonstrated on a multi-camera setup of fixed cameras with partially overlapping fields of view. Subjective experiments based on the Turing test confirmed the quality of the automatically produced videos. The proposed approach is also compared with recent methods based on Recursive Decision and on Dynamic Bayesian Networks, and it outperforms both.
{"title":"Multi-camera Scheduling for Video Production","authors":"F. Daniyal, A. Cavallaro","doi":"10.1109/CVMP.2011.8","DOIUrl":"https://doi.org/10.1109/CVMP.2011.8","url":null,"abstract":"We present a novel algorithm for automated video production based on content ranking. The proposed algorithm generates videos by performing camera selection while minimizing the number of inter-camera switch. We model the problem as a finite horizon Partially Observable Markov Decision Process over temporal windows and we use a multivariate Gaussian distribution to represent the content-quality score for each camera. The performance of the proposed approach is demonstrated on a multi-camera setup of fixed cameras with partially overlapping fields of view. Subjective experiments based on the Turing test confirmed the quality of the automatically produced videos. The proposed approach is also compared with recent methods based on Recursive Decision and on Dynamic Bayesian Networks and its results outperform both methods.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128694956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finding dense correspondences between two images is a well-researched but still unsolved problem. For various tasks in computer graphics, e.g. image interpolation, obtaining plausible correspondences is a vital component. We present an interactive tool that allows the user to modify and correct dense correspondence maps between two given images. Incorporating state-of-the-art algorithms in image segmentation, correspondence estimation and optical flow, our tool assists the user in selecting and correcting mismatched correspondences.
{"title":"Flowlab - An Interactive Tool for Editing Dense Image Correspondences","authors":"F. Klose, K. Ruhl, C. Lipski, M. Magnor","doi":"10.1109/CVMP.2011.13","DOIUrl":"https://doi.org/10.1109/CVMP.2011.13","url":null,"abstract":"Finding dense correspondences between two images is a well-researched but still unsolved problem. For various tasks in computer graphics, e.g. image interpolation, obtaining plausible correspondences is a vital component. We present an interactive tool that allows the user to modify and correct dense correspondence maps between two given images. Incorporating state-of-the art algorithms in image segmentation, correspondence estimation and optical flow, our tool assists the user in selecting and correcting mismatched correspondences.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116132908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Color difference between the views of a stereo pair is a challenging problem. Applications such as stereo image compression demand compensation of these color differences, which is typically done by methods called color mapping. Color mapping is based on feature correspondences; from these feature correspondences, color correspondences are generated and ultimately used to build the color mapping model. This paper focuses on the detection of outliers in the feature correspondences. We propose a novel iterative outlier removal method which exploits the color information in the neighborhood of the feature correspondences. From the analysis of our experimental results and comparison with existing methods, we conclude that spatial color neighborhood information around the feature correspondences, combined with iterative color mapping, can detect outliers in general and yields robust color correction.
{"title":"Robust Color Correction for Stereo","authors":"H. Faridul, J. Stauder, A. Trémeau","doi":"10.1109/CVMP.2011.18","DOIUrl":"https://doi.org/10.1109/CVMP.2011.18","url":null,"abstract":"Color difference between views of a stereo pair is a challenging problem. Applications such as compression of stereo image demands the compensation of color differences which is typically done by methods called color mapping. Color mapping is based on feature correspondences. From these feature correspondences, color correspondences are generated which is ultimately used for the color mapping model. This paper focuses on detection of outliers in the feature correspondences. We propose novel iterative outlier removal method which exploits the neighborhood color information of the feature correspondences. From the analysis of our experimental results and comparing with existing methods we conclude by arguing that spatial color neighborhood information around the feature correspondences along with an iterative color mapping can detect outliers in general and can bring a robust color correction.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131379349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rene Kaiser, M. Thaler, Andreas Kriechbaum, Hannes Fassold, W. Bailer, Jakub Rosner
For enabling immersive user experiences in interactive TV services and automating camera view selection and framing, knowledge of the location of persons in a scene is essential. We describe an architecture for detecting and tracking persons in high-resolution panoramic video streams obtained from the Omni Cam, a panoramic camera that stitches video streams from six HD-resolution tiles. We use a CUDA-accelerated feature point tracker, a blob detector and a CUDA HOG person detector for region tracking in each of the tiles, before fusing the results for the entire panorama. In this paper we focus on applying the HOG person detector in real time and on the speedup of the feature point tracker obtained by porting it to NVIDIA's Fermi architecture. Evaluations indicate a significant speedup for our feature point tracker implementation, enabling the entire process to run in a real-time system.
{"title":"Real-time Person Tracking in High-resolution Panoramic Video for Automated Broadcast Production","authors":"Rene Kaiser, M. Thaler, Andreas Kriechbaum, Hannes Fassold, W. Bailer, Jakub Rosner","doi":"10.1109/CVMP.2011.9","DOIUrl":"https://doi.org/10.1109/CVMP.2011.9","url":null,"abstract":"For enabling immersive user experiences for interactive TV services and automating camera view selection and framing, knowledge of the location of persons in a scene is essential. We describe an architecture for detecting and tracking persons in high-resolution panoramic video streams, obtained from the Omni Cam, a panoramic camera stitching video streams from 6 HD resolution tiles. We use a CUDA accelerated feature point tracker, a blob detector and a CUDA HOG person detector, which are used for region tracking in each of the tiles before fusing the results for the entire panorama. In this paper we focus on the application of the HOG person detector in real-time and the speedup of the feature point tracker by porting it to NVIDIA's Fermi architecture. Evaluations indicate significant speedup for our feature point tracker implementation, enabling the entire process in a real-time system.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133432035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chuan Li, M. Shaw, D. Pickup, D. Cosker, P. Willis, P. Hall
This paper describes an approach for automatically producing convincing water surfaces from video data in real time. Fluid simulation has long been studied in the Computer Graphics literature, but the methods developed are expensive and require input from highly trained artists. In contrast, our method is a low-cost Computer Vision based solution which requires only a single video as a source. Our output consists of an animated mesh of the water surface captured together with surface velocities and texture maps from the video data. As an example of what can be done with this data, a modified form of video textures is used to create naturalistic infinite transition loops of the captured water surface. We demonstrate our approach over a wide range of inputs, including quiescent lakes, breaking sea waves, and waterfalls. All source videos we use are taken from a third-party publicly available database.
{"title":"Realtime Video Based Water Surface Approximation","authors":"Chuan Li, M. Shaw, D. Pickup, D. Cosker, P. Willis, P. Hall","doi":"10.1109/CVMP.2011.19","DOIUrl":"https://doi.org/10.1109/CVMP.2011.19","url":null,"abstract":"This paper describes an approach for automatically producing convincing water surfaces from video data in real time. Fluids simulation has long been studied in the Computer Graphics literature, but the methods developed are expensive and require input from highly trained artists. In contrast our method is a low cost Computer Vision based solution which requires only a single video as a source. Our output consists of an animated mesh of the water surface captured together with surface velocities and texture maps from the video data. As an example of what can be done with this data, a modified form of video textures is used to create naturalistic infinite transition loops of the captured water surface. We demonstrate our approach over a wide range of inputs, including quiescent lakes, breaking sea waves, and waterfalls. All source video we use are taken from a third-party publicly available database.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126112554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The paper deals with projective shape and motion reconstruction by subspace iterations. A prerequisite of factorization-style algorithms is that all feature points need to be observed in all images, a condition which is hardly realistic for real videos. We therefore address the problem of estimating structure and motion in the presence of missing features. The proposed algorithm does not require initialization and handles all available data uniformly. The computed solution is global in the sense that it does not merge partial solutions incrementally or hierarchically. The global cost due to the factorization is further augmented with local constraints to regularize and stabilize the estimates. It is shown how both costs can be jointly minimized in the presence of unobserved points. Using synthetic and real image sequences with up to 60% missing data, we demonstrate that our algorithm is accurate and reliable.
{"title":"Projective Reconstruction from Incomplete Trajectories by Global and Local Constraints","authors":"H. Ackermann, B. Rosenhahn","doi":"10.1109/CVMP.2011.15","DOIUrl":"https://doi.org/10.1109/CVMP.2011.15","url":null,"abstract":"The paper deals with projective shape and motion reconstruction by subspace iterations. A prerequisite of factorization-style algorithms is that all feature points need be observed in all images, a condition which is hardly realistic in real videos. We therefore address the problem of estimating structure and motion considering missing features. The proposed algorithm does not require initialization and uniformly handles all available data. The computed solution is global in the sense that it does not merge partial solutions incrementally or hierarchically. The global cost due to the factorization is further amended by local constraints to regularize and stabilize the estimations. It is shown how both costs can be jointly minimized in the presence of unobserved points. By synthetic and real image sequences with up to $60%$ missing data we demonstrate that our algorithm is accurate and reliable.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133613074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The imagination of the online photographic community has recently been sparked by so-called cinemagraphs: short, seamlessly looping animated GIF images created from video in which only parts of the image move. These cinemagraphs capture the dynamics of one particular region of an image for dramatic effect, and provide the creator with control over which part of a moment to capture. We create a cinemagraph authoring tool combining video motion stabilisation, segmentation, interactive motion selection, motion loop detection and selection, and cinemagraph rendering. Our work pushes toward the easy and versatile creation of moments that cannot be represented with still imagery.
{"title":"Towards Moment Imagery: Automatic Cinemagraphs","authors":"J. Tompkin, Fabrizio Pece, K. Subr, J. Kautz","doi":"10.1109/CVMP.2011.16","DOIUrl":"https://doi.org/10.1109/CVMP.2011.16","url":null,"abstract":"The imagination of the online photographic community has recently been sparked by so-called cinema graphs: short, seamlessly looping animated GIF images created from video in which only parts of the image move. These cinema graphs capture the dynamics of one particular region in an image for dramatic effect, and provide the creator with control over what part of a moment to capture. We create a cinema graphs authoring tool combining video motion stabilisation, segmentation, interactive motion selection, motion loop detection and selection, and cinema graph rendering. Our work pushes toward the easy and versatile creation of moments that cannot be represented with still imagery.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132747978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Smolic, Steven Poulakos, Simon Heinzle, P. Greisen, Manuel Lang, A. Sorkine-Hornung, Miquel A. Farre, N. Stefanoski, Oliver Wang, L. Schnyder, Rafael Monroy, M. Gross
Stereoscopic 3D (S3D) has reached wide levels of adoption in consumer and professional markets. However, production of high quality S3D content is still a difficult and expensive art. Various S3D production tools and systems have been released recently to assist high quality content creation. This paper presents a number of such algorithms, tools and systems developed at Disney Research Zurich, which all make use of disparity-aware processing.
{"title":"Disparity-Aware Stereo 3D Production Tools","authors":"A. Smolic, Steven Poulakos, Simon Heinzle, P. Greisen, Manuel Lang, A. Sorkine-Hornung, Miquel A. Farre, N. Stefanoski, Oliver Wang, L. Schnyder, Rafael Monroy, M. Gross","doi":"10.1109/CVMP.2011.25","DOIUrl":"https://doi.org/10.1109/CVMP.2011.25","url":null,"abstract":"Stereoscopic 3D (S3D) has reached wide levels of adoption in consumer and professional markets. However, production of high quality S3D content is still a difficult and expensive art. Various S3D production tools and systems have been released recently to assist high quality content creation. This paper presents a number of such algorithms, tools and systems developed at Disney Research Zurich, which all make use of disparity-aware processing.","PeriodicalId":167135,"journal":{"name":"2011 Conference for Visual Media Production","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126943460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}