Motion Estimation for Regions of Reflections through Layer Separation
Mohamed A. Elgharib, François Pitié, A. Kokaram
DOI: 10.1109/CVMP.2011.12

Regions of reflections contain two semi-transparent layers moving over each other, which generates two motion vectors per pel. Current multiple-motion estimators either extend the usual brightness-constancy assumption to two motions or build on the Fourier phase-shift relationship. Both approaches assume constant motion over at least three frames, so they cannot handle temporally active motion caused by camera shake or acceleration. This paper proposes a new approach to multiple-motion estimation that models the correct motions as those producing the best layer separation of the examined reflection. A Bayesian framework is proposed which admits a solution using candidate motions generated from KLT trajectories and a layer-separation technique. We use novel temporal priors, and our results show robust handling of strong motion inconsistencies and improvements over previous work.
Practical Image-Based Relighting and Editing with Spherical-Harmonics and Local Lights
Borom Tunwattanapong, A. Ghosh, P. Debevec
DOI: 10.1109/CVMP.2011.22

We present a practical technique for image-based relighting under environmental illumination that greatly reduces the number of required photographs compared to traditional techniques while still achieving high-quality, editable relighting results. The proposed method employs an optimization procedure to combine spherical harmonics, a global lighting basis, with a set of local lights. Our choice of lighting basis captures both the low- and high-frequency components of typical surface reflectance functions while closely approximating the ground truth with an order of magnitude less data. This benefits acquisition by reducing the number of required photographs, simplifies the modification of reflectance data, and enables artistic lighting edits for post-production effects. We demonstrate two desirable lighting edits, modifying light intensity and angular width, using the proposed lighting basis.
Depth Estimation from Three Cameras Using Belief Propagation: 3D Modelling of Sumo Wrestling
Kensuke Ikeya, K. Hisatomi, Miwa Katayama, Y. Iwadate
DOI: 10.1109/CVMP.2011.20

We propose a method to estimate depth from three wide-baseline camera images using belief propagation. Message propagation is restricted to reduce the effects of boundary overreach, and the maximum, minimum, and kurtosis of the message energy distribution are used to reduce errors caused by large occlusions and textureless areas. In experiments, we focused on scenes from the traditional Japanese sport of sumo and created 3D models from three HD images using our method. We displayed them on a 3D display based on the principle of integral photography (IP). The experimental results confirm that our method is effective for estimating depth.
Space-time Editing of 3D Video Sequences
M. Tejera, A. Hilton
DOI: 10.1109/CVMP.2011.23

A shape-constrained Laplacian mesh deformation approach is introduced for interactive editing of mesh sequences. It allows low-level constraints, such as foot or hand contact, to be imposed while preserving the natural dynamics of the captured surface, and it supports artistic manipulation of motion style to achieve effects such as squash-and-stretch. Interactive editing of key frames is followed by automatic temporal propagation over a window of frames, so user edits are seamlessly integrated into the captured mesh sequence. Three spatio-temporal interpolation methods are evaluated. Results on a variety of real and synthetic sequences demonstrate that the approach enables flexible manipulation of captured 3D video sequences.
Making of Who Cares? HD Stereoscopic Free Viewpoint Video
C. Lipski, F. Klose, K. Ruhl, M. Magnor
DOI: 10.1109/CVMP.2011.7

We present a detailed blueprint of our stereoscopic free-viewpoint video system. Using unsynchronized footage as input, we can render virtual camera paths in post-production. The movement of the virtual camera also extends to the temporal domain, making slow-motion and freeze-and-rotate shots possible. As a proof of concept, a full-length stereoscopic HD music video has been produced with our approach.
Efficient Dense Reconstruction from Video
Phil Parsonage, A. Hilton, J. Starck
DOI: 10.1109/CVMP.2011.10

We present a framework for efficient reconstruction of dense scene structure from video. Sequential structure-from-motion recovers camera information from video but provides only sparse 3D points; we build a dense 3D point cloud by performing full-frame tracking and depth estimation across sequences. First, we present a novel algorithm for sequential frame selection that extracts a set of key frames with sufficient parallax for accurate depth reconstruction. Second, we introduce a technique for efficient reconstruction using dense tracking with geometrically correct optimisation of depth and orientation. Key-frame selection is also performed during optimisation to provide accurate depth reconstruction for different scene elements. We test our work on benchmark footage and on scenes containing local non-rigid motion, foreground clutter, and occlusions, showing performance comparable to state-of-the-art techniques. On real-world footage we also show a substantial increase in speed over existing methods where they succeed, and successful reconstructions where they fail.
Semantic Kernels Binarized - A Feature Descriptor for Fast and Robust Matching
Frederik Zilly, C. Riechert, P. Eisert, P. Kauff
DOI: 10.1109/CVMP.2011.11

This paper presents a new approach to feature description for image processing and robust image-recognition algorithms such as 3D camera tracking, view reconstruction, and 3D scene analysis. State-of-the-art feature detectors separate interest-point detection from description: the former is commonly performed in scale space, while the latter describes a normalized support region using histograms of gradients or similar derivatives of the grayscale image patch. This approach has proven very successful, but the descriptors are usually of high dimensionality in order to achieve high descriptiveness. Against this background, we propose a binarized descriptor with low memory usage and good matching performance. The descriptor is composed of binarized responses resulting from a set of folding operations applied to the normalized support region. We demonstrate the real-time capabilities of the descriptor in a stereo-matching environment.
A Real-time Production Tool for Animated Hand Sketches
J. Loviscach
DOI: 10.1109/CVMP.2011.17

In recent years, the look of hand-drawn sketches has become fashionable in video production. This paper introduces a software tool that produces such videos in real time during lectures, presentations, or in the studio. Currently, two styles are available: in the first, a hand seems to draw on a whiteboard; in the second, the presenter seems to stand behind a transparent board, which is simulated with the help of a camera. In both cases the input comes from a standard graphics tablet. The image of the lecturer's arm is synthesized from photographs and animated through inverse kinematics, and the sounds of the pen and the eraser are synthesized from recordings. Auxiliary functions include a ghosted script for the presenter and drag-and-drop placement of graphical elements prepared in advance.
Head-Mounted Photometric Stereo for Performance Capture
Andrew Jones, Graham Fyffe, Xueming Yu, Alex Ma, Jay Busch, M. Bolas, P. Debevec
DOI: 10.1145/1837026.1837088

Head-mounted cameras are an increasingly important tool for capturing facial performances to drive virtual characters. They provide a fixed, unoccluded view of the face, useful for observing motion-capture dots or as input to video analysis. However, the 2D imagery captured with these systems is typically affected by ambient light and generally fails to record the subtle 3D shape changes of the face as it performs. We have developed a system that augments a head-mounted camera with LED-based photometric stereo. The system allows observation of the face independently of the ambient light and generates per-pixel surface normals, so the performance is recorded dynamically in 3D. The resulting data can be used for facial relighting or as better input to machine-learning algorithms for driving an animated face.