Plasticine shading
L. Howell, Philip Child, P. Hall. DOI: 10.1145/2668904.2668933

Plasticine is a core material in the production of stop-motion animation. In some situations it is valuable to use computer simulation techniques to generate objects that appear to be made of plasticine. To render this material accurately, we present a new shading model based on the true physical properties of plasticine. We show that our new model represents the material approximately 20% more accurately than existing state-of-the-art surface shaders. Results are shown comparing our new model to state-of-the-art models and to an existing solution used in production.

Optimizing large scale CUDA applications using input data specific optimizations
B. Taskov. DOI: 10.1145/2668904.2668941

CUDA applications and general-purpose GPU (GPGPU) programs are widely used nowadays for solving computationally intensive tasks. There is a substantial body of tools, papers, books and features targeted at GPGPU APIs such as CUDA and OpenCL. The GPU architecture, being substantially different from traditional CPU architectures (x86, PowerPC, ARM), requires a different approach and introduces a different set of challenges. Apart from the traditional and well-examined GPGPU problems, such as memory access patterns, parallel design and occupancy, there is another important but not well-studied setback: from one point onward, the bigger a CUDA application gets (in terms of lines of code), the slower it becomes, mostly due to register spilling. Register spilling is a problem on most architectures available today, but it can easily become a massive bottleneck on the GPU due to the GPU's nature. We examine in detail why this happens and what the common ways to solve it are, and we propose one simple, presently undocumented approach that may be used to alleviate the issue in some situations. For the purposes of this paper we focus on the NVIDIA Fermi architecture.

Estimating camera intrinsics from motion blur
Alastair Barber, Matthew A. Brown, Paul Hogbin, D. Cosker. DOI: 10.1145/2668904.2668934

Estimating changes in camera parameters, such as motion, focal length and exposure time, over a single frame or a sequence of frames is an integral part of many computer vision applications. Rapid changes in these parameters often cause motion blur in an image, which can make traditional methods of feature identification and tracking difficult. Here we present a method for estimating the scale changes brought about by a change in focal length from a single motion-blurred frame. We also use the results from two separate methods for determining the rotation of a pair of motion-blurred frames to estimate the exposure time of a frame (i.e. the shutter angle).

Advanced video debanding
G. Baugh, A. Kokaram, François Pitié. DOI: 10.1145/2668904.2668912

High efficiency video coding has made it possible to stream video over bandwidth-constrained communication networks. Depending on bit rate requirements, a video encoder sacrifices some image detail, which can introduce visual artefacts. Under aggressive encoding, a contouring staircase artefact called banding can be observed in image regions with very low texture. This paper presents a solution for removing banding artefacts using image filtering and dithering techniques. A new banding index (BI) metric is also presented for quantitatively measuring the amount of banding in an image. Using this BI metric, we assess how much banding YouTube video encoding introduces in a video test dataset. We compare the results of our debanding technique on the YouTube test dataset with those of gradfun, an existing debanding filter in FFmpeg.

Multi-clip video editing from a single viewpoint
Vineet Gandhi, Rémi Ronfard, Michael Gleicher. DOI: 10.1145/2668904.2668936

We propose a framework for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single static camera. Assuming important actors and objects can be localized using computer vision techniques, our method requires only minimal user input to define the subject matter of each sub-clip. The composition of each sub-clip is computed automatically in a novel L1-norm optimization framework. Our approach encodes several common cinematographic practices into a single convex cost-function minimization problem, resulting in aesthetically pleasing sub-clips which can easily be edited together using off-the-shelf multi-clip video editing software. We demonstrate our approach on five video sequences of a live theatre performance, generating multiple synchronized sub-clips for each sequence.

Saliency-based parameter tuning for tone mapping
Xihe Gao, Stephen Brooks, D. Arnold. DOI: 10.1145/2668904.2668939

We present a saliency-based parameter tuning algorithm that can optimize the parameters of tone mapping operators automatically by minimizing the saliency distortion caused by the process of tone mapping. The algorithm employs an improved saliency detection model for HDR images, and the saliency distortion is quantified as the Kullback-Leibler divergence between the saliency distributions of the tone mapped images and those of the corresponding HDR images. We show that the minimization can be accomplished by employing an evolution strategy with individuals representing parameter settings and fitness values based on saliency distortion. The effectiveness of our algorithm is demonstrated through experiments using several tone mapping operators and test images.

We present a new algorithm for searching video repositories using free-hand sketches. Our queries express both appearance (color, shape) and motion attributes, as well as semantic properties (object labels) enabling hybrid queries to be specified. Unlike existing sketch based video retrieval (SBVR) systems that enable hybrid queries of this form, we do not adopt a model fitting/optimization approach to match at query-time. Rather, we create an efficiently searchable index via a novel space-time descriptor that encapsulates all these properties. The real-time performance yielded by our indexing approach enables interactive refinement of search results within a relevance feedback (RF) framework; a unique contribution to SBVR. We evaluate our system over 700 sports footage clips exhibiting a variety of clutter and motion conditions, demonstrating significant accuracy and speed gains over the state of the art.
{"title":"Interactive video asset retrieval using sketched queries","authors":"Stuart James, J. Collomosse","doi":"10.1145/2668904.2668940","DOIUrl":"https://doi.org/10.1145/2668904.2668940","url":null,"abstract":"We present a new algorithm for searching video repositories using free-hand sketches. Our queries express both appearance (color, shape) and motion attributes, as well as semantic properties (object labels) enabling hybrid queries to be specified. Unlike existing sketch based video retrieval (SBVR) systems that enable hybrid queries of this form, we do not adopt a model fitting/optimization approach to match at query-time. Rather, we create an efficiently searchable index via a novel space-time descriptor that encapsulates all these properties. The real-time performance yielded by our indexing approach enables interactive refinement of search results within a relevance feedback (RF) framework; a unique contribution to SBVR. We evaluate our system over 700 sports footage clips exhibiting a variety of clutter and motion conditions, demonstrating significant accuracy and speed gains over the state of the art.","PeriodicalId":401915,"journal":{"name":"Proceedings of the 11th European Conference on Visual Media Production","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129431030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Athlete pose estimation by non-sequential key-frame propagation
Mykyta Fastovets, Jean-Yves Guillemaut, A. Hilton. DOI: 10.1145/2668904.2668938

This paper considers the problem of estimating human pose in challenging monocular sports videos, where manual intervention is often required in order to obtain useful results. Fully automatic approaches focus on developing inference algorithms and probabilistic prior models based on learned measurements, and often face challenges in generalisation beyond the learned dataset. This work expands on the idea of using an interactive model-based generative technique for accurately estimating the human pose from uncalibrated, unconstrained monocular TV sports footage. Reliable tracking is obtained from limited operator input by introducing the concepts of non-sequential keyframe propagation and optimal keyframe selection assistance for the operator. Experimental results show that the approach produces results competitive with those produced with twice the number of manually annotated keyframes, halving the amount of interaction required.

Device effect on panoramic video+context tasks
Fabrizio Pece, J. Tompkin, H. Pfister, J. Kautz, C. Theobalt. DOI: 10.1145/2668904.2668943

Panoramic imagery is viewed daily by thousands of people, and panoramic video imagery is becoming more common. This imagery is viewed on many different devices with different properties, and the effect of these differences on spatio-temporal task performance has not yet been tested for this imagery. We adapt a novel panoramic video interface and conduct a user study to discover whether display type affects spatio-temporal reasoning task performance across desktop monitor, tablet, and head-mounted displays. We discover that, in our complex reasoning task, HMDs are as effective as desktop displays even though participants felt less capable, while tablets were less effective than desktop displays even though participants felt just as capable. Our results affect virtual tourism, telepresence, and surveillance applications, and so we state the design implications of our results for panoramic imagery systems.

Realistic retargeting of facial video
Wolfgang Paier, M. Kettern, P. Eisert. DOI: 10.1145/2668904.2668935

We propose a simple method for realistic retargeting of facial performance from one shot to another. Editors can combine different takes of a shot into one single, optimal take with minimal manual labour and highly realistic results. Using a static proxy mesh of the actor's head, we obtain approximate 3D information of recorded monocular facial video. This 3D information is used to create pose-invariant textures from recorded facial action and to re-render it into a target shot. This can be done for the full face or parts of it, allowing for flexible editing.
