This issue discusses methods to extract three-dimensional (3D) models from plain images. In particular, the 3D information is obtained from images for which the camera parameters are unknown. The principles underlying such uncalibrated structure-from-motion methods are outlined. First, a short review of 3D acquisition technologies places these methods in a wider context and highlights their important advantages. Then the actual theory behind this line of research is presented. The authors have tried to keep the text maximally self-contained, thereby also avoiding reliance on the extensive knowledge of projective concepts that usually appears in texts about self-calibrating 3D methods; instead, mathematical explanations that are more amenable to intuition are given. The explanation of the theory includes the stratification of reconstructions obtained from image pairs, as well as metric reconstruction on the basis of more than two images combined with some additional knowledge about the cameras used. Readers who want more practical information about how to implement such uncalibrated structure-from-motion pipelines may be interested in two further Foundations and Trends issues written by the same authors; together with this issue, they can be read as a single tutorial on the subject.
{"title":"3D Reconstruction from Multiple Images: Part 1 - Principles","authors":"T. Moons, L. Gool, M. Vergauwen","doi":"10.1561/0600000007","DOIUrl":"https://doi.org/10.1561/0600000007","url":null,"abstract":"This issue discusses methods to extract three-dimensional (3D) models from plain images. In particular, the 3D information is obtained from images for which the camera parameters are unknown. The principles underlying such uncalibrated structure-from-motion methods are outlined. First, a short review of 3D acquisition technologies puts such methods in a wider context and highlights their important advantages. Then, the actual theory behind this line of research is given. The authors have tried to keep the text maximally self-contained, therefore also avoiding to rely on an extensive knowledge of the projective concepts that usually appear in texts about self-calibration 3D methods. Rather, mathematical explanations that are more amenable to intuition are given. The explanation of the theory includes the stratification of reconstructions obtained from image pairs as well as metric reconstruction on the basis of more than two images combined with some additional knowledge about the cameras used. Readers who want to obtain more practical information about how to implement such uncalibrated structure-from-motion pipelines may be interested in two more Foundations and Trends issues written by the same authors. Together with this issue they can be read as a single tutorial on the subject.","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2009-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75061085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Over the last few years, kernel methods have established themselves as powerful tools for computer vision researchers as well as for practitioners. In this tutorial, we give an introduction to kernel methods in computer vision from a geometric perspective, introducing not only the ubiquitous support vector machines, but also lesser-known techniques for regression, dimensionality reduction, outlier detection, and clustering. Additionally, we give an outlook on very recent, non-classical techniques for the prediction of structured data, for the estimation of statistical dependency, and for learning the kernel function itself. All methods are illustrated with examples of successful applications from the recent computer vision research literature.
{"title":"Kernel Methods in Computer Vision","authors":"Christoph H. Lampert","doi":"10.1561/0600000027","DOIUrl":"https://doi.org/10.1561/0600000027","url":null,"abstract":"Over the last years, kernel methods have established themselves as powerful tools for computer vision researchers as well as for practitioners. In this tutorial, we give an introduction to kernel methods in computer vision from a geometric perspective, introducing not only the ubiquitous support vector machines, but also less known techniques for regression, dimensionality reduction, outlier detection, and clustering. Additionally, we give an outlook on very recent, non-classical techniques for the prediction of structure data, for the estimation of statistical dependency, and for learning the kernel function itself. All methods are illustrated with examples of successful application from the recent computer vision research literature.","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2009-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79058622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
1: Introduction
2: From Gaussian Convolution to Bilateral Filter
3: Applications
4: Efficient Implementation
5: Relationship between BF and Other Methods or Framework
6: Extensions of Bilateral Filtering
7: Conclusions
Acknowledgements. References.
{"title":"Bilateral Filtering: Theory and Applications","authors":"Pierre Kornprobst, J. Tumblin, F. Durand","doi":"10.1561/0600000020","DOIUrl":"https://doi.org/10.1561/0600000020","url":null,"abstract":"1: Introduction 2: From Gaussian Convolution to Bilateral Filter 3: Applications 4: Efficient Implementation 5: Relationship between BF and Other Methods or Framework 6: Extensions of Bilateral Filtering 7: Conclusions. Acknowledgements. References.","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81269395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matting refers to the problem of accurate foreground estimation in images and video. It is one of the key techniques in many image editing and film production applications, and thus has been extensively studied in the literature. With the recent advances in digital cameras, using matting techniques to create novel composites or facilitate other editing tasks has gained increasing interest from both professionals and consumers. Consequently, various matting techniques and systems have been proposed that try to extract high-quality mattes efficiently from both still images and video sequences. This survey provides a comprehensive review of existing image and video matting algorithms and systems, with an emphasis on the advanced techniques that have been proposed recently. The first part of the survey focuses on image matting. The fundamental techniques shared by many image matting algorithms, such as color sampling methods and matting affinities, are analyzed first. Image matting techniques are then classified into three categories based on their underlying methodologies, and an objective evaluation is conducted to reveal the advantages and disadvantages of each category. A unique Accuracy vs. Cost analysis is presented as practical guidance for readers to choose the matting tools that best fit their specific requirements and constraints. The second part of the survey focuses on video matting. The difficulties and challenges of video matting are analyzed first, and various ways of combining matting algorithms with other video processing techniques to build efficient video matting systems are reviewed. Key contributions, advantages, and limitations of important systems are summarized. Finally, special matting systems that rely on capturing additional foreground/background information to automate the matting process are discussed. A few interesting directions for future matting research are presented in the conclusion.
{"title":"Image and Video Matting: A Survey","authors":"Jue Wang, Michael F. Cohen","doi":"10.1561/0600000019","DOIUrl":"https://doi.org/10.1561/0600000019","url":null,"abstract":"Matting refers to the problem of accurate foreground estimation in images and video. It is one of the key techniques in many image editing and film production applications, thus has been extensively studied in the literature. With the recent advances of digital cameras, using matting techniques to create novel composites or facilitate other editing tasks has gained increasing interest from both professionals as well as consumers. Consequently, various matting techniques and systems have been proposed to try to efficiently extract high quality mattes from both still images and video sequences. This survey provides a comprehensive review of existing image and video matting algorithms and systems, with an emphasis on the advanced techniques that have been recently proposed. The first part of the survey is focused on image matting. The fundamental techniques shared by many image matting algorithms, such as color sampling methods and matting affinities, are first analyzed. Image matting techniques are then classified into three categories based on their underlying methodologies, and an objective evaluation is conducted to reveal the advantages and disadvantages of each category. A unique Accuracy vs. Cost analysis is presented as a practical guidance for readers to properly choose matting tools that best fit their specific requirements and constraints. The second part of the survey is focused on video matting. The difficulties and challenges of video matting are first analyzed, and various ways of combining matting algorithms with other video processing techniques for building efficient video matting systems are reviewed. Key contributions, advantages as well as limitations of important systems are summarized. Finally, special matting systems that rely on capturing additional foreground/background information to automate the matting process are discussed. A few interesting directions for future matting research are presented in the conclusion.","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78102893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image-based rendering (IBR) is unique in that it requires computer graphics, computer vision, and image processing to join forces toward a common goal, namely photorealistic rendering through the use of images. IBR as an area of research has been around for about ten years, and substantial progress has been achieved in effectively capturing, representing, and rendering scenes. In this article, we survey the techniques used in IBR. Our survey shows that representations and rendering techniques can differ radically, depending on design decisions related to ease of capture, use of geometry, accuracy of geometry (if used), number and distribution of source images, degrees of freedom for virtual navigation, and expected scene complexity.
{"title":"Image-Based Rendering","authors":"S. B. Kang, Yin Li, Xin Tong, H. Shum","doi":"10.1561/0600000012","DOIUrl":"https://doi.org/10.1561/0600000012","url":null,"abstract":"Image-based rendering (IBR) is unique in that it requires computer graphics, computer vision, and image processing to join forces to solve a common goal, namely photorealistic rendering through the use of images. IBR as an area of research has been around for about ten years, and substantial progress has been achieved in effiectively capturing, representing, and rendering scenes. In this article, we survey the techniques used in IBR. Our survey shows that representations and rendering techniques can differ radically, depending on design decisions related to ease of capture, use of geometry, accuracy of geometry (if used), number and distribution of source images, degrees of freedom for virtual navigation, and expected scene complexity.","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2006-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73175162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many applications require tracking of complex 3D objects. These include visual servoing of robotic arms on specific target objects, Augmented Reality systems that require real-time registration of the object to be augmented, and head tracking systems that can drive sophisticated interfaces. Computer Vision offers solutions that are cheap, practical, and non-invasive. This survey reviews the different techniques and approaches that have been developed by industry and research. First, important mathematical tools are introduced: camera representation, robust estimation, and uncertainty estimation. Then a comprehensive study is given of the numerous approaches developed by the Augmented Reality and Robotics communities, beginning with those that are based on point or planar fiducial marks and moving on to those that avoid the need to engineer the environment by relying on natural features such as edges, texture, or interest points. Recent advances that avoid manual initialization and failures due to fast motion are also presented. The survey concludes with the different choices that should be made when implementing a 3D tracking system and a discussion of the future of vision-based 3D tracking. Because it encompasses many computer vision techniques from low-level vision to 3D geometry and includes a comprehensive study of the massive literature on the subject, this survey should serve as a handbook for the student, researcher, or engineer who wants to implement a 3D tracking system.
{"title":"Monocular Model-Based 3D Tracking of Rigid Objects: A Survey","authors":"V. Lepetit, P. Fua","doi":"10.1561/0600000001","DOIUrl":"https://doi.org/10.1561/0600000001","url":null,"abstract":"Many applications require tracking of complex 3D objects. These include visual servoing of robotic arms on specific target objects, Augmented Reality systems that require real-time registration of the object to be augmented, and head tracking systems that sophisticated interfaces can use. Computer Vision offers solutions that are cheap, practical and non-invasive.This survey reviews the different techniques and approaches that have been developed by industry and research. First, important mathematical tools are introduced: Camera representation, robust estimation and uncertainty estimation. Then a comprehensive study is given of the numerous approaches developed by the Augmented Reality and Robotics communities, beginning with those that are based on point or planar fiducial marks and moving on to those that avoid the need to engineer the environment by relying on natural features such as edges, texture or interest. Recent advances that avoid manual initialization and failures due to fast motion are also presented. The survery concludes with the different possible choices that should be made when implementing a 3D tracking system and a discussion of the future of vision-based 3D tracking.Because it encompasses many computer vision techniques from low-level vision to 3D geometry and includes a comprehensive study of the massive literature on the subject, this survey should be the handbook of the student, the researcher, or the engineer who wants to implement a 3D tracking system.","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2005-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76442572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We review methods for kinematic tracking of the human body in video. The review is part of a projected book that is intended to cross-fertilize ideas about motion representation between the animation and computer vision communities. The review confines itself to the earlier stages of motion analysis, focusing on tracking and motion synthesis; future material will cover activity representation and motion generation. In general, we take the position that tracking does not necessarily involve (as is usually thought) complex multimodal inference problems. Instead, there are two key problems, both easy to state. The first is lifting, where one must infer the configuration of the body in three dimensions from image data. Ambiguities in lifting can result in multimodal inference problems, and we review what little is known about the extent to which a lift is ambiguous. The second is data association, where one must determine which pixels in an image ...
{"title":"Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis","authors":"D. Forsyth, Okan Arikan, L. Ikemoto, J. F. O'Brien, Deva Ramanan","doi":"10.1561/0600000005","DOIUrl":"https://doi.org/10.1561/0600000005","url":null,"abstract":"We review methods for kinematic tracking of the human body in video. The review is part of a projected book that is intended to cross-fertilize ideas about motion representation between the animation and computer vision communities. The review confines itself to the earlier stages of motion, focusing on tracking and motion synthesis; future material will cover activity representation and motion generation. In general, we take the position that tracking does not necessarily involve (as is usually thought) complex multimodal inference problems. Instead, there are two key problems, both easy to state. The first is lifting, where one must infer the configuration of the body in three dimensions from image data. Ambiguities in lifting can result in multimodal inference problem, and we review what little is known about the extent to which a lift is ambiguous. The second is data association, where one must determine which pixels in an image Full text available at: http://dx.doi.org/10.1561/0600000005","PeriodicalId":45662,"journal":{"name":"Foundations and Trends in Computer Graphics and Vision","volume":null,"pages":null},"PeriodicalIF":36.5,"publicationDate":"2005-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78304643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}