
Latest publications: 2011 IEEE Workshop on Applications of Computer Vision (WACV)

Personalized video summarization with human in the loop
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711483
Bohyung Han, Jihun Hamm, Jack Sim
In automatic video summarization, the visual summary is typically constructed from an analysis of low-level features, with little consideration of video semantics. In practice, however, the contextual and semantic information of a video is only marginally related to low-level features, although such features are useful for computing visual similarity between frames. We therefore propose a novel video summarization technique in which semantically important information is extracted from a set of keyframes provided by a human, and the summary of a video is constructed from an automatic temporal segmentation based on inter-frame similarity to those keyframes. Toward this goal, we model a video sequence with a dissimilarity matrix based on a bidirectional similarity measure between every pair of frames, and subsequently characterize the structure of the video by a nonlinear manifold embedding. We then formulate video summarization as a variant of the 0–1 knapsack problem, which is solved efficiently by dynamic programming. The effectiveness of our algorithm is illustrated quantitatively and qualitatively on realistic videos collected from YouTube.
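The selection step above reduces to a 0–1 knapsack over temporal segments. Below is a minimal, self-contained sketch of that dynamic program, assuming each segment already has an importance score (e.g. its similarity to the user's keyframes) and an integer duration; the scores and durations are invented, and this is not the authors' implementation.

```python
# Minimal 0-1 knapsack sketch for summary selection (invented data).

def knapsack_summary(scores, durations, budget):
    """Select segments maximizing total score with total duration <= budget."""
    # dp[d] = (best score, chosen segment indices) within duration capacity d
    dp = [(0.0, [])] * (budget + 1)
    for i, (score, dur) in enumerate(zip(scores, durations)):
        new_dp = dp[:]
        for d in range(dur, budget + 1):
            cand = dp[d - dur][0] + score
            if cand > new_dp[d][0]:
                new_dp[d] = (cand, dp[d - dur][1] + [i])
        dp = new_dp
    return max(dp, key=lambda t: t[0])[1]

# Example: five segments, a 10-second summary budget.
chosen = knapsack_summary(scores=[0.9, 0.4, 0.7, 0.2, 0.6],
                          durations=[4, 3, 5, 2, 4], budget=10)
print(chosen)  # -> [0, 2]: indices of segments forming the summary
```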
Citations: 36
On the reliability of eye color as a soft biometric trait
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711507
A. Dantcheva, N. Erdogmus, J. Dugelay
This work studies eye color as a soft biometric trait and provides novel insight into the influence of pertinent factors in this context, such as color space, illumination, and the presence of glasses. A motivation for the paper is the fact that human iris color is an essential facial trait for Caucasians, which can be employed in iris pattern recognition systems for pruning the search, or in soft biometrics systems for person re-identification. Toward studying iris color as a soft biometric trait, we consider a system for automatic detection of eye color based on standard facial images. The system entails automatic iris localization, followed by classification based on Gaussian Mixture Models fit with Expectation Maximization. We finally provide detection results on the UBIRIS2 database that are employable in a real-time eye color detection system.
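A hedged sketch of the classification stage described above: one Gaussian Mixture Model per eye-color class, fit with EM, followed by maximum-likelihood classification. It uses scikit-learn's GaussianMixture; the color features and class means below are synthetic stand-ins, not the UBIRIS2 data or the authors' trained models.

```python
# One GMM per eye-color class, fit with EM; classify by average log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Fake training pixels (e.g. hue/saturation pairs) for two invented classes.
train = {
    "blue":  rng.normal([0.55, 0.40], 0.05, size=(200, 2)),
    "brown": rng.normal([0.08, 0.55], 0.05, size=(200, 2)),
}
models = {c: GaussianMixture(n_components=2, random_state=0).fit(X)
          for c, X in train.items()}

def classify(iris_pixels):
    # score() returns the average per-sample log-likelihood under each model.
    scores = {c: m.score(iris_pixels) for c, m in models.items()}
    return max(scores, key=scores.get)

print(classify(rng.normal([0.53, 0.42], 0.05, size=(50, 2))))  # expected: "blue"
```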
Citations: 28
Simultaneous motion segmentation and Structure from Motion
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711570
L. Zappella, A. D. Bue, X. Lladó, J. Salvi
This paper presents a novel approach that simultaneously computes the motion segmentation and the 3D reconstruction of a set of 2D points extracted from an image sequence. Starting from an initial segmentation, our method applies an iterative procedure that corrects misclassified points while reconstructing the 3D scene, which is composed of independently moving objects. The optimization rests on two well-known principles: first, in multi-body Structure from Motion the matrix describing the 3D shape is sparse; second, the segmented 2D points must yield a valid 3D reconstruction under the rotational metric constraints. Our formulation results in a bilinear optimization in which sparsity and metric constraints are enforced at each iteration of the algorithm. The final result is the corrected segmentation, the 3D structure of the moving objects, and an orthographic camera matrix for each motion and each frame. Results are shown on synthetic sequences, and a preliminary application to real sequences from the Hopkins 155 database is presented.
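The sketch below is not the authors' bilinear solver; it shows only the classical single-body orthographic factorization (rank-3 SVD of a centered measurement matrix, in the style of Tomasi-Kanade) that such multi-body formulations build on, run on synthetic data.

```python
# Classical orthographic factorization sketch: W (2F x P) has rank <= 3
# after centering, factoring into motion (cameras) and 3D shape.
import numpy as np

def orthographic_factorization(W):
    """W: 2F x P tracked 2D points (x rows of all frames, then y rows)."""
    W = W - W.mean(axis=1, keepdims=True)      # register to the centroid
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])              # 2F x 3 motion factor
    S = np.sqrt(s[:3])[:, None] * Vt[:3]       # 3 x P shape, up to affine ambiguity
    return M, S

# Synthetic check: random 3D points seen by random orthographic cameras.
rng = np.random.default_rng(1)
S_true = rng.normal(size=(3, 20))
M_true = rng.normal(size=(8, 3))               # 4 frames x 2 rows each
W = M_true @ S_true
M, S = orthographic_factorization(W)
print(np.allclose(M @ S, W - W.mean(axis=1, keepdims=True)))  # -> True
```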
Citations: 6
Object matching using feature aggregation over a frame sequence
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711489
Mahmoud Bassiouny, M. El-Saban
Object instance matching is a cornerstone component in many computer vision applications, such as image search, augmented reality, and unsupervised tagging. The common flow in these applications is to take an input image and match it against a database of previously enrolled images of objects of interest. This is usually difficult, as one needs to capture an image corresponding to an object view already present in the database; this is especially true for 3D objects with high curvature, where light reflection, viewpoint change, and partial occlusion can significantly alter the appearance of the captured image. Rather than relying on numerous views of each object in the database, we propose an alternative method: capturing a short video sequence that scans an object, and utilizing information from multiple frames to improve the chance of a successful match against the database. The matching step combines local features from a number of frames and incrementally forms a point cloud describing the object. We conduct experiments on a database of different object types, showing promising matching results on both a privately collected set of videos and videos freely available on the Web, such as on YouTube. An increase in accuracy of up to 20% over the single-frame matching baseline is shown to be possible.
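A schematic sketch of the core idea, under invented data: local descriptors from several frames of a scan are pooled into one model, so a query can match a viewpoint that any single enrolled frame might have missed. The descriptor dimensions and thresholds are arbitrary stand-ins, not the paper's pipeline.

```python
# Pooling descriptors across frames vs. enrolling a single frame.
import numpy as np

rng = np.random.default_rng(2)

def match_count(query, model, thresh=0.5):
    """Count query descriptors whose nearest model descriptor is close enough."""
    d = np.linalg.norm(query[:, None, :] - model[None, :, :], axis=2)
    return int((d.min(axis=1) < thresh).sum())

object_views = [rng.normal(i, 0.1, size=(30, 8)) for i in range(3)]  # 3 viewpoints
query = rng.normal(2, 0.1, size=(30, 8))            # query sees viewpoint 2 only

single_frame_model = object_views[0]                # one enrolled frame
aggregated_model = np.vstack(object_views)          # descriptors pooled over frames
print(match_count(query, single_frame_model))       # few or no matches
print(match_count(query, aggregated_model))         # many matches
```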
Citations: 4
Augmented distinctive features for efficient image matching
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711478
Quan Wang, Wei Guan, Suya You
Finding corresponding image points is a challenging computer vision problem, especially in confusing scenes with low-texture surfaces or repeated patterns. Despite the well-known challenges of extracting conceptually meaningful high-level matching primitives, many recent works describe high-level image features, such as edge groups, lines, and regions, which are more distinctive than traditional local appearance based features, to tackle such difficult scenes. In this paper, we propose a different and more general approach that treats the image matching problem as a recognition problem over spatially related image patch sets. We construct augmented semi-global descriptors (ordinal codes) based on subsets of scale- and orientation-invariant local keypoint descriptors. The tied-ranking problem of ordinal codes is handled by increasing keypoint sampling around the image patch sets. Finally, similarities of augmented features are measured using the Spearman correlation coefficient. Our proposed method is compatible with a large range of existing local image descriptors. Experimental results based on standard benchmark datasets and SURF descriptors demonstrate its distinctiveness and effectiveness.
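A hedged sketch of the ordinal-code construction: concatenate the descriptors of a spatially related patch set, replace values by their ranks, and compare two codes with the Spearman rank correlation (via scipy.stats.spearmanr). The descriptors are synthetic, and the tie handling is simplistic compared to the paper's sampling scheme.

```python
# Ordinal codes over patch sets, compared with Spearman rank correlation.
import numpy as np
from scipy.stats import spearmanr

def ordinal_code(patch_descriptors):
    """Flatten a patch set's descriptors and encode each value by its rank."""
    flat = np.concatenate([d.ravel() for d in patch_descriptors])
    return flat.argsort().argsort()        # ranks; ties broken by position

rng = np.random.default_rng(3)
set_a = [rng.normal(size=16) for _ in range(4)]
set_b = [d + rng.normal(0, 0.05, size=16) for d in set_a]   # same region, noisy
set_c = [rng.normal(size=16) for _ in range(4)]             # unrelated region

rho_ab, _ = spearmanr(ordinal_code(set_a), ordinal_code(set_b))
rho_ac, _ = spearmanr(ordinal_code(set_a), ordinal_code(set_c))
print(f"matching sets: {rho_ab:.2f}, non-matching: {rho_ac:.2f}")  # high vs. near 0
```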
Citations: 1
View context based 2D sketch-3D model alignment
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711482
Bo Li, H. Johan
2D sketch-3D model alignment is important for many applications, such as sketch-based 3D model retrieval, sketch-based 3D modeling, and model-based vision and recognition. In this paper, we propose a 2D sketch-3D model alignment algorithm using view context and shape context matching. A sketch consists of a set of curves; a 3D model is typically a 3D triangle mesh. The algorithm has two main steps: precomputation and actual alignment. In the precomputation, we extract the view context features of a set of sample views of the 3D model to be aligned. To speed up the precomputation, two computationally efficient, rotation-invariant features, Zernike moments and Fourier descriptors, are used to represent a view. In the actual alignment, we quickly prune most sample views that are dissimilar to the sketch, based on their view context similarities. Finally, to find an approximate pose, we compare the sketch with only a very small portion (e.g. 5% in our experiments) of the sample views using shape context matching. Experiments on two types of datasets show that the algorithm can approximately align 2D sketches with 3D models.
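One ingredient named above, sketched in isolation: Fourier descriptors of a closed 2D contour, made translation-, scale-, and rotation-invariant by dropping the DC term and normalizing magnitudes. The contour here is a synthetic ellipse rather than a rendered model view, and Zernike moments are omitted.

```python
# Rotation-invariant Fourier descriptor of a closed contour.
import numpy as np

def fourier_descriptor(contour_xy, n_coeffs=8):
    """contour_xy: (N, 2) closed contour; returns an invariant |FFT| signature."""
    z = contour_xy[:, 0] + 1j * contour_xy[:, 1]   # boundary as a complex signal
    F = np.fft.fft(z)
    mag = np.abs(F[1:n_coeffs + 1])                # drop DC (translation invariance)
    return mag / mag[0]                            # scale normalization

t = np.linspace(0, 2 * np.pi, 128, endpoint=False)
ellipse = np.stack([2 * np.cos(t), np.sin(t)], axis=1)
R = np.array([[0.6, -0.8], [0.8, 0.6]])            # a rotation matrix
rotated = ellipse @ R.T
print(np.allclose(fourier_descriptor(ellipse), fourier_descriptor(rotated)))  # True
```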
Citations: 3
Fast and scalable keypoint recognition and image retrieval using binary codes
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711573
Jonathan Ventura, Tobias Höllerer
In this paper we report an evaluation of keypoint descriptor compression using as few as 16 bits to describe a single keypoint. We use spectral hashing to compress keypoint descriptors and match them using the Hamming distance. By indexing the keypoints in a binary tree, we can quickly recognize keypoints with a very small database and efficiently insert new keypoints. Our tests on image datasets with perspective distortion show that the method enables fast keypoint recognition and image retrieval with a small code size, and they point toward potential applications for scalable visual SLAM on mobile phones.
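A minimal sketch of the matching side: 16-bit binary keypoint codes compared by Hamming distance via XOR and a byte-popcount table. Learning the codes (spectral hashing) and the binary-tree index are omitted; the codes below are random stand-ins.

```python
# Brute-force Hamming matching of 16-bit keypoint codes.
import numpy as np

def hamming_matches(query_codes, db_codes, max_dist=3):
    """query_codes, db_codes: uint16 arrays of binary keypoint codes."""
    x = np.bitwise_xor(query_codes[:, None], db_codes[None, :])
    # popcount via a lookup table over the two bytes of each uint16
    popcnt8 = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
    dist = popcnt8[x & 0xFF] + popcnt8[(x >> 8) & 0xFF]
    nn = dist.argmin(axis=1)
    return [(q, int(n)) for q, n in enumerate(nn) if dist[q, n] <= max_dist]

rng = np.random.default_rng(4)
db = rng.integers(0, 2**16, size=20, dtype=np.uint16)
queries = db[:5] ^ np.uint16(0b101)     # db codes with 2 bits flipped
print(hamming_matches(queries, db))     # each query should recover its source code
```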
Citations: 5
Generalized autofocus
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711547
D. Vaquero, Natasha Gelfand, M. Tico, K. Pulli, M. Turk
All-in-focus imaging is a computational photography technique that produces images free of defocus blur by capturing a stack of images focused at different distances and merging them into a single sharp result. Current approaches assume that images have been captured offline, and that a reasonably powerful computer is available to process them. In contrast, we focus on the problem of how to capture such input stacks in an efficient and scene-adaptive fashion. Inspired by passive autofocus techniques, which select a single best plane of focus in the scene, we propose a method to automatically select a minimal set of images, focused at different depths, such that all objects in a given scene are in focus in at least one image. We aim to minimize both the amount of time spent metering the scene and capturing the images, and the total amount of high-resolution data that is captured. The algorithm first analyzes a set of low-resolution sharpness measurements of the scene while continuously varying the focus distance of the lens. From these measurements, we estimate the final lens positions required to capture all objects in the scene in acceptable focus. We demonstrate the use of our technique in a mobile computational photography scenario, where it is essential to minimize image capture time (as the camera is typically handheld) and processing time (as the computation and energy resources are limited).
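The selection step can be read as a set-cover problem: pick a small set of lens positions so that every scene region is acceptably sharp in at least one captured image. Below is a greedy sketch under that reading, with an invented sharpness matrix; it is not the authors' metering procedure.

```python
# Greedy set-cover heuristic over candidate focus positions (invented data).
import numpy as np

def select_focus_positions(sharpness, thresh=0.5):
    """sharpness: (n_positions, n_regions) scores; returns chosen position ids."""
    covered = np.zeros(sharpness.shape[1], dtype=bool)
    chosen = []
    while not covered.all():
        gains = ((sharpness > thresh) & ~covered).sum(axis=1)
        best = int(gains.argmax())
        if gains[best] == 0:                 # some region is never sharp
            break
        chosen.append(best)
        covered |= sharpness[best] > thresh
    return chosen

# 5 candidate lens positions x 4 scene regions (e.g. near/mid/far objects).
sharp = np.array([[0.9, 0.2, 0.1, 0.1],
                  [0.6, 0.8, 0.2, 0.1],
                  [0.1, 0.7, 0.9, 0.2],
                  [0.1, 0.2, 0.6, 0.4],
                  [0.1, 0.1, 0.3, 0.9]])
print(select_focus_positions(sharp))         # -> [1, 2, 4]
```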
Citations: 30
GPU accelerated one-pass algorithm for computing minimal rectangles of connected components
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711542
L. Ríha, M. Manohar
Connected component labeling is an essential task for detecting and tracking moving objects in video surveillance applications. Since tracking algorithms are designed for real-time applications, the efficiency of the underlying algorithms is critical. In this paper we present a new one-pass algorithm for computing the minimal bounding rectangles of all connected components of background-foreground segmented video frames (binary data) using a GPU accelerator. The given image frame is scanned once in raster-scan mode, and the background-foreground transition information is stored in a directed graph where each transition is represented by a node. This data structure contains the locations of object edges in every row, and it is used to detect connected components in the image and extract their main features, e.g. bounding box size and location, location of the centroid, and real size. We further use GPU acceleration to speed up feature extraction from the image into the directed graph, from which the minimal bounding rectangles are subsequently computed. We also compare the performance of GPU acceleration (using a Tesla C2050 accelerator card) with a multi-core (up to 24 cores) general-purpose CPU implementation of the algorithm.
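A CPU-only sketch of the one-pass idea: scan each row once, extract foreground runs from the background-foreground transitions, union runs that touch the previous row's runs, and maintain per-component bounding boxes as you go. The paper's GPU implementation and directed-graph structure are not reproduced here.

```python
# Run-based single-scan connected components with bounding boxes.
import numpy as np

def run_bounding_boxes(binary):
    parent, boxes = {}, {}                       # union-find + (r0, c0, r1, c1)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]        # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            r0, c0, r1, c1 = boxes[rb]
            s0, t0, s1, t1 = boxes[ra]
            boxes[ra] = (min(r0, s0), min(c0, t0), max(r1, s1), max(c1, t1))
        return find(a)

    prev, label = [], 0
    for r, row in enumerate(binary):
        cur = []
        edges = np.diff(np.concatenate([[0], row, [0]]))   # run transitions
        starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1) - 1
        for c0, c1 in zip(starts, ends):
            lbl = label; label += 1
            parent[lbl] = lbl
            boxes[lbl] = (r, int(c0), r, int(c1))
            for p0, p1, plbl in prev:            # 4-connectivity overlap test
                if p0 <= c1 and c0 <= p1:
                    lbl = union(plbl, lbl)
            cur.append((int(c0), int(c1), lbl))
        prev = cur
    return {root: boxes[root] for root in {find(l) for l in parent}}

img = np.array([[1, 1, 0, 0, 1],
                [0, 1, 0, 0, 1],
                [0, 0, 0, 1, 1]], dtype=np.uint8)
print(run_bounding_boxes(img))   # two components with their minimal rectangles
```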
Citations: 4
Robust multi-view camera calibration for wide-baseline camera networks
Pub Date: 2011-01-05 DOI: 10.1109/WACV.2011.5711521
Jens Puwein, R. Ziegler, Julia Vogel, M. Pollefeys
Real-world camera networks are often characterized by very wide baselines covering a wide range of viewpoints. We describe a method that not only calibrates each camera sequence added to the system automatically, but also takes advantage of multi-view correspondences to make the entire calibration framework more robust. Novel camera sequences can be seamlessly integrated into the system at any time, adding to the robustness of future computations. One of the challenges is establishing correspondences between cameras. Initializing a bag of features from a calibrated frame, we establish correspondences between cameras in a two-step procedure. First, affine-invariant features of the camera sequences are warped into a common coordinate frame, and a coarse matching is obtained between the collected features and the incrementally built and updated bag of features. This allows us to warp images to a common view. Second, scale-invariant features are extracted from the warped images, yielding both more numerous and more accurate correspondences. Finally, the parameters are optimized in a bundle adjustment. Adding the feature descriptors and the optimized 3D positions to the bag of features, we obtain a feature-based scene abstraction that allows the calibration of novel sequences and the correction of drift in single-view calibration tracking. We demonstrate that our approach can deal with wide baselines, and novel sequences can be seamlessly integrated into the calibration framework.
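A schematic sketch (synthetic data, invented helper names) of the coarse-matching step: keypoints from a new camera are warped into a common coordinate frame by a plane homography, then matched against the bag of features by proximity in that frame plus descriptor distance. Bundle adjustment and the full warping pipeline are omitted.

```python
# Coarse feature matching in a common frame via a plane homography.
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to (N, 2) points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:]

def coarse_match(H_new, new_pts, new_desc, bag_pts, bag_desc,
                 max_px=5.0, max_desc=0.5):
    warped = warp_points(H_new, new_pts)
    matches = []
    for i, (p, d) in enumerate(zip(warped, new_desc)):
        geo = np.linalg.norm(bag_pts - p, axis=1)    # position in common frame
        app = np.linalg.norm(bag_desc - d, axis=1)   # descriptor distance
        j = int((geo + app).argmin())
        if geo[j] < max_px and app[j] < max_desc:
            matches.append((i, j))
    return matches

rng = np.random.default_rng(5)
bag_pts = rng.uniform(0, 100, size=(20, 2))          # features in the common frame
bag_desc = rng.normal(size=(20, 8))
H = np.array([[0.9, 0.1, 5.0], [-0.1, 0.9, 3.0], [0.0, 0.0, 1.0]])
new_pts = warp_points(np.linalg.inv(H), bag_pts[:10])  # same features, new view
new_desc = bag_desc[:10] + rng.normal(0, 0.02, size=(10, 8))
print(coarse_match(H, new_pts, new_desc, bag_pts, bag_desc))  # -> [(0, 0), ..., (9, 9)]
```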
Citations: 20