Latest publications — 2011 International Conference on Computer Vision

ORB: An efficient alternative to SIFT or SURF
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126544
Ethan Rublee, V. Rabaud, K. Konolige, G. Bradski
Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.
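What makes binary descriptors like ORB so fast to match is that comparing two of them is just an XOR followed by a population count. A minimal, self-contained sketch of brute-force Hamming matching between two sets of 256-bit descriptors (the descriptors here are random placeholders, not real ORB output):

```python
import random

random.seed(0)

def hamming(a: int, b: int) -> int:
    # XOR, then count set bits: the core of binary-descriptor matching.
    return bin(a ^ b).count("1")

def match(desc_a, desc_b):
    # Brute-force nearest-neighbour matching by Hamming distance.
    matches = []
    for i, d in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: hamming(d, desc_b[k]))
        matches.append((i, j, hamming(d, desc_b[j])))
    return matches

# Random 256-bit stand-ins for ORB descriptors; B is A with one bit
# flipped per descriptor, simulating near-duplicate features.
A = [random.getrandbits(256) for _ in range(5)]
B = [a ^ (1 << random.randrange(256)) for a in A]

for i, j, d in match(A, B):
    print(i, j, d)
```

In practice a real implementation (e.g. OpenCV's `BFMatcher` with `NORM_HAMMING`) does exactly this comparison, just vectorized over thousands of keypoints.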
Pages: 2564-2571
Citations: 8700
Self-calibrating depth from refraction
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126298
Zhihu Chen, Kwan-Yee Kenneth Wong, Y. Matsushita, Xiaolong Zhu, Miaomiao Liu
In this paper, we introduce a novel method for depth acquisition based on refraction of light. A scene is captured twice by a fixed perspective camera, with the first image captured directly by the camera and the second by placing a transparent medium between the scene and the camera. A depth map of the scene is then recovered from the displacements of scene points in the images. Unlike other existing depth from refraction methods, our method does not require knowledge of the pose and refractive index of the transparent medium, but can recover them directly from the input images. We hence call our method self-calibrating depth from refraction. Experimental results on both synthetic and real-world data are presented, which demonstrate the effectiveness of the proposed method.
Pages: 635-642
Citations: 26
Exploiting the Manhattan-world assumption for extrinsic self-calibration of multi-modal sensor networks
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126337
Marcel Brückner, Joachim Denzler
Many new applications are enabled by combining a multi-camera system with a Time-of-Flight (ToF) camera, which is able to simultaneously record intensity and depth images. Classical approaches for self-calibration of a multi-camera system fail to calibrate such a system due to the very different image modalities. In addition, the typical environments of multi-camera systems are man-made and consist primarily of low-textured objects. At the same time, however, they satisfy the Manhattan-world assumption. We formulate the multi-modal sensor network calibration as a Maximum a Posteriori (MAP) problem and solve it by minimizing the corresponding energy function. First, we estimate two separate 3D reconstructions of the environment: one using the pan-tilt unit mounted ToF camera and one using the multi-camera system. We exploit the Manhattan-world assumption and estimate multiple initial calibration hypotheses by registering the three dominant orientations of planes. These hypotheses are used as prior knowledge of a subsequent MAP estimation aiming to align edges that are parallel to these dominant directions. To our knowledge, this is the first self-calibration approach that is able to calibrate a ToF camera with a multi-camera system. Quantitative experiments on real data demonstrate the high accuracy of our approach.
Pages: 945-950
Citations: 0
Understanding scenes on many levels
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126260
Joseph Tighe, S. Lazebnik
This paper presents a framework for image parsing with multiple label sets. For example, we may want to simultaneously label every image region according to its basic-level object category (car, building, road, tree, etc.), superordinate category (animal, vehicle, manmade object, natural object, etc.), geometric orientation (horizontal, vertical, etc.), and material (metal, glass, wood, etc.). Some object regions may also be given part names (a car can have wheels, doors, windshield, etc.). We compute co-occurrence statistics between different label types of the same region to capture relationships such as “roads are horizontal,” “cars are made of metal,” “cars have wheels” but “horses have legs,” and so on. By incorporating these constraints into a Markov Random Field inference framework and jointly solving for all the label sets, we are able to improve the classification accuracy for all the label sets at once, achieving a richer form of image understanding.
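The co-occurrence statistics the abstract describes can be sketched in a few lines: count how often label types co-occur on the same region, then normalise into conditional probabilities that serve as soft compatibility terms (the region annotations below are made up for illustration):

```python
from collections import Counter

# Hypothetical per-region annotations: (basic-level category, material).
regions = [
    ("car", "metal"), ("car", "metal"), ("road", "asphalt"),
    ("building", "glass"), ("car", "glass"), ("tree", "wood"),
]

# Count how often each (category, material) pair co-occurs on a region.
cooc = Counter(regions)

# Normalise into conditional probabilities P(material | category) -- the
# kind of soft constraint ("cars are made of metal") that can enter an
# MRF inference framework as a compatibility term.
cat_totals = Counter(cat for cat, _ in regions)
p = {(c, m): n / cat_totals[c] for (c, m), n in cooc.items()}

print(p[("car", "metal")])  # 2 of 3 car regions are metal
```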
Pages: 335-342
Citations: 40
Simultaneous correspondence and non-rigid 3D reconstruction of the coronary tree from single X-ray images
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126325
Eduard Serradell, Adriana Romero, R. Leta, C. Gatta, F. Moreno-Noguer
We present a novel approach to simultaneously reconstruct the 3D structure of a non-rigid coronary tree and estimate point correspondences between an input X-ray image and a reference 3D shape. At the core of our approach lies an optimization scheme that iteratively fits a generative 3D model of increasing complexity and guides the matching process. As a result, and in contrast to existing approaches that assume rigidity or quasi-rigidity of the structure, our method is able to retrieve large non-linear deformations even when the input data is corrupted by the presence of noise and partial occlusions. We extensively evaluate our approach under synthetic and real data and demonstrate a remarkable improvement compared to state-of-the-art.
Pages: 850-857
Citations: 35
Gaussian process regression flow for analysis of motion trajectories
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126365
Kihwan Kim, Dongryeol Lee, Irfan Essa
Recognition of motions and activities of objects in videos requires effective representations for analysis and matching of motion trajectories. In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Furthermore, we introduce a random sampling strategy for learning stable classes of motions from limited data. Our representation allows for incrementally predicting possible paths and detecting anomalous events from online trajectories. This representation also supports matching of complex motions with acceleration changes and pauses or stops within a trajectory. We use the proposed approach for classifying and predicting motion trajectories in traffic monitoring domains and test on several data sets. We show that our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.
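The underlying idea — turning a sparse set of trajectory samples into a dense flow field by regressing observed velocities — can be sketched with a simple RBF-weighted (Nadaraya–Watson) estimator. This is a deliberate simplification of the paper's Gaussian Process posterior, which additionally solves a linear system against the kernel matrix; the sample data is invented:

```python
import math

# Sparse trajectory samples: (position, velocity) pairs, 1-D for brevity.
samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]

def flow(x: float, bandwidth: float = 0.5) -> float:
    # RBF-weighted average of observed velocities: a kernel-smoothed
    # dense flow field evaluated at an arbitrary query position x.
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2))
               for xi, _ in samples]
    total = sum(weights)
    return sum(w * vi for w, (_, vi) in zip(weights, samples)) / total

# Querying at an observed position recovers the observed velocity;
# querying between samples interpolates smoothly.
print(flow(1.0))
```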
Pages: 1164-1171
Citations: 183
Unsupervised and semi-supervised learning via ℓ1-norm graph
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126506
F. Nie, Hua Wang, Heng Huang, C. Ding
In this paper, we propose a novel ℓ1-norm graph model to perform unsupervised and semi-supervised learning. Instead of minimizing the ℓ2-norm of the spectral embedding, as traditional graph-based learning methods do, our new graph learning model minimizes the ℓ1-norm of the spectral embedding, a well-motivated choice. The sparsity produced by the ℓ1-norm minimization yields solutions with much clearer cluster structures, which are suitable for both image clustering and classification tasks. We introduce a new efficient iterative algorithm to solve the ℓ1-norm spectral-embedding minimization problem, and prove its convergence. More specifically, our algorithm adaptively re-weights the original weights of the graph to discover clearer cluster structure. Experimental results on both toy data and real image data sets show the effectiveness and advantages of our proposed method.
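The adaptive re-weighting the abstract mentions follows the generic iteratively-reweighted least-squares pattern for ℓ1 objectives: each edge weight is rescaled by the inverse of the current embedding gap across that edge. A toy sketch of one such re-weighting step (the update rule here is the standard IRLS device, not necessarily the paper's exact formulation; the graph and embedding are invented):

```python
# Toy graph: edges with original weights, plus a current 1-D embedding f.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
f = {0: 0.0, 1: 0.1, 2: 1.0}

EPS = 1e-8  # guards against division by zero on zero-gap edges

def reweight(edges, f):
    # IRLS-style update: w'_ij = w_ij / (2 * |f_i - f_j|).
    # Edges inside a tight cluster (small embedding gap) get boosted,
    # edges across clusters get suppressed, so repeating this step
    # sharpens the cluster structure of the graph.
    return {(i, j): w / (2 * max(abs(f[i] - f[j]), EPS))
            for (i, j), w in edges.items()}

new = reweight(edges, f)
print(new[(0, 1)] > new[(0, 2)])  # intra-cluster edge now dominates
```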
Pages: 2268-2273
Citations: 76
Color photometric stereo for multicolored surfaces
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126495
Robert Anderson, B. Stenger, R. Cipolla
We present a multispectral photometric stereo method for capturing geometry of deforming surfaces. A novel photometric calibration technique allows calibration of scenes containing multiple piecewise constant chromaticities. This method estimates per-pixel photometric properties, then uses a RANSAC-based approach to estimate the dominant chromaticities in the scene. A likelihood term is developed linking surface normal, image intensity and photometric properties, which allows estimating the number of chromaticities present in a scene to be framed as a model estimation problem. The Bayesian Information Criterion is applied to automatically estimate the number of chromaticities present during calibration. A two-camera stereo system provides low resolution geometry, allowing the likelihood term to be used in segmenting new images into regions of constant chromaticity. This segmentation is carried out in a Markov Random Field framework and allows the correct photometric properties to be used at each pixel to estimate a dense normal map. Results are shown on several challenging real-world sequences, demonstrating state-of-the-art results using only two cameras and three light sources. Quantitative evaluation is provided against synthetic ground truth data.
Pages: 2182-2189
Citations: 43
Dynamic Manifold Warping for view invariant action recognition
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126290
Dian Gong, G. Medioni
We address the problem of learning view-invariant 3D models of human motion from motion capture data, in order to recognize human actions from a monocular video sequence with arbitrary viewpoint. We propose a Spatio-Temporal Manifold (STM) model to analyze non-linear multivariate time series with latent spatial structure and apply it to recognize actions in the joint-trajectories space. Based on STM, a novel alignment algorithm Dynamic Manifold Warping (DMW) and a robust motion similarity metric are proposed for human action sequences, both in 2D and 3D. DMW extends previous works on spatio-temporal alignment by incorporating manifold learning. We evaluate and compare the approach to state-of-the-art methods on motion capture data and realistic videos. Experimental results demonstrate the effectiveness of our approach, which yields visually appealing alignment results, produces higher action recognition accuracy, and can recognize actions from arbitrary views with partial occlusion.
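DMW extends alignment methods in the dynamic-time-warping family; the basic recurrence those methods share is sketched below (plain DTW over 1-D sequences — DMW itself additionally warps through a learned spatio-temporal manifold):

```python
def dtw(a, b):
    # Classic dynamic time warping: cost[i][j] is the best alignment
    # cost of the prefix a[:i] against the prefix b[:j].
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame in b
                                 cost[i][j - 1],      # skip a frame in a
                                 cost[i - 1][j - 1])  # match frames
    return cost[n][m]

# A time-stretched copy of a sequence aligns at zero cost, which is why
# warping-based alignment handles pauses and speed changes.
print(dtw([1, 2, 3], [1, 1, 2, 2, 3]))  # 0.0
```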
Pages: 571-578
Citations: 91
N-best maximal decoders for part models
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126552
Dennis Park, Deva Ramanan
We describe a method for generating N-best configurations from part-based models, ensuring that they do not overlap according to some user-provided definition of overlap. We extend previous N-best algorithms from the speech community to incorporate non-maximal suppression cues, such that pixel-shifted copies of a single configuration are not returned. We use approximate algorithms that perform nearly identically to their exact counterparts, but are orders of magnitude faster. Our approach outperforms standard methods for generating multiple object configurations in an image. We use our method to generate multiple pose hypotheses for the problem of human pose estimation from video sequences. We present quantitative results that demonstrate that our framework significantly improves the accuracy of a state-of-the-art pose estimation algorithm.
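The selection loop at the heart of such a decoder — greedily taking the best-scoring configuration, then suppressing everything that overlaps it under a user-provided predicate — can be sketched as follows (the interval data is invented; real part models would score full pose configurations):

```python
def n_best(configs, scores, overlaps, n):
    # Greedy N-best with non-maximal suppression: repeatedly take the
    # highest-scoring remaining configuration, then drop every
    # configuration that overlaps one already kept. The overlap
    # predicate is supplied by the caller.
    order = sorted(range(len(configs)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        if len(kept) == n:
            break
        if all(not overlaps(configs[i], configs[j]) for j in kept):
            kept.append(i)
    return [configs[i] for i in kept]

# Toy example: 1-D intervals, overlap = intersecting ranges.
boxes = [(0, 10), (1, 9), (20, 30), (25, 35)]
scores = [0.9, 0.8, 0.7, 0.6]
intersects = lambda a, b: not (a[1] <= b[0] or b[1] <= a[0])

print(n_best(boxes, scores, intersects, 2))  # [(0, 10), (20, 30)]
```

The near-duplicate (1, 9) is suppressed by (0, 10), which is exactly the "pixel-shifted copy" behaviour the abstract rules out.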
Pages: 2627-2634
Citations: 126