
Latest publications: 2013 IEEE Conference on Computer Vision and Pattern Recognition

Manhattan Scene Understanding via XSlit Imaging
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.18
Jinwei Ye, Yu Ji, Jingyi Yu
A Manhattan World (MW) [3] is composed of planar surfaces and parallel lines aligned with three mutually orthogonal principal axes. Traditional MW understanding algorithms rely on geometry priors such as the vanishing points and reference (ground) planes for grouping coplanar structures. In this paper, we present a novel single-image MW reconstruction algorithm from the perspective of non-pinhole cameras. We show that by acquiring the MW using an XSlit camera, we can instantly resolve coplanarity ambiguities. Specifically, we prove that parallel 3D lines map to 2D curves in an XSlit image and they converge at an XSlit Vanishing Point (XVP). In addition, if the lines are coplanar, their curved images will intersect at a second common pixel that we call the Coplanar Common Point (CCP). The CCP is a unique image feature in XSlit cameras that does not exist in pinholes. We present a comprehensive theory to analyze XVPs and CCPs in an MW scene and study how to recover 3D geometry in a complex MW scene from XVPs and CCPs. Finally, we build a prototype XSlit camera by using two layers of cylindrical lenses. Experimental results on both synthetic and real data show that our new XSlit-camera-based solution provides an effective and reliable solution for MW understanding.
Pages: 81-88
Citations: 6
Finding Group Interactions in Social Clutter
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.351
Ruonan Li, Parker Porfilio, Todd E. Zickler
We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space, the extent of their interaction is localized in time, and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pair-wise interactions, and it includes efficient algorithms for optimally distinguishing participants from bystanders in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
Pages: 2722-2729
Citations: 20
Poselet Key-Framing: A Model for Human Activity Recognition
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.342
Michalis Raptis, L. Sigal
In this paper, we develop a new model for recognizing human actions. An action is modeled as a very sparse sequence of temporally local discriminative key frames - collections of partial key-poses of the actor(s), depicting key states in the action sequence. We cast the learning of key frames in a max-margin discriminative framework, where we treat key frames as latent variables. This allows us to (jointly) learn a set of most discriminative key frames while also learning the local temporal context between them. Key frames are encoded using a spatially-localizable poselet-like representation with HoG and BoW components learned from weak annotations; we rely on a structured SVM formulation to align our components and mine for hard negatives to boost localization performance. This results in a model that supports spatio-temporal localization and is insensitive to dropped frames or partial observations. We show classification performance that is competitive with the state of the art on the benchmark UT-Interaction dataset and illustrate that our model outperforms prior methods in an on-line streaming setting.
Pages: 2650-2657
Citations: 235
Representing Videos Using Mid-level Discriminative Patches
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.332
Arpit Jain, A. Gupta, Mikel D. Rodriguez, L. Davis
How should a video be represented? We propose a new representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches is their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that these patches establish correspondence across videos and align the videos for label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification where they demonstrate state-of-the-art performance on UCF50 and Olympics datasets.
Pages: 2571-2578
Citations: 155
Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.200
Abed C. Malti, R. Hartley, A. Bartoli, Jae-Hak Kim
We propose a new approach for template-based extensible surface reconstruction from a single view. We extend the method of isometric surface reconstruction and more recent work on conformal surface reconstruction. Our approach relies on the minimization of a proposed stretching energy formalized with respect to the Poisson ratio parameter of the surface. We derive a patch-based formulation of this stretching energy by assuming local linear elasticity. This formulation unifies geometrical and mechanical constraints in a single energy term. We prevent local scale ambiguities by imposing a set of fixed boundary 3D points. We experimentally prove the sufficiency of this set of boundary points and demonstrate the effectiveness of our approach on different developable and non-developable surfaces with a wide range of extensibility.
Pages: 1522-1529
Citations: 63
Discriminative Non-blind Deblurring
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.84
Uwe Schmidt, C. Rother, Sebastian Nowozin, Jeremy Jancsary, S. Roth
Non-blind deblurring is an integral component of blind approaches for removing image blur due to camera shake. Even though learning-based deblurring methods exist, they have been limited to the generative case and are computationally expensive. To date, manually-defined models are thus most widely used, though limiting the attained restoration quality. We address this gap by proposing a discriminative approach for non-blind deblurring. One key challenge is that the blur kernel in use at test time is not known in advance. To address this, we analyze existing approaches that use half-quadratic regularization. From this analysis, we derive a discriminative model cascade for image deblurring. Our cascade model consists of a Gaussian CRF at each stage, based on the recently introduced regression tree fields. We train our model by loss minimization and use synthetically generated blur kernels to generate training data. Our experiments show that the proposed approach is efficient and yields state-of-the-art restoration quality on images corrupted with synthetic and real blur.
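To make the non-blind setting concrete: when the blur kernel is known, a classical (unlearned) baseline is Wiener deconvolution. The sketch below is not the paper's discriminative cascade, only a minimal frequency-domain baseline against which such methods are typically compared; the `noise_power` regularizer is an assumed noise-to-signal ratio.

```python
import numpy as np

def wiener_deblur(blurred, kernel, noise_power=1e-3):
    """Non-blind deblurring with a known kernel via a Wiener filter.

    A classical baseline (not the learned cascade of the paper): the blur
    is inverted in the Fourier domain, regularized by `noise_power`.
    """
    H = np.fft.fft2(kernel, s=blurred.shape)          # kernel spectrum
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + noise_power)   # regularized inverse
    return np.real(np.fft.ifft2(W * B))

# Toy example: blur a synthetic image with a box kernel, then restore it.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
k = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, s=img.shape)))
restored = wiener_deblur(blurred, k)
```

With (near-)noise-free input the restored image should be substantially closer to the original than the blurred one.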
Pages: 604-611
Citations: 122
Learning Class-to-Image Distance with Object Matchings
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.108
Guang-Tong Zhou, Tian Lan, Weilong Yang, Greg Mori
We conduct image classification by learning a class-to-image distance function that matches objects. The set of objects in the training images for an image class is treated as a collage. When presented with a test image, the best matching between this collage of training image objects and the objects in the test image is found. We validate the efficacy of the proposed model on the PASCAL 07 and SUN 09 datasets, showing that our model is effective for object classification and scene classification tasks. State-of-the-art image classification results are obtained, and qualitative results demonstrate that objects can be accurately matched.
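The core matching step can be illustrated with an unlearned stand-in: score a test image by the cost of the best one-to-one assignment between class-collage object descriptors and test-image object descriptors. This sketch uses a brute-force optimal assignment over squared Euclidean costs (fine for small object sets); the descriptors and the lack of any learned weighting are simplifying assumptions, not the paper's model.

```python
from itertools import permutations
import numpy as np

def class_to_image_distance(class_feats, image_feats):
    """Class-to-image distance as an optimal object matching (a sketch).

    class_feats: (n, d) descriptors of the class's collage objects.
    image_feats: (m, d) descriptors of objects in the test image, m >= n.
    Returns the mean cost of the best injective matching.
    """
    cost = ((class_feats[:, None, :] - image_feats[None, :, :]) ** 2).sum(-1)
    n = len(class_feats)
    # Brute-force over injective assignments of class objects to image objects.
    best = min(sum(cost[i, p[i]] for i in range(n))
               for p in permutations(range(len(image_feats)), n))
    return best / n

# Toy example: both class objects have exact matches in the image.
cls_objs = np.array([[0.0, 0.0], [1.0, 1.0]])
img_objs = np.array([[5.0, 5.0], [1.0, 1.0], [0.0, 0.0]])
d = class_to_image_distance(cls_objs, img_objs)
```

For larger object sets one would swap the brute-force loop for the Hungarian algorithm.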
Pages: 795-802
Citations: 8
GeoF: Geodesic Forests for Learning Coupled Predictors
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.16
P. Kontschieder, Pushmeet Kohli, J. Shotton, A. Criminisi
Conventional decision forest based methods for image labelling tasks like object segmentation make predictions for each variable (pixel) independently [3, 5, 8]. This prevents them from enforcing dependencies between variables and translates into locally inconsistent pixel labellings. Random field models, instead, encourage spatial consistency of labels at increased computational expense. This paper presents a new and efficient forest based model that achieves spatially consistent semantic image segmentation by encoding variable dependencies directly in the feature space the forests operate on. Such correlations are captured via new long-range, soft connectivity features, computed via generalized geodesic distance transforms. Our model can be thought of as a generalization of the successful Semantic Texton Forest, Auto-Context, and Entangled Forest models. A second contribution is to show the connection between the typical Conditional Random Field (CRF) energy and the forest training objective. This analysis yields a new objective for training decision forests that encourages more accurate structured prediction. Our GeoF model is validated quantitatively on the task of semantic image segmentation, on four challenging and very diverse image datasets. GeoF outperforms both state-of-the-art forest models and the conventional pairwise CRF.
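The long-range connectivity features rest on a generalized geodesic distance transform: the distance from a set of seed pixels where each step pays for both spatial travel and intensity change. A minimal sketch via Dijkstra on a 4-connected grid follows; the step-cost form and the contrast weight `gamma` are illustrative assumptions, and production implementations use faster raster-scan approximations.

```python
import heapq
import numpy as np

def geodesic_distance(image, seeds, gamma=1.0):
    """Generalized geodesic distance transform on a pixel grid (a sketch).

    Each 4-neighbour step costs sqrt(1 + (gamma * intensity difference)^2),
    so distances grow faster across strong image edges. Computed exactly
    with Dijkstra's algorithm.
    """
    h, w = image.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for y, x in seeds:
        dist[y, x] = 0.0
        heap.append((0.0, y, x))
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue  # stale entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                step = np.sqrt(1.0 + (gamma * (image[ny, nx] - image[y, x])) ** 2)
                if d + step < dist[ny, nx]:
                    dist[ny, nx] = d + step
                    heapq.heappush(heap, (d + step, ny, nx))
    return dist
```

On a constant image the transform reduces to the plain 4-connected (city-block) distance from the seeds.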
Pages: 65-72
Citations: 71
Large Displacement Optical Flow from Nearest Neighbor Fields
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.316
Zhuoyuan Chen, Hailin Jin, Zhe L. Lin, Scott D. Cohen, Ying Wu
We present an optical flow algorithm for large displacement motions. Most existing optical flow methods use the standard coarse-to-fine framework to deal with large displacement motions which has intrinsic limitations. Instead, we formulate the motion estimation problem as a motion segmentation problem. We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation. To account for deviations from similarity transformations, we add local deformations in the segmentation process. We also observe that small objects can be better recovered using translations as the motion candidates. We fuse the motion results obtained under similarity transformations and under translations together before a final refinement. Experimental validation shows that our method can successfully handle large displacement motions. Although we particularly focus on large displacement motions in this work, we make no sacrifice in terms of overall performance. In particular, our method ranks at the top of the Middlebury benchmark.
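One building block the abstract mentions is fitting similarity transformations to (noisy) correspondences from a nearest-neighbor field. A least-squares sketch of that single step is below, using the standard linear parametrization a = s·cosθ, b = s·sinθ; the robust (e.g. RANSAC) wrapper and the segmentation stage of the actual method are omitted.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform x' = s*R*x + t (a sketch).

    src, dst: (n, 2) arrays of corresponding points, e.g. matches drawn
    from an approximate nearest-neighbour field. Returns (M, t) with
    M = s*R a scaled rotation matrix.
    """
    A = np.zeros((2 * len(src), 4))
    rhs = dst.reshape(-1)
    A[0::2, 0] = src[:, 0]; A[0::2, 1] = -src[:, 1]; A[0::2, 2] = 1.0
    A[1::2, 0] = src[:, 1]; A[1::2, 1] = src[:, 0];  A[1::2, 3] = 1.0
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.array([[a, -b], [b, a]]), np.array([tx, ty])

# Recover a known motion: 30-degree rotation, scale 2, translation (5, -3).
rng = np.random.default_rng(1)
src = rng.normal(size=(50, 2))
th = np.pi / 6
R = 2.0 * np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
dst = src @ R.T + np.array([5.0, -3.0])
M, t = fit_similarity(src, dst)
```

With exact correspondences the fit recovers the generating transform; in practice one would run this inside RANSAC to tolerate mismatched neighbors.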
Pages: 2443-2450
Citations: 168
Intrinsic Characterization of Dynamic Surfaces
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.37
Tony Tung, T. Matsuyama
This paper presents a novel approach to characterizing deformable surfaces using intrinsic property dynamics. 3D dynamic surfaces representing humans in motion can be obtained using multiple view stereo reconstruction methods or depth cameras. Nowadays these technologies have become capable of capturing surface variations in real-time, and give details such as clothing wrinkles and deformations. Assuming repetitive patterns in the deformations, we propose to model complex surface variations using sets of linear dynamical systems (LDS), where the observations across time are given by surface intrinsic properties such as local curvatures. We introduce an approach based on bags of dynamical systems, where each surface feature to be represented in the codebook is modeled by a set of LDS equipped with timing structure. Experiments are performed on datasets of real-world dynamical surfaces and show compelling results for description, classification and segmentation.
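Fitting one linear dynamical system to an observation sequence is the basic ingredient of such bag-of-LDS models. A minimal least-squares sketch is below; real LDS models also include an observation map and noise terms, which this stand-in omits, and the curvature observations are only assumed.

```python
import numpy as np

def fit_lds(X):
    """Estimate the transition matrix of a linear dynamical system (a sketch).

    X: (T, d) sequence of observation vectors, e.g. per-patch curvature
    descriptors over time. Fits A in x_{t+1} ~ A x_t by least squares.
    """
    past, future = X[:-1], X[1:]
    W, *_ = np.linalg.lstsq(past, future, rcond=None)  # past @ W ≈ future
    return W.T                                          # so future_t ≈ A @ past_t

# Toy example: a sequence generated by a pure 2D rotation is recovered exactly.
theta = 0.5
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
seq = [np.array([1.0, 0.0])]
for _ in range(9):
    seq.append(A_true @ seq[-1])
A_hat = fit_lds(np.array(seq))
```

Descriptors built from such fitted systems (e.g. eigenvalues of A) can then be quantized into a codebook, in the spirit of the bag-of-dynamical-systems representation.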
{"title":"Intrinsic Characterization of Dynamic Surfaces","authors":"Tony Tung, T. Matsuyama","doi":"10.1109/CVPR.2013.37","DOIUrl":"https://doi.org/10.1109/CVPR.2013.37","url":null,"abstract":"This paper presents a novel approach to characterize deformable surface using intrinsic property dynamics. 3D dynamic surfaces representing humans in motion can be obtained using multiple view stereo reconstruction methods or depth cameras. Nowadays these technologies have become capable to capture surface variations in real-time, and give details such as clothing wrinkles and deformations. Assuming repetitive patterns in the deformations, we propose to model complex surface variations using sets of linear dynamical systems (LDS) where observations across time are given by surface intrinsic properties such as local curvatures. We introduce an approach based on bags of dynamical systems, where each surface feature to be represented in the codebook is modeled by a set of LDS equipped with timing structure. Experiments are performed on datasets of real-world dynamical surfaces and show compelling results for description, classification and segmentation.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"16 1","pages":"233-240"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80104596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
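The abstract above models surface variation with sets of linear dynamical systems whose observations are intrinsic properties such as local curvature. A minimal sketch of the core building block — fitting a first-order LDS x_{t+1} ≈ A·x_t to one observation sequence by least squares — is below. This is a simplification of the paper's timing-structured LDS sets, and the function names are hypothetical:

```python
import numpy as np

def fit_lds(X):
    """Fit the transition matrix A of x_{t+1} ≈ A x_t by least squares.
    X has shape (T, d): T time steps of a d-dimensional observation
    (e.g., a local-curvature descriptor). Returns A of shape (d, d)."""
    past, future = X[:-1], X[1:]              # each (T-1, d)
    # Solve past @ A.T ≈ future in the least-squares sense.
    At, *_ = np.linalg.lstsq(past, future, rcond=None)
    return At.T

def rollout(A, x0, steps):
    """Predict a trajectory by iterating the fitted dynamics from x0."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.stack(xs)
```

In a bag-of-dynamical-systems setting, one such model would be fit per codebook feature, and sequences would then be described by which LDS explains their dynamics best.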