
2015 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Visual Madlibs: Fill in the Blank Description Generation and Question Answering
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.283
Licheng Yu, Eunbyung Park, A. Berg, Tamara L. Berg
In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context. We provide several analyses of the Visual Madlibs dataset and demonstrate its applicability to two new description generation tasks: focused description generation, and multiple-choice question-answering for images. Experiments using joint-embedding and deep learning methods show promising results on these tasks.
{"title":"Visual Madlibs: Fill in the Blank Description Generation and Question Answering","authors":"Licheng Yu, Eunbyung Park, A. Berg, Tamara L. Berg","doi":"10.1109/ICCV.2015.283","DOIUrl":"https://doi.org/10.1109/ICCV.2015.283","url":null,"abstract":"In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene or its broader context. We provide several analyses of the Visual Madlibs dataset and demonstrate its applicability to two new description generation tasks: focused description generation, and multiple-choice question-answering for images. Experiments using joint-embedding and deep learning methods show promising results on these tasks.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"C1 1","pages":"2461-2469"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85197306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 135
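The multiple-choice question-answering task described above can be approached with a simple joint-embedding scorer: project the image feature and each candidate answer into a shared space and pick the closest candidate. The sketch below is only an illustration of that idea, not the authors' model; the projection matrices `W_img` and `W_txt`, the feature dimensions, and the random inputs are hypothetical stand-ins for learned embeddings and real image/text features.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def answer_multiple_choice(img_feat, choice_feats, W_img, W_txt):
    """Project the image and each candidate answer into a shared space and
    return the index of the highest-scoring choice plus all scores."""
    z_img = W_img @ img_feat                       # image -> joint space
    scores = [cosine(z_img, W_txt @ c) for c in choice_feats]
    return int(np.argmax(scores)), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_img, d_txt, d_joint = 4096, 300, 256         # hypothetical dimensions
    W_img = rng.standard_normal((d_joint, d_img)) * 0.01   # stand-in for a learned map
    W_txt = rng.standard_normal((d_joint, d_txt)) * 0.01
    img_feat = rng.standard_normal(d_img)          # e.g. a CNN image feature
    choices = [rng.standard_normal(d_txt) for _ in range(4)]  # 4 answer embeddings
    best, scores = answer_multiple_choice(img_feat, choices, W_img, W_txt)
    print("picked choice", best, "scores", np.round(scores, 3))
```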
Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.273
Seyoung Park, Song-Chun Zhu
In this paper, we are interested in developing compositional models that explicitly represent pose, parts, and attributes, and in jointly tackling the tasks of attribute recognition, pose estimation, and part localization. This is different from the recent trend of using CNN-based approaches trained and tested on these tasks separately with large amounts of data. Conventional attribute models typically apply a large number of region-based attribute classifiers to the parts of a pre-trained pose estimator, without explicitly detecting the object or its parts, or considering the correlations between attributes. In contrast, our approach jointly represents both the object parts and their semantic attributes within a unified compositional hierarchy. We apply our attributed grammar model to the task of human parsing by simultaneously performing part localization and attribute recognition. We show that our model improves performance on the pose estimation task and also outperforms other existing methods on the attribute prediction task.
{"title":"Attributed Grammars for Joint Estimation of Human Attributes, Part and Pose","authors":"Seyoung Park, Song-Chun Zhu","doi":"10.1109/ICCV.2015.273","DOIUrl":"https://doi.org/10.1109/ICCV.2015.273","url":null,"abstract":"In this paper, we are interested in developing compositional models to explicit representing pose, parts and attributes and tackling the tasks of attribute recognition, pose estimation and part localization jointly. This is different from the recent trend of using CNN-based approaches for training and testing on these tasks separately with a large amount of data. Conventional attribute models typically use a large number of region-based attribute classifiers on parts of pre-trained pose estimator without explicitly detecting the object or its parts, or considering the correlations between attributes. In contrast, our approach jointly represents both the object parts and their semantic attributes within a unified compositional hierarchy. We apply our attributed grammar model to the task of human parsing by simultaneously performing part localization and attribute recognition. We show our modeling helps performance improvements on pose-estimation task and also outperforms on other existing methods on attribute prediction task.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"43 1","pages":"2372-2380"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85480385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
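To give a rough flavour of scoring parts and attributes jointly rather than independently, the toy function below sums per-part appearance and attribute scores and penalises deviations from expected relative part placements. It is only a sketch of the general idea under invented part names, scores, and offsets; the paper's actual attributed grammar model is far richer than this additive toy score.

```python
import numpy as np

def joint_score(parts, edges, w_pose=1.0):
    """Score a parsed person: sum of per-part appearance and attribute scores
    plus a pairwise term penalising implausible relative part placements.
    `parts` maps a part name to (appearance_score, attribute_score, (x, y));
    `edges` lists (parent, child, expected_offset) tuples."""
    unary = sum(app + attr for app, attr, _ in parts.values())
    pairwise = 0.0
    for parent, child, expected in edges:
        offset = np.subtract(parts[child][2], parts[parent][2])
        pairwise -= w_pose * np.sum((offset - np.asarray(expected)) ** 2)
    return unary + pairwise

parts = {
    "head":  (1.2, 0.8, (50, 20)),    # (appearance, attribute, location), made up
    "torso": (2.0, 0.5, (50, 60)),
}
edges = [("torso", "head", (0, -40))]  # head expected 40 px above torso
print(joint_score(parts, edges))       # -> 4.5 for this toy configuration
```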
Sparse Dynamic 3D Reconstruction from Unsynchronized Videos
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.504
Enliang Zheng, Dinghuang Ji, Enrique Dunn, Jan-Michael Frahm
We target the sparse 3D reconstruction of dynamic objects observed by multiple unsynchronized video cameras with unknown temporal overlap. To this end, we develop a framework to recover the unknown structure without sequencing information across video sequences. Our proposed compressed sensing framework poses the estimation of 3D structure as the problem of dictionary learning. Moreover, we define our dictionary as the temporally varying 3D structure, while we define local sequencing information in terms of the sparse coefficients describing a locally linear 3D structural interpolation. Our formulation optimizes a biconvex cost function that leverages a compressed sensing formulation and enforces both structural dependency coherence across video streams, as well as motion smoothness across estimates from common video sources. Experimental results demonstrate the effectiveness of our approach in both synthetic data and captured imagery.
{"title":"Sparse Dynamic 3D Reconstruction from Unsynchronized Videos","authors":"Enliang Zheng, Dinghuang Ji, Enrique Dunn, Jan-Michael Frahm","doi":"10.1109/ICCV.2015.504","DOIUrl":"https://doi.org/10.1109/ICCV.2015.504","url":null,"abstract":"We target the sparse 3D reconstruction of dynamic objects observed by multiple unsynchronized video cameras with unknown temporal overlap. To this end, we develop a framework to recover the unknown structure without sequencing information across video sequences. Our proposed compressed sensing framework poses the estimation of 3D structure as the problem of dictionary learning. Moreover, we define our dictionary as the temporally varying 3D structure, while we define local sequencing information in terms of the sparse coefficients describing a locally linear 3D structural interpolation. Our formulation optimizes a biconvex cost function that leverages a compressed sensing formulation and enforces both structural dependency coherence across video streams, as well as motion smoothness across estimates from common video sources. Experimental results demonstrate the effectiveness of our approach in both synthetic data and captured imagery.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"63 1","pages":"4435-4443"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83870310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
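The dictionary-learning formulation above alternates between updating the dictionary and estimating sparse coefficients. The sketch below shows only a generic sparse-coding step (ISTA for an l1-regularised least-squares fit) on a toy random dictionary; it is not the paper's full biconvex pipeline, and the sizes, regularisation weight, and data are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_sparse_code(x, D, lam=0.1, n_iter=200):
    """Solve min_c 0.5*||x - D c||^2 + lam*||c||_1 with ISTA.  In an
    alternating (biconvex) scheme this would be the coefficient step, with a
    separate step updating the dictionary D."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)
        c = soft_threshold(c - grad / L, lam / L)
    return c

rng = np.random.default_rng(1)
D = rng.standard_normal((30, 10))           # toy dictionary: 10 candidate "structures"
c_true = np.zeros(10); c_true[[2, 7]] = [1.0, -0.5]
x = D @ c_true + 0.01 * rng.standard_normal(30)
print(np.round(ista_sparse_code(x, D), 2))  # sparse coefficients close to c_true
```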
Photogeometric Scene Flow for High-Detail Dynamic 3D Reconstruction
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.103
P. Gotardo, T. Simon, Yaser Sheikh, I. Matthews
Photometric stereo (PS) is an established technique for high-detail reconstruction of 3D geometry and appearance. To correct for surface integration errors, PS is often combined with multiview stereo (MVS). With dynamic objects, PS reconstruction also faces the problem of computing optical flow (OF) for image alignment under rapid changes in illumination. Current PS methods typically compute optical flow and MVS as independent stages, each one with its own limitations and errors introduced by early regularization. In contrast, scene flow methods estimate geometry and motion, but lack the fine detail from PS. This paper proposes photogeometric scene flow (PGSF) for high-quality dynamic 3D reconstruction. PGSF performs PS, OF, and MVS simultaneously. It is based on two key observations: (i) while image alignment improves PS, PS allows for surfaces to be relit to improve alignment, (ii) PS provides surface gradients that render the smoothness term in MVS unnecessary, leading to truly data-driven, continuous depth estimates. This synergy is demonstrated in the quality of the resulting RGB appearance, 3D geometry, and 3D motion.
{"title":"Photogeometric Scene Flow for High-Detail Dynamic 3D Reconstruction","authors":"P. Gotardo, T. Simon, Yaser Sheikh, I. Matthews","doi":"10.1109/ICCV.2015.103","DOIUrl":"https://doi.org/10.1109/ICCV.2015.103","url":null,"abstract":"Photometric stereo (PS) is an established technique for high-detail reconstruction of 3D geometry and appearance. To correct for surface integration errors, PS is often combined with multiview stereo (MVS). With dynamic objects, PS reconstruction also faces the problem of computing optical flow (OF) for image alignment under rapid changes in illumination. Current PS methods typically compute optical flow and MVS as independent stages, each one with its own limitations and errors introduced by early regularization. In contrast, scene flow methods estimate geometry and motion, but lack the fine detail from PS. This paper proposes photogeometric scene flow (PGSF) for high-quality dynamic 3D reconstruction. PGSF performs PS, OF, and MVS simultaneously. It is based on two key observations: (i) while image alignment improves PS, PS allows for surfaces to be relit to improve alignment, (ii) PS provides surface gradients that render the smoothness term in MVS unnecessary, leading to truly data-driven, continuous depth estimates. This synergy is demonstrated in the quality of the resulting RGB appearance, 3D geometry, and 3D motion.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"16 1","pages":"846-854"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83153687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44
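As background for the PS component, the snippet below shows the classic Lambertian photometric-stereo solve for a single pixel: with known light directions and observed intensities, the albedo-scaled normal follows from least squares. This is textbook PS, not the paper's photogeometric scene flow; the light directions and the true normal are synthetic.

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Classic Lambertian photometric stereo for a single pixel: solve
    light_dirs @ (albedo * normal) = intensities in the least-squares sense,
    then split the result into a unit normal and an albedo."""
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)
    albedo = float(np.linalg.norm(g))
    return g / (albedo + 1e-12), albedo

rng = np.random.default_rng(2)
L = rng.standard_normal((8, 3))
L[:, 2] = np.abs(L[:, 2]) + 0.5              # keep all lights in the upper hemisphere
L /= np.linalg.norm(L, axis=1, keepdims=True)
n_true = np.array([0.0, 0.0, 1.0])
I = 0.7 * (L @ n_true)                       # Lambertian shading with albedo 0.7
n_est, rho = photometric_stereo(I, L)
print(np.round(n_est, 3), round(rho, 3))     # ~[0, 0, 1] and ~0.7
```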
Depth Recovery from Light Field Using Focal Stack Symmetry
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.394
Haiting Lin, Can Chen, S. B. Kang, Jingyi Yu
We describe a technique to recover depth from a light field (LF) using two proposed features of the LF focal stack. One feature is the property that non-occluding pixels exhibit symmetry along the focal depth dimension centered at the in-focus slice. The other is a data consistency measure based on analysis-by-synthesis, i.e., the difference between the synthesized focal stack given the hypothesized depth map and that from the LF. These terms are used in an iterative optimization framework to extract scene depth. Experimental results on real Lytro and Raytrix data demonstrate that our technique outperforms state-of-the-art solutions and is significantly more robust to noise and under-sampling.
{"title":"Depth Recovery from Light Field Using Focal Stack Symmetry","authors":"Haiting Lin, Can Chen, S. B. Kang, Jingyi Yu","doi":"10.1109/ICCV.2015.394","DOIUrl":"https://doi.org/10.1109/ICCV.2015.394","url":null,"abstract":"We describe a technique to recover depth from a light field (LF) using two proposed features of the LF focal stack. One feature is the property that non-occluding pixels exhibit symmetry along the focal depth dimension centered at the in-focus slice. The other is a data consistency measure based on analysis-by-synthesis, i.e., the difference between the synthesized focal stack given the hypothesized depth map and that from the LF. These terms are used in an iterative optimization framework to extract scene depth. Experimental results on real Lytro and Raytrix data demonstrate that our technique outperforms state-of-the-art solutions and is significantly more robust to noise and under-sampling.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"66 1","pages":"3451-3459"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78730298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 134
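The symmetry property can be illustrated on a single pixel's focal-stack profile: the in-focus slice is the one around which values at offsets +d and -d agree best. The toy 1-D sketch below is only an illustration of that cue, not the paper's data-consistency term or its iterative optimization; the profile values are made up.

```python
import numpy as np

def symmetry_cost(profile, s, max_offset=3):
    """Asymmetry of a per-pixel focal-stack profile around candidate slice s:
    compare values at s+d and s-d for small offsets d."""
    diffs = []
    for d in range(1, max_offset + 1):
        if s - d >= 0 and s + d < len(profile):
            diffs.append((profile[s + d] - profile[s - d]) ** 2)
    return float(np.mean(diffs)) if diffs else np.inf   # no valid pairs at the ends

def best_focus(profile):
    """Return the slice index where the profile is most symmetric."""
    costs = [symmetry_cost(profile, s) for s in range(len(profile))]
    return int(np.argmin(costs))

# Toy profile: one pixel's value across 11 refocused slices, in focus at slice 5.
profile = np.array([3.0, 3.5, 4.5, 6.0, 8.0, 9.0, 8.0, 6.0, 4.5, 3.5, 3.0])
print(best_focus(profile))   # -> 5
```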
A Novel Sparsity Measure for Tensor Recovery
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.39
Qian Zhao, Deyu Meng, Xu Kong, Qi Xie, Wenfei Cao, Yao Wang, Zongben Xu
In this paper, we propose a new sparsity regularizer for measuring the low-rank structure underneath a tensor. The proposed sparsity measure has a natural physical meaning: it is intrinsically the size of the fundamental Kronecker basis needed to express the tensor. By embedding the sparsity measure into the tensor completion and tensor robust PCA frameworks, we formulate new models to enhance their capability in tensor recovery. By introducing relaxation forms of the proposed sparsity measure, we also adopt the alternating direction method of multipliers (ADMM) for solving the proposed models. Experiments on synthetic and multispectral image data sets substantiate the effectiveness of the proposed methods.
{"title":"A Novel Sparsity Measure for Tensor Recovery","authors":"Qian Zhao, Deyu Meng, Xu Kong, Qi Xie, Wenfei Cao, Yao Wang, Zongben Xu","doi":"10.1109/ICCV.2015.39","DOIUrl":"https://doi.org/10.1109/ICCV.2015.39","url":null,"abstract":"In this paper, we propose a new sparsity regularizer for measuring the low-rank structure underneath a tensor. The proposed sparsity measure has a natural physical meaning which is intrinsically the size of the fundamental Kronecker basis to express the tensor. By embedding the sparsity measure into the tensor completion and tensor robust PCA frameworks, we formulate new models to enhance their capability in tensor recovery. Through introducing relaxation forms of the proposed sparsity measure, we also adopt the alternating direction method of multipliers (ADMM) for solving the proposed models. Experiments implemented on synthetic and multispectral image data sets substantiate the effectiveness of the proposed methods.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"112 1","pages":"271-279"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90757572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47
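The sketch below is not the paper's Kronecker-basis sparsity measure; it is a standard unfolding-based low-rank completion heuristic (singular-value thresholding on each mode unfolding, averaged, with observed entries re-imposed), included only to illustrate the tensor-completion setting that such a regularizer plugs into. The threshold, iteration count, and toy rank-1 data are arbitrary.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding of tensor T into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of `unfold`, back to a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape([shape[mode]] + rest), 0, mode)

def svt(M, tau):
    """Singular-value thresholding, the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(T_obs, mask, tau=1.0, n_iter=100):
    """Fill the unobserved entries of T_obs (mask=True where observed) by
    averaging thresholded mode unfoldings and re-imposing observations."""
    X = T_obs.copy()
    for _ in range(n_iter):
        X = np.mean([fold(svt(unfold(X, m), tau), m, X.shape)
                     for m in range(X.ndim)], axis=0)
        X[mask] = T_obs[mask]
    return X

rng = np.random.default_rng(6)
a, b, c = rng.standard_normal(8), rng.standard_normal(9), rng.standard_normal(10)
T = np.einsum("i,j,k->ijk", a, b, c)              # rank-1 ground-truth tensor
mask = rng.random(T.shape) < 0.5                  # observe roughly half the entries
T_obs = np.where(mask, T, 0.0)
X = complete(T_obs, mask)
print("relative error:", np.linalg.norm(X - T) / np.linalg.norm(T))
```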
Actionness-Assisted Recognition of Actions
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.371
Ye Luo, L. Cheong, An Tran
From a fundamental definition of action, we elicit low-level attributes that can reveal agency and intentionality. These descriptors are mainly trajectory-based, measuring sudden changes, temporal synchrony, and repetitiveness. The actionness map can be used to localize actions in a way that is generic across action and agent types. Furthermore, it also groups interacting regions into a useful unit of analysis, which is crucial for recognizing actions that involve interactions. We then implement an actionness-driven pooling scheme to improve action recognition performance. Experimental results on three datasets show the advantages of our method on both action detection and action recognition compared with other state-of-the-art methods.
{"title":"Actionness-Assisted Recognition of Actions","authors":"Ye Luo, L. Cheong, An Tran","doi":"10.1109/ICCV.2015.371","DOIUrl":"https://doi.org/10.1109/ICCV.2015.371","url":null,"abstract":"We elicit from a fundamental definition of action low-level attributes that can reveal agency and intentionality. These descriptors are mainly trajectory-based, measuring sudden changes, temporal synchrony, and repetitiveness. The actionness map can be used to localize actions in a way that is generic across action and agent types. Furthermore, it also groups interacting regions into a useful unit of analysis, which is crucial for recognition of actions involving interactions. We then implement an actionness-driven pooling scheme to improve action recognition performance. Experimental results on three datasets show the advantages of our method on both action detection and action recognition comparing with other state-of-the-art methods.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"20 1","pages":"3244-3252"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91153888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
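The trajectory-based attributes mentioned above (sudden changes, repetitiveness) can be illustrated with very simple statistics of one tracked point. The sketch below is only a toy version of such cues, not the paper's descriptors; the synthetic periodic trajectory is made up.

```python
import numpy as np

def trajectory_cues(points):
    """Toy actionness cues for one tracked point, given a (T, 2) array of
    (x, y) positions: peak acceleration as a 'sudden change' cue and the
    normalised autocorrelation of speed as a 'repetitiveness' cue."""
    vel = np.diff(points, axis=0)                 # per-frame velocity
    acc = np.diff(vel, axis=0)                    # per-frame acceleration
    sudden_change = float(np.linalg.norm(acc, axis=1).max())
    speed = np.linalg.norm(vel, axis=1)
    speed = speed - speed.mean()
    ac = np.correlate(speed, speed, mode="full")[len(speed):]   # lags >= 1
    repetitiveness = float(ac.max() / (speed @ speed + 1e-12))
    return sudden_change, repetitiveness

t = np.linspace(0, 4 * np.pi, 80)
periodic = np.stack([t, np.sin(t)], axis=1)       # smooth, periodic motion
print(trajectory_cues(periodic))
```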
Action Localization in Videos through Context Walk
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.375
K. Soomro, Haroon Idrees, M. Shah
This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which have the ability to preserve action boundaries and also reduce the complexity of the problem. Context relations are learned during training which capture displacements from all the supervoxels in a video to those belonging to foreground actions. Then, given a testing video, we select a supervoxel randomly and use the context information acquired during training to estimate the probability of each supervoxel belonging to the foreground action. The walk proceeds to a new supervoxel and the process is repeated for a few steps. This "context walk" generates a conditional distribution of an action over all the supervoxels. A Conditional Random Field is then used to find action proposals in the video, whose confidences are obtained using SVMs. We validated the proposed approach on several datasets and show that context in the form of relative displacements between supervoxels can be extremely useful for action localization. This also results in significantly fewer evaluations of the classifier, in sharp contrast to the alternate sliding window approaches.
{"title":"Action Localization in Videos through Context Walk","authors":"K. Soomro, Haroon Idrees, M. Shah","doi":"10.1109/ICCV.2015.375","DOIUrl":"https://doi.org/10.1109/ICCV.2015.375","url":null,"abstract":"This paper presents an efficient approach for localizing actions by learning contextual relations, in the form of relative locations between different video regions. We begin by over-segmenting the videos into supervoxels, which have the ability to preserve action boundaries and also reduce the complexity of the problem. Context relations are learned during training which capture displacements from all the supervoxels in a video to those belonging to foreground actions. Then, given a testing video, we select a supervoxel randomly and use the context information acquired during training to estimate the probability of each supervoxel belonging to the foreground action. The walk proceeds to a new supervoxel and the process is repeated for a few steps. This \"context walk\" generates a conditional distribution of an action over all the supervoxels. A Conditional Random Field is then used to find action proposals in the video, whose confidences are obtained using SVMs. We validated the proposed approach on several datasets and show that context in the form of relative displacements between supervoxels can be extremely useful for action localization. This also results in significantly fewer evaluations of the classifier, in sharp contrast to the alternate sliding window approaches.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"91 1","pages":"3280-3288"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90185039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 74
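The "context walk" itself can be sketched as a short loop: start at a random supervoxel, let learned relative displacements cast votes for where the action is, jump to the currently best-voted supervoxel, and repeat. The toy code below is an illustration under invented data (random centroids and displacements pointing toward one hypothetical action location); the real method conditions the retrieved displacements on supervoxel features and feeds the resulting distribution into a CRF with SVM confidences.

```python
import numpy as np

def context_walk(centers, displacements, n_steps=5, sigma=10.0, seed=0):
    """Toy context walk: `centers` are (N, 3) supervoxel centroids (x, y, t);
    `displacements` are (M, 3) offsets, learned in training, that point from
    arbitrary supervoxels toward foreground-action supervoxels.  At each step
    the current supervoxel casts Gaussian votes at center + displacement, the
    accumulated votes give a distribution over supervoxels, and the walk jumps
    to the current best-scoring supervoxel."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(centers))
    cur = rng.integers(len(centers))
    for _ in range(n_steps):
        targets = centers[cur] + displacements               # predicted action spots
        d2 = ((centers[:, None, :] - targets[None, :, :]) ** 2).sum(-1)
        votes += np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)  # soft votes per supervoxel
        cur = int(np.argmax(votes))
    return votes / votes.sum()

rng = np.random.default_rng(3)
centers = rng.uniform(0, 100, size=(50, 3))
# Hypothetical training outcome: offsets that tend to point toward (70, 70, 50).
displacements = np.array([70.0, 70.0, 50.0]) - centers[rng.integers(0, 50, size=20)]
dist = context_walk(centers, displacements)
print("most likely action supervoxel:", int(np.argmax(dist)))
```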
An NMF Perspective on Binary Hashing
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.476
L. Mukherjee, Sathya Ravi, V. Ithapu, Tyler Holmes, Vikas Singh
The pervasiveness of massive data repositories has led to much interest in efficient methods for indexing, search, and retrieval. For image data, a rapidly developing body of work for these applications shows impressive performance with methods that broadly fall under the umbrella term of Binary Hashing. Given a distance matrix, a binary hashing algorithm solves for a binary code for the given set of examples, whose Hamming distances nicely approximate the original distances. The formulation is non-convex, so existing solutions adopt spectral relaxations or perform coordinate descent (or quantization) on a surrogate objective that is numerically more tractable. In this paper, we first derive an Augmented Lagrangian approach to optimize the standard binary hashing objective (i.e., maintain fidelity with a given distance matrix). With appropriate step sizes, we find that this scheme already yields results that match or substantially outperform state-of-the-art methods on most benchmarks used in the literature. Then, to allow the model to scale to large datasets, we obtain an interesting reformulation of the binary hashing objective as a non-negative matrix factorization. This leads to a simple multiplicative-updates algorithm, whose parallelization properties are exploited to obtain a fast GPU-based implementation. We give a probabilistic analysis of our initialization scheme and present a range of experiments to show that the method is simple to implement and competes favorably with available methods (both for optimization and generalization).
{"title":"An NMF Perspective on Binary Hashing","authors":"L. Mukherjee, Sathya Ravi, V. Ithapu, Tyler Holmes, Vikas Singh","doi":"10.1109/ICCV.2015.476","DOIUrl":"https://doi.org/10.1109/ICCV.2015.476","url":null,"abstract":"The pervasiveness of massive data repositories has led to much interest in efficient methods for indexing, search, and retrieval. For image data, a rapidly developing body of work for these applications shows impressive performance with methods that broadly fall under the umbrella term of Binary Hashing. Given a distance matrix, a binary hashing algorithm solves for a binary code for the given set of examples, whose Hamming distance nicely approximates the original distances. The formulation is non-convex -- so existing solutions adopt spectral relaxations or perform coordinate descent (or quantization) on a surrogate objective that is numerically more tractable. In this paper, we first derive an Augmented Lagrangian approach to optimize the standard binary Hashing objective (i.e.,maintain fidelity with a given distance matrix). With appropriate step sizes, we find that this scheme already yields results that match or substantially outperform state of the art methods on most benchmarks used in the literature. Then, to allow the model to scale to large datasets, we obtain an interesting reformulation of the binary hashing objective as a non negative matrix factorization. Later, this leads to a simple multiplicative updates algorithm -- whose parallelization properties are exploited to obtain a fast GPU based implementation. We give a probabilistic analysis of our initialization scheme and present a range of experiments to show that the method is simple to implement and competes favorably with available methods (both for optimization and generalization).","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"67 1","pages":"4184-4192"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90195759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
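The multiplicative-updates idea can be illustrated with the standard Lee and Seung updates for a Frobenius-norm NMF, shown below on random data. This is the generic algorithm, not the paper's specific hashing factorization or its Augmented Lagrangian objective; the final comment about thresholding is only a hint at how binary codes could be read off a factor.

```python
import numpy as np

def nmf(V, r, n_iter=500, eps=1e-9, seed=0):
    """Standard multiplicative-update NMF (Lee and Seung) for V ~ W @ H,
    minimising the Frobenius error with nonnegative factors.  The updates
    are pure matrix operations, which is what makes them easy to parallelise."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(4)
V = rng.random((40, 6)) @ rng.random((6, 30))      # nonnegative, roughly rank-6 data
W, H = nmf(V, r=6)
print("relative error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
# A binary code per column could then be read off by thresholding H,
# e.g. (H > H.mean(axis=1, keepdims=True)).astype(int)
```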
BodyPrint: Pose Invariant 3D Shape Matching of Human Bodies
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.186
Jiangping Wang, Kai Ma, V. Singh, Thomas S. Huang, Terrence Chen
3D human body shape matching has large potential in many real-world applications, especially with the recent advances in 3D range sensing technology. We address this problem by proposing a novel holistic human body shape descriptor called BodyPrint. To compute the bodyprint for a given body scan, we fit a deformable human body mesh and project the mesh parameters to a low-dimensional subspace, which improves discriminability across different persons. Experiments are carried out on three real-world human body datasets to demonstrate that BodyPrint is robust to pose variation as well as to missing information and sensor noise. It improves matching accuracy significantly compared to conventional 3D shape matching techniques that use local features. To facilitate practical applications where the shape database may grow over time, we also extend our learning framework to handle online updates.
{"title":"BodyPrint: Pose Invariant 3D Shape Matching of Human Bodies","authors":"Jiangping Wang, Kai Ma, V. Singh, Thomas S. Huang, Terrence Chen","doi":"10.1109/ICCV.2015.186","DOIUrl":"https://doi.org/10.1109/ICCV.2015.186","url":null,"abstract":"3D human body shape matching has large potential on many real world applications, especially with the recent advances in the 3D range sensing technology. We address this problem by proposing a novel holistic human body shape descriptor called BodyPrint. To compute the bodyprint for a given body scan, we fit a deformable human body mesh and project the mesh parameters to a low-dimensional subspace which improves discriminability across different persons. Experiments are carried out on three real-world human body datasets to demonstrate that BodyPrint is robust to pose variation as well as missing information and sensor noise. It improves the matching accuracy significantly compared to conventional 3D shape matching techniques using local features. To facilitate practical applications where the shape database may grow over time, we also extend our learning framework to handle online updates.","PeriodicalId":6633,"journal":{"name":"2015 IEEE International Conference on Computer Vision (ICCV)","volume":"63 1","pages":"1591-1599"},"PeriodicalIF":0.0,"publicationDate":"2015-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90452229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
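A minimal sketch of the "project fitted mesh parameters to a low-dimensional subspace, then match" idea: PCA via SVD plus a nearest-neighbour lookup. The mesh-parameter vectors here are random placeholders for actual fitted body-model parameters, and the projection is plain PCA rather than the paper's learned, discriminability-improving subspace.

```python
import numpy as np

def fit_subspace(params, k):
    """PCA subspace of mesh-parameter vectors: mean and top-k directions."""
    mean = params.mean(axis=0)
    _, _, Vt = np.linalg.svd(params - mean, full_matrices=False)
    return mean, Vt[:k]

def body_descriptor(p, mean, basis):
    """Project one mesh-parameter vector into the low-dimensional subspace."""
    return basis @ (p - mean)

def match(query, gallery):
    """Index of the gallery descriptor closest to the query (Euclidean)."""
    return int(np.argmin(np.linalg.norm(gallery - query, axis=1)))

rng = np.random.default_rng(5)
params = rng.standard_normal((100, 60))            # hypothetical fitted mesh parameters
mean, basis = fit_subspace(params, k=10)
gallery = np.stack([body_descriptor(p, mean, basis) for p in params])
print(match(body_descriptor(params[7], mean, basis), gallery))   # -> 7
```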