
2017 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Open Set Domain Adaptation
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.88
Pau Panareda Busto, Juergen Gall
When the training and the test data belong to different domains, the accuracy of an object classifier is significantly reduced. Therefore, several algorithms have been proposed in recent years to diminish the so-called domain shift between datasets. However, all available evaluation protocols for domain adaptation describe a closed set recognition task, where both domains, namely source and target, contain exactly the same object classes. In this work, we explore domain adaptation in open sets, a more realistic scenario in which only a few categories of interest are shared between source and target data. We therefore propose a method that fits both closed and open set scenarios. The approach learns a mapping from the source to the target domain by jointly solving an assignment problem that labels those target instances that potentially belong to the categories of interest present in the source dataset. A thorough evaluation shows that our approach outperforms the state of the art.
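The joint labelling-and-mapping idea can be pictured with a small alternating scheme: assign target samples to source class centres (rejecting likely unknowns), then refit a source-to-target mapping on the assignments. The sketch below is only illustrative; the nearest-centre assignment, the quantile-based rejection threshold and the linear least-squares mapping are assumptions, not the authors' exact formulation.

```python
import numpy as np

def open_set_adapt(Xs, ys, Xt, n_iter=10, reject_quantile=0.7):
    """Xs, ys: labelled source features/labels; Xt: unlabelled target features."""
    classes = np.unique(ys)
    W = np.eye(Xs.shape[1])                       # current source-to-target mapping
    labels = np.full(len(Xt), -1)
    for _ in range(n_iter):
        # assignment step: label target samples by the nearest mapped source centre
        centres = np.stack([(Xs[ys == c] @ W.T).mean(0) for c in classes])
        d = np.linalg.norm(Xt[:, None, :] - centres[None], axis=2)
        labels = classes[d.argmin(1)]
        best = d.min(1)
        labels[best > np.quantile(best, reject_quantile)] = -1   # -1 = unknown class
        # mapping step: refit W by least squares between matched class means
        src = [Xs[ys == c].mean(0) for c in classes if np.any(labels == c)]
        tgt = [Xt[labels == c].mean(0) for c in classes if np.any(labels == c)]
        if len(src) >= 2:
            W = np.linalg.lstsq(np.stack(src), np.stack(tgt), rcond=None)[0].T
    return labels, W
```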
{"title":"Open Set Domain Adaptation","authors":"Pau Panareda Busto, Juergen Gall","doi":"10.1109/ICCV.2017.88","DOIUrl":"https://doi.org/10.1109/ICCV.2017.88","url":null,"abstract":"When the training and the test data belong to different domains, the accuracy of an object classifier is significantly reduced. Therefore, several algorithms have been proposed in the last years to diminish the so called domain shift between datasets. However, all available evaluation protocols for domain adaptation describe a closed set recognition task, where both domains, namely source and target, contain exactly the same object classes. In this work, we also explore the field of domain adaptation in open sets, which is a more realistic scenario where only a few categories of interest are shared between source and target data. Therefore, we propose a method that fits in both closed and open set scenarios. The approach learns a mapping from the source to the target domain by jointly solving an assignment problem that labels those target instances that potentially belong to the categories of interest present in the source dataset. A thorough evaluation shows that our approach outperforms the state-of-the-art.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"30 1","pages":"754-763"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87977768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 457
Ensemble Diffusion for Retrieval
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.90
S. Bai, Zhichao Zhou, Jingdong Wang, X. Bai, Longin Jan Latecki, Q. Tian
As a post-processing procedure, the diffusion process has demonstrated its ability to substantially improve the performance of various visual retrieval systems. Meanwhile, great effort has also been devoted to similarity (or metric) fusion, since a single type of similarity cannot fully reveal the intrinsic relationship between objects. This stimulates great research interest in considering similarity fusion within the framework of the diffusion process (i.e., fusion with diffusion) for robust retrieval. In this paper, we first revisit representative methods for fusion with diffusion and provide new insights that were overlooked by previous researchers. Then, observing that existing algorithms are susceptible to noisy similarities, the proposed Regularized Ensemble Diffusion (RED) is bundled with an automatic weight learning paradigm so that the negative impact of noisy similarities is suppressed. Finally, we integrate several recently proposed similarities into the proposed framework. The experimental results suggest that we achieve new state-of-the-art performance on various retrieval tasks, including 3D shape retrieval on the ModelNet dataset and image retrieval on the Holidays and Ukbench datasets.
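A minimal picture of "fusion with diffusion" is to combine several affinity matrices into one graph and diffuse similarities over it before ranking. In the sketch below the uniform weights stand in for the weights that RED learns automatically, and the standard manifold-ranking style diffusion is an illustrative choice rather than the paper's exact update.

```python
import numpy as np

def normalize(W):
    d = np.maximum(W.sum(1), 1e-12)
    Dinv = np.diag(1.0 / np.sqrt(d))
    return Dinv @ W @ Dinv                        # symmetric graph normalisation

def ensemble_diffusion(affinities, weights=None, alpha=0.9):
    """affinities: list of (n, n) similarity matrices from different features."""
    n = affinities[0].shape[0]
    if weights is None:
        weights = np.ones(len(affinities)) / len(affinities)
    S = sum(w * normalize(W) for w, W in zip(weights, affinities))
    # closed-form diffusion: A* = (1 - alpha) * (I - alpha * S)^(-1)
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, np.eye(n))

# Retrieval: rank gallery items for query i by sorting row i of the diffused matrix.
```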
{"title":"Ensemble Diffusion for Retrieval","authors":"S. Bai, Zhichao Zhou, Jingdong Wang, X. Bai, Longin Jan Latecki, Q. Tian","doi":"10.1109/ICCV.2017.90","DOIUrl":"https://doi.org/10.1109/ICCV.2017.90","url":null,"abstract":"As a postprocessing procedure, diffusion process has demonstrated its ability of substantially improving the performance of various visual retrieval systems. Whereas, great efforts are also devoted to similarity (or metric) fusion, seeing that only one individual type of similarity cannot fully reveal the intrinsic relationship between objects. This stimulates a great research interest of considering similarity fusion in the framework of diffusion process (i.e., fusion with diffusion) for robust retrieval.,,In this paper, we firstly revisit representative methods about fusion with diffusion, and provide new insights which are ignored by previous researchers. Then, observing that existing algorithms are susceptible to noisy similarities, the proposed Regularized Ensemble Diffusion (RED) is bundled with an automatic weight learning paradigm, so that the negative impacts of noisy similarities are suppressed. At last, we integrate several recently-proposed similarities with the proposed framework. The experimental results suggest that we can achieve new state-of-the-art performances on various retrieval tasks, including 3D shape retrieval on ModelNet dataset, and image retrieval on Holidays and Ukbench dataset.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"774-783"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84836487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 93
Flip-Invariant Motion Representation
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.600
Takumi Kobayashi
In action recognition, local motion descriptors contribute to effectively representing video sequences in which target actions appear in localized spatio-temporal regions. For robust recognition, these fundamental descriptors are required to be invariant against horizontal (mirror) flipping of video frames, which frequently occurs due to changes of camera viewpoint and action direction and deteriorates classification performance. In this paper, we propose two approaches to endow the local motion descriptors with flip invariance. One method leverages local motion flows to ensure invariance of the input patches on which the descriptors are computed. The other theoretically derives an invariant form from the flipping transformation applied to hand-crafted descriptors. The method is also extended to ConvNet descriptors by learning the invariant form from data. Experimental results on human action classification show that the proposed methods favorably improve the performance of both the hand-crafted and the ConvNet descriptors.
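The hand-crafted case can be illustrated as follows: if horizontal flipping acts on a descriptor as a known linear involution P, then the P-symmetric part and the magnitude of the P-antisymmetric part are both unchanged by flipping, so their concatenation is a flip-invariant form. The descriptor layout and mirrored-bin permutation below are assumptions for illustration only.

```python
import numpy as np

def flip_invariant(x, P):
    """x: motion descriptor; P: linear involution modelling horizontal flipping."""
    x = np.asarray(x, dtype=float)
    sym = 0.5 * (x + P @ x)           # unchanged when x is replaced by P @ x
    anti = 0.5 * np.abs(x - P @ x)    # magnitudes are unchanged as well
    return np.concatenate([sym, anti])

# Example: 8 orientation bins, flipping maps the bin at angle theta to pi - theta.
bins = 8
P = np.zeros((bins, bins))
for k in range(bins):
    P[(bins // 2 - k) % bins, k] = 1.0     # mirrored-bin permutation (assumed layout)
x = np.random.rand(bins)
assert np.allclose(flip_invariant(x, P), flip_invariant(P @ x, P))
```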
{"title":"Flip-Invariant Motion Representation","authors":"Takumi Kobayashi","doi":"10.1109/ICCV.2017.600","DOIUrl":"https://doi.org/10.1109/ICCV.2017.600","url":null,"abstract":"In action recognition, local motion descriptors contribute to effectively representing video sequences where target actions appear in localized spatio-temporal regions. For robust recognition, those fundamental descriptors are required to be invariant against horizontal (mirror) flipping in video frames which frequently occurs due to changes of camera viewpoints and action directions, deteriorating classification performance. In this paper, we propose methods to render flip invariance to the local motion descriptors by two approaches. One method leverages local motion flows to ensure the invariance on input patches where the descriptors are computed. The other derives a invariant form theoretically from the flipping transformation applied to hand-crafted descriptors. The method is also extended so as to deal with ConvNet descriptors through learning the invariant form based on data. The experimental results on human action classification show that the proposed methods favorably improve performance both of the handcrafted and the ConvNet descriptors.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"17 1","pages":"5629-5638"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82679397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.45
Weixin Luo, Wen Liu, Shenghua Gao
Motivated by the capability of sparse coding based anomaly detection, we propose Temporally-coherent Sparse Coding (TSC), in which we enforce similar neighbouring frames to be encoded with similar reconstruction coefficients. We then map the TSC to a special type of stacked Recurrent Neural Network (sRNN). By taking advantage of the sRNN to learn all parameters simultaneously, the nontrivial hyper-parameter selection of TSC can be avoided; meanwhile, with a shallow sRNN, the reconstruction coefficients can be inferred in a single forward pass, which reduces the computational cost of learning sparse coefficients. The contributions of this paper are two-fold: i) we propose a TSC that can be mapped to an sRNN, which facilitates parameter optimization and accelerates anomaly prediction; ii) we build a very large dataset which is even larger than all existing anomaly detection datasets combined, in terms of both the volume of data and the diversity of scenes. Extensive experiments on both a toy dataset and real datasets demonstrate that our TSC-based and sRNN-based methods consistently outperform existing methods, which validates the effectiveness of our approach.
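The TSC objective described above can be written as a per-frame sparse coding problem with an extra temporal-coherence term and solved with a few proximal (ISTA-style) iterations; unrolling such iterations is what allows the coding step to be expressed as a recurrent network. The following sketch assumes this standard formulation; the dictionary, step size and iteration count are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def tsc_encode(x_t, z_prev, D, lam1=0.1, lam2=0.5, n_steps=20):
    """Minimise 0.5*||x_t - D z||^2 + lam1*||z||_1 + 0.5*lam2*||z - z_prev||^2."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + lam2)        # step from the Lipschitz constant
    z = z_prev.copy()
    for _ in range(n_steps):
        grad = D.T @ (D @ z - x_t) + lam2 * (z - z_prev)   # gradient of the smooth terms
        z = soft_threshold(z - step * grad, step * lam1)   # proximal l1 step
    return z

def anomaly_score(x_t, z_t, D):
    return np.linalg.norm(x_t - D @ z_t) ** 2              # large error => likely anomaly
```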
{"title":"A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework","authors":"Weixin Luo, Wen Liu, Shenghua Gao","doi":"10.1109/ICCV.2017.45","DOIUrl":"https://doi.org/10.1109/ICCV.2017.45","url":null,"abstract":"Motivated by the capability of sparse coding based anomaly detection, we propose a Temporally-coherent Sparse Coding (TSC) where we enforce similar neighbouring frames be encoded with similar reconstruction coefficients. Then we map the TSC with a special type of stacked Recurrent Neural Network (sRNN). By taking advantage of sRNN in learning all parameters simultaneously, the nontrivial hyper-parameter selection to TSC can be avoided, meanwhile with a shallow sRNN, the reconstruction coefficients can be inferred within a forward pass, which reduces the computational cost for learning sparse coefficients. The contributions of this paper are two-fold: i) We propose a TSC, which can be mapped to a sRNN which facilitates the parameter optimization and accelerates the anomaly prediction. ii) We build a very large dataset which is even larger than the summation of all existing dataset for anomaly detection in terms of both the volume of data and the diversity of scenes. Extensive experiments on both a toy dataset and real datasets demonstrate that our TSC based and sRNN based method consistently outperform existing methods, which validates the effectiveness of our method.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"56 1","pages":"341-349"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83111502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 484
Editable Parametric Dense Foliage from 3D Capture
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.567
P. Beardsley, G. Chaurasia
We present an algorithm to compute parametric models of dense foliage. The guiding principles of our work are automatic reconstruction and a compact, artist-friendly representation. We use Bezier patches to model leaf surfaces, which we compute from images and point clouds of dense foliage. We present an algorithm to segment individual leaves from colour and depth data. We then reconstruct the Bezier representation from the segmented leaf point clouds using non-linear optimisation. Unlike previous work, we do not require laboratory-scanned exemplars or user intervention. We also demonstrate intuitive manipulators to edit the reconstructed parametric models. We believe our work is a step towards making captured data more accessible to artists for foliage modelling.
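As a concrete picture of the representation, a bicubic Bezier patch is evaluated from a 4x4 grid of 3D control points with Bernstein polynomials, and fitting a segmented leaf point cloud amounts to minimising point-to-patch residuals over those control points. The patch degree and the simple residual below are illustrative assumptions.

```python
import numpy as np

def bernstein3(t):
    t = np.asarray(t, dtype=float)
    return np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                     3 * t ** 2 * (1 - t), t ** 3], axis=-1)

def bezier_patch(ctrl, u, v):
    """ctrl: (4, 4, 3) control points; u, v: (n,) parameters in [0, 1]."""
    Bu, Bv = bernstein3(u), bernstein3(v)                  # (n, 4) Bernstein weights
    return np.einsum('ni,ijk,nj->nk', Bu, ctrl, Bv)        # (n, 3) points on the patch

def fit_residual(ctrl_flat, P, u, v):
    """Point-to-patch residuals, e.g. for scipy.optimize.least_squares."""
    ctrl = ctrl_flat.reshape(4, 4, 3)
    return (bezier_patch(ctrl, u, v) - P).ravel()
```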
{"title":"Editable Parametric Dense Foliage from 3D Capture","authors":"P. Beardsley, G. Chaurasia","doi":"10.1109/ICCV.2017.567","DOIUrl":"https://doi.org/10.1109/ICCV.2017.567","url":null,"abstract":"We present an algorithm to compute parametric models of dense foliage. The guiding principles of our work are automatic reconstruction and compact artist friendly representation. We use Bezier patches to model leaf surface, which we compute from images and point clouds of dense foliage. We present an algorithm to segment individual leaves from colour and depth data. We then reconstruct the Bezier representation from segmented leaf points clouds using non-linear optimisation. Unlike previous work, we do not require laboratory scanned exemplars or user intervention. We also demonstrate intuitive manipulators to edit the reconstructed parametric models. We believe our work is a step towards making captured data more accessible to artists for foliage modelling.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"79 1","pages":"5315-5324"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84148186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Deep Free-Form Deformation Network for Object-Mask Registration
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.456
Haoyang Zhang, Xuming He
This paper addresses the problem of object-mask registration, which aligns a shape mask to a target object instance. Prior work typically formulates the problem as an object segmentation task with a mask prior, which is challenging to solve. In this work, we take a transformation-based approach that predicts a 2D non-rigid spatial transform and warps the shape mask onto the target object. In particular, we propose a deep spatial transformer network that learns free-form deformations (FFDs) to non-rigidly warp the shape mask based on a multi-level dual mask feature pooling strategy. The FFD transforms are based on B-splines and parameterized by the offsets of predefined control points, which are differentiable. Therefore, we are able to train the entire network in an end-to-end manner with an L2 matching loss. We evaluate our FFD network on a challenging object-mask alignment task, which aims to refine a set of object segment proposals, and our approach achieves state-of-the-art performance on the Cityscapes, PASCAL VOC and MSCOCO datasets.
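The B-spline FFD itself can be sketched directly: the displacement at a point is a B-spline-weighted sum of the offsets of its 4x4 neighbouring control points, which makes the warp smooth and differentiable in those offsets and hence trainable inside a spatial-transformer-style network. The grid layout and boundary clamping below are illustrative assumptions.

```python
import numpy as np

def bspline_basis(t):
    """Cubic B-spline weights for a fractional offset t in [0, 1)."""
    return np.array([(1 - t) ** 3 / 6.0,
                     (3 * t ** 3 - 6 * t ** 2 + 4) / 6.0,
                     (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6.0,
                     t ** 3 / 6.0])

def ffd_displacement(point, offsets, spacing):
    """point: (x, y); offsets: (Gx, Gy, 2) control-point offsets; spacing: grid spacing."""
    x, y = point[0] / spacing, point[1] / spacing
    i, j = int(np.floor(x)) - 1, int(np.floor(y)) - 1      # top-left of the 4x4 support
    wx, wy = bspline_basis(x - np.floor(x)), bspline_basis(y - np.floor(y))
    disp = np.zeros(2)
    for a in range(4):
        for b in range(4):
            ii = np.clip(i + a, 0, offsets.shape[0] - 1)   # clamp at the grid border
            jj = np.clip(j + b, 0, offsets.shape[1] - 1)
            disp += wx[a] * wy[b] * offsets[ii, jj]
    return disp                                            # warped point = point + disp
```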
{"title":"Deep Free-Form Deformation Network for Object-Mask Registration","authors":"Haoyang Zhang, Xuming He","doi":"10.1109/ICCV.2017.456","DOIUrl":"https://doi.org/10.1109/ICCV.2017.456","url":null,"abstract":"This paper addresses the problem of object-mask registration, which aligns a shape mask to a target object instance. Prior work typically formulate the problem as an object segmentation task with mask prior, which is challenging to solve. In this work, we take a transformation based approach that predicts a 2D non-rigid spatial transform and warps the shape mask onto the target object. In particular, we propose a deep spatial transformer network that learns free-form deformations (FFDs) to non-rigidly warp the shape mask based on a multi-level dual mask feature pooling strategy. The FFD transforms are based on B-splines and parameterized by the offsets of predefined control points, which are differentiable. Therefore, we are able to train the entire network in an end-to-end manner based on L2 matching loss. We evaluate our FFD network on a challenging object-mask alignment task, which aims to refine a set of object segment proposals, and our approach achieves the state-of-the-art performance on the Cityscapes, the PASCAL VOC and the MSCOCO datasets.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"19 1","pages":"4261-4269"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81811556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.402
Wenbin Du, Yali Wang, Y. Qiao
Recent studies demonstrate the effectiveness of Recurrent Neural Networks (RNNs) for action recognition in videos. However, previous works mainly utilize video-level categories as supervision to train RNNs, which may prevent RNNs from learning complex motion structures over time. In this paper, we propose a recurrent pose-attention network (RPAN) to address this challenge, in which we introduce a novel pose-attention mechanism to adaptively learn pose-related features at every time step of action prediction in RNNs. More specifically, we make three main contributions in this paper. Firstly, unlike previous works on pose-related action recognition, our RPAN is an end-to-end recurrent network which can exploit important spatial-temporal evolutions of human pose to assist action recognition in a unified framework. Secondly, instead of learning individual human-joint features separately, our pose-attention mechanism learns robust human-part features by sharing attention parameters partially over the semantically related human joints. These human-part features are then fed into the human-part pooling layer to construct a highly discriminative pose-related representation for temporal action modeling. Thirdly, one important byproduct of our RPAN is pose estimation in videos, which can be used for coarse pose annotation in action videos. We evaluate the proposed RPAN quantitatively and qualitatively on two popular benchmarks, i.e., Sub-JHMDB and PennAction. Experimental results show that RPAN outperforms the recent state-of-the-art methods on these challenging datasets.
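The pose-attention pooling step can be pictured as follows: per-joint features are scored with attention parameters shared inside semantically related joint groups, softmax-normalised, and pooled into part features that feed the recurrent unit. The grouping, feature sizes and scoring function in this sketch are assumptions for illustration, not the exact RPAN architecture.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def pose_attention_pool(joint_feats, hidden, parts, W_shared, w_score):
    """joint_feats: (J, D); hidden: (H,) recurrent state; parts: dict part -> joint indices;
    W_shared: dict part -> (D + H, K) projection shared within the part; w_score: (K,)."""
    part_feats = []
    for name, idx in parts.items():
        ctx = np.concatenate([joint_feats[idx], np.tile(hidden, (len(idx), 1))], axis=1)
        scores = np.tanh(ctx @ W_shared[name]) @ w_score   # shared attention parameters
        alpha = softmax(scores)                            # one weight per joint in the part
        part_feats.append(alpha @ joint_feats[idx])        # attention-weighted part pooling
    return np.concatenate(part_feats)                      # pose-related representation
```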
{"title":"RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos","authors":"Wenbin Du, Yali Wang, Y. Qiao","doi":"10.1109/ICCV.2017.402","DOIUrl":"https://doi.org/10.1109/ICCV.2017.402","url":null,"abstract":"Recent studies demonstrate the effectiveness of Recurrent Neural Networks (RNNs) for action recognition in videos. However, previous works mainly utilize video-level category as supervision to train RNNs, which may prohibit RNNs to learn complex motion structures along time. In this paper, we propose a recurrent pose-attention network (RPAN) to address this challenge, where we introduce a novel pose-attention mechanism to adaptively learn pose-related features at every time-step action prediction of RNNs. More specifically, we make three main contributions in this paper. Firstly, unlike previous works on pose-related action recognition, our RPAN is an end-toend recurrent network which can exploit important spatialtemporal evolutions of human pose to assist action recognition in a unified framework. Secondly, instead of learning individual human-joint features separately, our poseattention mechanism learns robust human-part features by sharing attention parameters partially on the semanticallyrelated human joints. These human-part features are then fed into the human-part pooling layer to construct a highlydiscriminative pose-related representation for temporal action modeling. Thirdly, one important byproduct of our RPAN is pose estimation in videos, which can be used for coarse pose annotation in action videos. We evaluate the proposed RPAN quantitatively and qualitatively on two popular benchmarks, i.e., Sub-JHMDB and PennAction. Experimental results show that RPAN outperforms the recent state-of-the-art methods on these challenging datasets.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"30 1","pages":"3745-3754"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81339891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 149
Object-Level Proposals
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.527
Jianxiang Ma, Anlong Ming, Zilong Huang, Xinggang Wang, Yu Zhou
Edge and surface are two fundamental visual elements of an object. The majority of existing object proposal approaches utilize edge or edge-like cues to rank candidates, whereas we argue that the surface cue, which carries the 3D characteristics of objects, should also be captured effectively for proposals; this has rarely been discussed before. In this paper, an object-level proposal model is presented, which constructs an occlusion-based objectness measure that takes the surface cue into account. Specifically, we focus on better detection of occlusion edges in order to enrich proposals with the surface cue: an occlusion-dominated fusion and normalization criterion is designed to obtain approximately complete contour information, enhancing the occlusion edge map as much as possible and thus boosting proposals. Experimental results on the PASCAL VOC 2007 and MS COCO 2014 datasets demonstrate the effectiveness of our approach, which improves average recall by around 6% over Edge Boxes at 1000 proposals and also yields a modest gain in object detection performance.
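In the spirit of the description above, an occlusion-driven objectness score can be sketched as rewarding occlusion-edge mass that lies well inside a candidate box and penalising edge mass on its border, much like Edge Boxes does with ordinary edges. The scoring rule and normalisation below are illustrative stand-ins for the paper's occlusion-dominated fusion and normalization criterion.

```python
import numpy as np

def occlusion_objectness(occ_edge_map, box, margin=2):
    """occ_edge_map: (H, W) occlusion-edge magnitudes; box: (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = box
    inner = occ_edge_map[y0 + margin:y1 - margin, x0 + margin:x1 - margin].sum()
    full = occ_edge_map[y0:y1, x0:x1].sum()
    border = full - inner                    # edge mass on (or crossing) the box border
    area = max((x1 - x0) * (y1 - y0), 1)
    return (inner - border) / area ** 0.5    # crude size normalisation

# Proposals are the top-scoring candidate boxes after non-maximum suppression.
```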
{"title":"Object-Level Proposals","authors":"Jianxiang Ma, Anlong Ming, Zilong Huang, Xinggang Wang, Yu Zhou","doi":"10.1109/ICCV.2017.527","DOIUrl":"https://doi.org/10.1109/ICCV.2017.527","url":null,"abstract":"Edge and surface are two fundamental visual elements of an object. The majority of existing object proposal approaches utilize edge or edge-like cues to rank candidates, while we consider that the surface cue containing the 3D characteristic of objects should be captured effectively for proposals, which has been rarely discussed before. In this paper, an object-level proposal model is presented, which constructs an occlusion-based objectness taking the surface cue into account. Specifically, the better detection of occlusion edges is focused on to enrich the surface cue into proposals, namely, the occlusion-dominated fusion and normalization criterion are designed to obtain the approximately overall contour information, to enhance the occlusion edge map at utmost and thus boost proposals. Experimental results on the PASCAL VOC 2007 and MS COCO 2014 dataset demonstrate the effectiveness of our approach, which achieves around 6% improvement on the average recall than Edge Boxes at 1000 proposals and also leads to a modest gain on the performance of object detection.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"6 1","pages":"4931-4939"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83512891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.436
Dingwen Zhang, Junwei Han, Yu Zhang
In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is “supervision by fusion”, i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and an inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within a 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.
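The fusion idea can be sketched simply: weak unsupervised saliency maps are weighted by their agreement with the consensus, fused, and thresholded into pseudo-labels, with only confident pixels used to supervise the deep detector. The agreement weighting and thresholds below are illustrative assumptions, not the paper's intra-/inter-image fusion streams.

```python
import numpy as np

def fuse_saliency(maps, lo=0.3, hi=0.7):
    """maps: (M, H, W) saliency maps in [0, 1] from M unsupervised models."""
    consensus = maps.mean(0)
    dis = np.abs(maps - consensus).mean(axis=(1, 2))   # disagreement with the consensus
    w = np.exp(-dis)
    w /= w.sum()
    fused = np.tensordot(w, maps, axes=1)              # (H, W) weighted fusion
    pseudo = np.full(fused.shape, -1, dtype=int)       # -1: pixel left unsupervised
    pseudo[fused >= hi] = 1                            # confident foreground
    pseudo[fused <= lo] = 0                            # confident background
    return fused, pseudo
```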
{"title":"Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector","authors":"Dingwen Zhang, Junwei Han, Yu Zhang","doi":"10.1109/ICCV.2017.436","DOIUrl":"https://doi.org/10.1109/ICCV.2017.436","url":null,"abstract":"In light of the powerful learning capability of deep neural networks (DNNs), deep (convolutional) models have been built in recent years to address the task of salient object detection. Although training such deep saliency models can significantly improve the detection performance, it requires large-scale manual supervision in the form of pixel-level human annotation, which is highly labor-intensive and time-consuming. To address this problem, this paper makes the earliest effort to train a deep salient object detector without using any human annotation. The key insight is “supervision by fusion”, i.e., generating useful supervisory signals from the fusion process of weak but fast unsupervised saliency models. Based on this insight, we combine an intra-image fusion stream and a inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector. Comprehensive experiments on four benchmark datasets demonstrate that our method can approach the same network trained with full supervision (within 2-5% performance gap) and, more encouragingly, even outperform a number of fully supervised state-of-the-art approaches.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"54 1","pages":"4068-4076"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85594968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 144
Makeup-Go: Blind Reversion of Portrait Edit
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.482
Ying-Cong Chen, Xiaoyong Shen, Jiaya Jia
Virtual face beautification (or makeup) has become a common operation in camera and image-processing apps, which can be deceiving. In this paper, we propose the task of restoring a portrait image from this process. As the first attempt along this line, we assume unknown global operations on human faces and aim to tackle the two issues of skin smoothing and skin color change. These two tasks, intriguingly, pose very different difficulties in estimating subtle details and major color variation. We propose a Component Regression Network (CRN) and address the limitation of using the Euclidean loss in blind reversion. CRN maps edited portrait images back to the original ones without knowing the details of the beautification operations. Our experiments demonstrate the effectiveness of the system for this novel task.
{"title":"Makeup-Go: Blind Reversion of Portrait Edit","authors":"Ying-Cong Chen, Xiaoyong Shen, Jiaya Jia","doi":"10.1109/ICCV.2017.482","DOIUrl":"https://doi.org/10.1109/ICCV.2017.482","url":null,"abstract":"Virtual face beautification (or markup) becomes common operations in camera or image processing Apps, which is actually deceiving. In this paper, we propose the task of restoring a portrait image from this process. As the first attempt along this line, we assume unknown global operations on human faces and aim to tackle the two issues of skin smoothing and skin color change. These two tasks, intriguingly, impose very different difficulties to estimate subtle details and major color variation. We propose a Component Regression Network (CRN) and address the limitation of using Euclidean loss in blind reversion. CRN maps the edited portrait images back to the original ones without knowing beautification operation details. Our experiments demonstrate effectiveness of the system for this novel task.","PeriodicalId":6559,"journal":{"name":"2017 IEEE International Conference on Computer Vision (ICCV)","volume":"29 1","pages":"4511-4519"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89798130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18