
Latest publications: 2017 IEEE International Conference on Computer Vision (ICCV)

Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.281
Aseem Behl, O. Jafari, Siva Karthik Mustikovela, Hassan Abu Alhaija, C. Rother, Andreas Geiger
Existing methods for 3D scene flow estimation often fail in the presence of large displacement or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which is the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation - an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far the strongest in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark, where we achieve state-of-the-art performance at the time of submission.
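To illustrate why instance-level recognition constrains 3D motion, the toy sketch below fits a single rigid transform (Kabsch/Procrustes) to all 3D correspondences that an instance mask assigns to one object, yielding one coherent flow field for the whole instance. This is a simplified stand-in for the paper's CRF formulation; the points, the mask and the noise level are synthetic assumptions.

```python
# Per-instance rigid motion from segmented 3-D correspondences (toy, synthetic data).
import numpy as np

rng = np.random.default_rng(0)

def rigid_fit(P, Q):
    # Least-squares rotation R and translation t with Q ~= P @ R.T + t (Kabsch algorithm).
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cQ - R @ cP

# Synthetic scene: 3-D points of one car instance at time t, moved rigidly to time t+1.
points_t = rng.uniform(-1, 1, size=(500, 3)) + np.array([0.0, 0.0, 10.0])
angle = np.deg2rad(3.0)
R_true = np.array([[np.cos(angle), 0, np.sin(angle)],
                   [0, 1, 0],
                   [-np.sin(angle), 0, np.cos(angle)]])
points_t1 = points_t @ R_true.T + np.array([0.5, 0.0, -0.8]) + rng.normal(0, 0.01, size=(500, 3))

instance_mask = np.ones(500, dtype=bool)          # points the segmentation assigns to this car
R, t = rigid_fit(points_t[instance_mask], points_t1[instance_mask])
scene_flow = points_t @ R.T + t - points_t        # one consistent 3-D flow vector per point
print("mean fit error:", np.abs(points_t1 - (points_t @ R.T + t)).mean())
```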
Citations: 130
PanNet: A Deep Network Architecture for Pan-Sharpening
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.193
Junfeng Yang, Xueyang Fu, Yuwen Hu, Yue Huang, Xinghao Ding, J. Paisley
We propose a deep network architecture for the pan-sharpening problem called PanNet. We incorporate domain-specific knowledge to design our PanNet architecture by focusing on the two aims of the pan-sharpening problem: spectral and spatial preservation. For spectral preservation, we add up-sampled multispectral images to the network output, which directly propagates the spectral information to the reconstructed image. To preserve spatial structure, we train our network parameters in the high-pass filtering domain rather than the image domain. We show that the trained network generalizes well to images from different satellites without needing retraining. Experiments show significant improvement over state-of-the-art methods visually and in terms of standard quality metrics.
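The two design choices above (training in the high-pass domain and adding the up-sampled multispectral image back to the output) can be sketched directly. The following is a minimal PyTorch sketch under stated assumptions: the box-filter high-pass, the plain convolutional body, and all channel counts and sizes are illustrative stand-ins rather than the published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def high_pass(x, kernel_size=5):
    # High-pass component = image minus a box-filtered (low-pass) version.
    low = F.avg_pool2d(x, kernel_size, stride=1, padding=kernel_size // 2)
    return x - low

class PanNetSketch(nn.Module):
    def __init__(self, ms_bands=4, width=32, depth=4):
        super().__init__()
        layers = [nn.Conv2d(ms_bands + 1, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth):
            layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, ms_bands, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, pan, ms_low):
        # pan: (N, 1, H, W) panchromatic image; ms_low: (N, B, H/4, W/4) multispectral image.
        ms_up = F.interpolate(ms_low, size=pan.shape[-2:], mode='bilinear', align_corners=False)
        # Spatial preservation: the network only sees high-pass inputs.
        x = torch.cat([high_pass(pan), high_pass(ms_up)], dim=1)
        # Spectral preservation: the up-sampled multispectral image is added to the output.
        return ms_up + self.body(x)

net = PanNetSketch()
out = net(torch.randn(1, 1, 64, 64), torch.randn(1, 4, 16, 16))
print(out.shape)  # torch.Size([1, 4, 64, 64])
```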
Citations: 427
Filter Selection for Hyperspectral Estimation
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.342
Boaz Arad, O. Ben-Shahar
While recovery of hyperspectral signals from natural RGB images has been a recent subject of exploration, little to no consideration has been given to the camera response profiles used in the recovery process. In this paper we demonstrate that optimal selection of camera response filters may improve hyperspectral estimation accuracy by over 33%, emphasizing the importance of considering and selecting these response profiles wisely. Additionally, we present an evolutionary optimization methodology for optimal filter set selection from very large filter spaces, an approach that facilitates practical selection from families of customizable filters or filter optimization for multispectral cameras with more than 3 channels.
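As a rough illustration of evolutionary filter-set selection, the toy sketch below evolves a population of k-filter subsets. The ridge-regression reconstruction fitness, the random spectra and filter curves, and every GA hyper-parameter are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_filters, k = 31, 200, 3            # spectral bands, candidate filters, filters to pick
filters = rng.random((n_filters, n_bands))    # candidate camera response curves (rows)
spectra = rng.random((500, n_bands))          # stand-in "training" hyperspectral signatures

def fitness(subset):
    # Project spectra through the chosen filters, then measure how well a linear
    # (ridge) mapping recovers the full spectra from the filter responses.
    F = filters[list(subset)]                                     # (k, n_bands)
    responses = spectra @ F.T                                     # (n, k)
    W = np.linalg.solve(responses.T @ responses + 1e-3 * np.eye(k), responses.T @ spectra)
    return -np.mean((responses @ W - spectra) ** 2)               # higher is better

def mutate(subset):
    child = list(subset)
    child[rng.integers(k)] = rng.integers(n_filters)              # swap one filter at random
    return tuple(sorted(set(child))) if len(set(child)) == k else tuple(subset)

population = [tuple(sorted(rng.choice(n_filters, k, replace=False))) for _ in range(40)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                                     # keep the fittest filter sets
    population = parents + [mutate(parents[rng.integers(len(parents))]) for _ in range(30)]

population.sort(key=fitness, reverse=True)
print("best filter set:", population[0], "fitness:", fitness(population[0]))
```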
Citations: 35
SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.292
J. McCormac, Ankur Handa, Stefan Leutenegger, A. Davison
We introduce SceneNet RGB-D, a dataset providing pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection. It also provides perfect camera poses and depth data, allowing investigation into geometric computer vision problems such as optical flow, camera pose estimation, and 3D scene labelling tasks. Random sampling permits virtually unlimited scene configurations, and here we provide 5M rendered RGB-D images from 16K randomly generated 3D trajectories in synthetic layouts, with random but physically simulated object configurations. We compare the semantic segmentation performance of network weights produced from pretraining on RGB images from our dataset against generic VGG-16 ImageNet weights. After fine-tuning on the SUN RGB-D and NYUv2 real-world datasets we find in both cases that the synthetically pre-trained network outperforms the VGG-16 weights. When synthetic pre-training includes a depth channel (something ImageNet cannot natively provide) the performance is greater still. This suggests that large-scale high-quality synthetic RGB datasets with task-specific labels can be more useful for pretraining than real-world generic pre-training such as ImageNet. We host the dataset at http://robotvault.bitbucket.io/scenenet-rgbd.html.
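A minimal sketch of the pre-training set-up discussed above, assuming a VGG-16-style encoder whose first convolution is widened to accept a 4-channel RGB-D input (the depth channel ImageNet cannot natively provide). The segmentation head, losses and data loaders are omitted or stubbed, so this is only a scaffold, not the paper's training code.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16()                                                  # VGG-16 backbone, randomly initialised
model.features[0] = nn.Conv2d(4, 64, kernel_size=3, padding=1)   # accept RGB + depth

# Stage 1 (sketch): pre-train on synthetic RGB-D frames with per-pixel labels, e.g. by
# attaching a decoder to model.features and minimising a cross-entropy segmentation loss.
# Stage 2 (sketch): fine-tune the same weights on a real dataset such as NYUv2, typically
# with a lower learning rate for the pre-trained encoder.
optimizer = torch.optim.SGD(model.features.parameters(), lr=1e-3, momentum=0.9)

x = torch.randn(2, 4, 224, 224)                                  # a dummy RGB-D batch
features = model.features(x)
print(features.shape)                                            # torch.Size([2, 512, 7, 7])
```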
Citations: 263
Semantic Line Detection and Its Applications
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.350
Jun-Tae Lee, Han-Ul Kim, Chulwoo Lee, Chang-Su Kim
Semantic lines characterize the layout of an image. Despite their importance in image analysis and scene understanding, there is no reliable research on semantic line detection. In this paper, we propose a semantic line detector using a convolutional neural network with multi-task learning, by regarding line detection as a combination of classification and regression tasks. We use convolution and max-pooling layers to obtain multi-scale feature maps for an input image. Then, we develop the line pooling layer to extract a feature vector for each candidate line from the feature maps. Next, we feed the feature vector into the parallel classification and regression layers. The classification layer decides whether the line candidate is semantic or not. In the case of a semantic line, the regression layer determines the offset for refining the line location. Experimental results show that the proposed detector extracts semantic lines accurately and reliably. Moreover, we demonstrate that the proposed detector can be used successfully in three applications: horizon estimation, composition enhancement, and image simplification.
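The head described above (line pooling followed by parallel classification and regression branches) can be sketched as follows. The bilinear sampling density, the layer widths and the use of a single collapsed feature map are assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinePoolingHead(nn.Module):
    def __init__(self, feat_channels=64, n_samples=32, hidden=256):
        super().__init__()
        self.n_samples = n_samples
        self.fc = nn.Sequential(nn.Linear(feat_channels * n_samples, hidden), nn.ReLU(inplace=True))
        self.cls = nn.Linear(hidden, 2)   # semantic line vs. not
        self.reg = nn.Linear(hidden, 4)   # offsets refining the two line end points

    def line_pool(self, fmap, p0, p1):
        # Bilinearly sample n_samples feature vectors along the segment p0 -> p1
        # (end points in normalized [-1, 1] coordinates, as used by grid_sample).
        t = torch.linspace(0, 1, self.n_samples, device=fmap.device).view(1, -1, 1, 1)
        grid = (1 - t) * p0.view(1, 1, 1, 2) + t * p1.view(1, 1, 1, 2)   # (1, n_samples, 1, 2)
        sampled = F.grid_sample(fmap, grid, align_corners=True)          # (1, C, n_samples, 1)
        return sampled.flatten(1)

    def forward(self, fmap, p0, p1):
        h = self.fc(self.line_pool(fmap, p0, p1))
        return self.cls(h), self.reg(h)

head = LinePoolingHead()
fmap = torch.randn(1, 64, 40, 40)                       # multi-scale features collapsed to one map
logits, offsets = head(fmap, torch.tensor([-1.0, -1.0]), torch.tensor([1.0, 1.0]))
print(logits.shape, offsets.shape)                      # torch.Size([1, 2]) torch.Size([1, 4])
```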
Citations: 36
Interleaved Group Convolutions
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.469
Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang
In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in primary group convolution is a spatial convolution, while on each partition in secondary group convolution, the convolution is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: the block is wider than a regular convolution while the number of parameters and the computation complexity are preserved. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results on the standard benchmarks CIFAR-10, CIFAR-100, SVHN and ImageNet demonstrate that our networks use parameters and computation more efficiently at similar or higher accuracy.
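The block structure described above is concrete enough to sketch: a spatial primary group convolution, a channel interleave, a point-wise secondary group convolution, and the inverse interleave. Below is a minimal PyTorch sketch; the channel counts, kernel size and the BatchNorm/ReLU placement are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InterleavedGroupConvBlock(nn.Module):
    def __init__(self, channels, primary_groups, kernel_size=3):
        super().__init__()
        assert channels % primary_groups == 0
        self.L = primary_groups                      # number of primary partitions
        self.M = channels // primary_groups          # channels per primary partition
        # Primary group convolution: spatial (3x3) convolution within each partition.
        self.primary = nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=self.L, bias=False)
        # Secondary group convolution: point-wise (1x1); each secondary partition
        # gathers exactly one channel from every primary partition.
        self.secondary = nn.Conv2d(channels, channels, 1, groups=self.M, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def interleave(self, x, groups):
        # Channel shuffle: channel j of every group becomes adjacent.
        n, c, h, w = x.shape
        return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

    def forward(self, x):
        x = self.primary(x)
        x = self.interleave(x, self.L)    # channels of one secondary partition now contiguous
        x = self.secondary(x)
        x = self.interleave(x, self.M)    # restore the primary channel ordering
        return self.relu(self.bn(x))

block = InterleavedGroupConvBlock(channels=64, primary_groups=4)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```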
Citations: 216
Weakly Supervised Manifold Learning for Dense Semantic Object Correspondence
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.192
Utkarsh Gaur, B. S. Manjunath
The goal of the semantic object correspondence problem is to compute dense association maps for a pair of images such that the same object parts get matched across very different looking object instances. Our method builds on the recent findings that deep convolutional neural networks (DCNNs) implicitly learn a latent model of object parts even when trained for classification. We also leverage a key correspondence-problem insight: the geometric structure between object parts is consistent across multiple object instances. These two concepts are then combined in the form of a novel optimization scheme. This optimization learns a feature embedding by rewarding projections that place features with low feature-space distance closer together on the manifold. Simultaneously, the optimization penalizes feature clusters whose geometric structure is inconsistent with the observed geometric structure of object parts. In this manner, by accounting for feature-space similarities and feature neighborhood context together, a manifold is learned where features belonging to semantically similar object parts cluster together. We also describe transferring these embedded features to the sister tasks of semantic keypoint classification and localization via a Siamese DCNN. We provide qualitative results on the Pascal VOC 2012 images and quantitative results on the Pascal Berkeley dataset, where we improve on the state of the art by over 5% on classification and over 9% on localization tasks.
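As a stand-in for the first ingredient of the optimization above (features with low feature-space distance should land close on the learned manifold), the toy sketch below computes a Laplacian-eigenmaps-style spectral embedding from feature affinities. The geometric-consistency penalty and the Siamese DCNN transfer are not modeled here, and the random "DCNN features" and Gaussian affinity are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 128))                    # stand-in for DCNN part features
d2 = ((features[:, None] - features[None, :]) ** 2).sum(-1)
W = np.exp(-d2 / np.median(d2))                           # affinity: large for nearby features
np.fill_diagonal(W, 0.0)

# Minimizing sum_ij W_ij ||y_i - y_j||^2 under a scale constraint is solved by the
# low eigenvectors of the normalized graph Laplacian (a classic spectral embedding).
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt      # symmetric normalized Laplacian
evals, evecs = np.linalg.eigh(L_sym)
embedding = evecs[:, 1:3]                                 # 2-D embedding; column 0 is the trivial mode
print(embedding.shape)                                    # (200, 2)
```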
Citations: 14
Should We Encode Rain Streaks in Video as Deterministic or Stochastic?
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.275
Wei Wei, Lixuan Yi, Qi Xie, Qian Zhao, Deyu Meng, Zongben Xu
Videos taken in the wild sometimes contain unexpected rain streaks, which brings difficulty to subsequent video processing tasks. Rain streak removal in a video (RSRV) is thus an important issue and has been attracting much attention in computer vision. Different from previous RSRV methods, which formulate rain streaks as a deterministic message, this work first encodes rain in a stochastic manner, i.e., as a patch-based mixture of Gaussians. This modification makes the proposed model capable of adapting to a wider range of rain variations rather than the specific rain configurations assumed by traditional methods. By integrating the spatiotemporal smoothness of moving objects and the low-rank structure of the background scene, we propose a concise model for RSRV, containing one likelihood term imposed on the rain streak layer and two prior terms on the moving object and background scene layers of the video. Experiments on videos with synthetic and real rain verify the superiority of the proposed method over state-of-the-art methods, both visually and quantitatively across various performance metrics.
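A toy sketch of the stochastic encoding described above: patches of a (synthetic, stand-in) rain layer are fitted with a mixture of Gaussians, whose log-likelihood could then play the role of the likelihood term on the rain streak layer. The patch size, component count and data are assumptions, and the moving-object smoothness and low-rank background priors of the full model are not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
rain_layer = np.clip(rng.exponential(0.05, size=(120, 160)), 0, 1)  # stand-in rain residual

def extract_patches(img, p=4):
    # Non-overlapping p x p patches, flattened into rows.
    h, w = img.shape
    patches = [img[i:i + p, j:j + p].ravel()
               for i in range(0, h - p + 1, p) for j in range(0, w - p + 1, p)]
    return np.array(patches)

patches = extract_patches(rain_layer)            # (n_patches, 16)
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0).fit(patches)

# The fitted mixture gives a (log-)likelihood for candidate rain patches, i.e. a
# stochastic model of the rain streak layer rather than a deterministic one.
print("mean log-likelihood of rain patches:", gmm.score(patches))
```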
Citations: 115
Realistic Dynamic Facial Textures from a Single Image Using GANs
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.580
Kyle Olszewski, Zimo Li, Chao Yang, Yi Zhou, Ronald Yu, Zeng Huang, Sitao Xiang, Shunsuke Saito, Pushmeet Kohli, Hao Li
We present a novel method to realistically puppeteer and animate a face from a single RGB image using a source video sequence. We begin by fitting a multilinear PCA model to obtain the 3D geometry and a single texture of the target face. In order for the animation to be realistic, however, we need dynamic per-frame textures that capture subtle wrinkles and deformations corresponding to the animated facial expressions. This problem is highly underconstrained, as dynamic textures cannot be obtained directly from a single image. Furthermore, if the target face has a closed mouth, it is not possible to obtain actual images of the mouth interior. To address this issue, we train a Deep Generative Network that can infer realistic per-frame texture deformations, including the mouth interior, of the target identity using the per-frame source textures and the single target texture. By retargeting the PCA expression geometry from the source, as well as using the newly inferred texture, we can both animate the face and perform video face replacement on the source video using the target appearance.
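A toy sketch of the texture-inference step described above: a small conditional generator takes the per-frame source texture together with the single static target texture and predicts a per-frame deformation that is added to the target texture. The tiny architecture, the texture resolution and the omission of the adversarial training objective (a GAN loss in the paper) are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class TextureDeformationGenerator(nn.Module):
    def __init__(self, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, 3, padding=1), nn.Tanh(),   # bounded per-pixel deformation
        )

    def forward(self, source_texture_t, target_texture):
        # Condition on both the animated source frame and the static target texture.
        x = torch.cat([source_texture_t, target_texture], dim=1)
        return target_texture + self.net(x)                 # dynamic per-frame target texture

gen = TextureDeformationGenerator()
source_t = torch.rand(1, 3, 256, 256)       # texture extracted from source video frame t
target = torch.rand(1, 3, 256, 256)         # single texture fitted to the target image
dynamic_texture_t = gen(source_t, target)
print(dynamic_texture_t.shape)              # torch.Size([1, 3, 256, 256])
```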
Citations: 80
Efficient Online Local Metric Adaptation via Negative Samples for Person Re-identification
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.265
Jiahuan Zhou, Pei Yu, Wei Tang, Ying Wu
Many existing person re-identification (PRID) methods typically attempt to train a faithful global metric offline to cover the enormous visual appearance variations, so as to directly use it online on various probes for identity matching. However, their need for a huge set of positive training pairs is very demanding in practice. In contrast to these methods, this paper advocates a different paradigm: part of the learning can be performed online but with nominal costs, so as to achieve online metric adaptation for different input probes. A major challenge here is that no positive training pairs are available for the probe anymore. By only exploiting easily-available negative samples, we propose a novel solution to achieve local metric adaptation effectively and efficiently. For each probe at test time, it learns a strictly positive semi-definite dedicated local metric. Compared with offline global metric learning, its computational cost is negligible. The insight of this new method is that the local hard negative samples can actually provide tight constraints to fine tune the metric locally. This new local metric adaptation method is generally applicable, as it can be used on top of any global metric to enhance its performance. In addition, this paper gives in-depth theoretical analysis and justification of the new method. We prove that our new method guarantees the reduction of the classification error asymptotically, and prove that it actually learns the optimal local metric to best approximate the asymptotic case by a finite number of training data. Extensive experiments and comparative studies on almost all major benchmarks (VIPeR, QMUL GRID, CUHK Campus, CUHK03 and Market-1501) have confirmed the effectiveness and superiority of our method.
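One simple way to realize the negatives-only idea above is sketched below: for each probe, a strictly positive-definite Mahalanobis metric is built from the scatter of negative samples around that probe and used on top of a global embedding. This construction (inverse of the regularized negative scatter) is a stand-in, not the paper's learning rule; the features and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64
probe = rng.normal(size=dim)                       # probe feature (e.g. after a global metric)
negatives = rng.normal(size=(300, dim))            # identities known to differ from the probe
gallery = rng.normal(size=(50, dim))               # candidates to be ranked for this probe

def local_metric(probe, negatives, lam=1e-1):
    centered = negatives - probe                   # negative samples relative to the probe
    scatter = centered.T @ centered / len(negatives)
    # Inverting the regularized scatter stretches directions in which negatives crowd
    # the probe, pushing hard negatives away; lam > 0 keeps M strictly positive definite.
    return np.linalg.inv(scatter + lam * np.eye(dim))

M = local_metric(probe, negatives)

def mahalanobis(x, y, M):
    d = x - y
    return float(d @ M @ d)

ranking = sorted(range(len(gallery)), key=lambda i: mahalanobis(probe, gallery[i], M))
print("top-5 gallery indices for this probe:", ranking[:5])
```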
Citations: 80