
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition: Latest Publications

Multi-task Adversarial Network for Disentangled Feature Learning
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00394
Yang Liu, Zhaowen Wang, Hailin Jin, I. Wassell
We address the problem of image feature learning for applications where multiple factors exist in the image generation process and only some factors are of interest. We present a novel multi-task adversarial network based on an encoder-discriminator-generator architecture. The encoder extracts a disentangled feature representation for the factors of interest. The discriminators classify each of the factors as individual tasks. The encoder and the discriminators are trained cooperatively on factors of interest, but in an adversarial way on factors of distraction. The generator provides further regularization on the learned feature by reconstructing images that share factors with the input image. We design a new optimization scheme to stabilize the adversarial optimization process when multiple distributions need to be aligned. The experiments on face recognition and font recognition tasks show that our method outperforms the state-of-the-art methods in terms of both recognizing the factors of interest and generalizing to images with unseen variations.
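As a rough illustration of the cooperative-versus-adversarial split described above, the PyTorch sketch below trains an encoder together with an "interest" classifier while pushing a "distraction" classifier toward uniform predictions. All module sizes, the uniform-target confusion loss, and the two-step update are illustrative assumptions, not the authors' exact formulation.

# Sketch only: cooperative training on the factor of interest, adversarial
# (label-confusion) training on the distraction factor; all modules are toy stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())   # encoder
head_interest = nn.Linear(128, 10)   # classifier for the factor we want to keep
head_distract = nn.Linear(128, 5)    # classifier for the nuisance factor

opt_enc = torch.optim.Adam(list(enc.parameters()) + list(head_interest.parameters()), lr=1e-4)
opt_dis = torch.optim.Adam(head_distract.parameters(), lr=1e-4)

x = torch.randn(8, 1, 32, 32)                    # dummy batch
y_interest = torch.randint(0, 10, (8,))
y_distract = torch.randint(0, 5, (8,))

# step 1: the distraction head learns the nuisance factor from frozen features
loss_dis = F.cross_entropy(head_distract(enc(x).detach()), y_distract)
opt_dis.zero_grad()
loss_dis.backward()
opt_dis.step()

# step 2: the encoder cooperates with the interest head and confuses the distraction head
feat = enc(x)
loss_coop = F.cross_entropy(head_interest(feat), y_interest)
loss_conf = -F.log_softmax(head_distract(feat), dim=1).mean()   # push toward uniform predictions
opt_enc.zero_grad()
(loss_coop + loss_conf).backward()
opt_enc.step()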
Pages: 3743-3751
Citations: 52
Controllable Video Generation with Sparse Trajectories
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00819
Zekun Hao, Xun Huang, Serge J. Belongie
Video generation and manipulation are important yet challenging tasks in computer vision. Existing methods usually lack ways to explicitly control the synthesized motion. In this work, we present a conditional video generation model that allows detailed control over the motion of the generated video. Given the first frame and sparse motion trajectories specified by users, our model can synthesize a video with corresponding appearance and motion. We propose to combine the advantages of copying pixels from the given frame and hallucinating the lightness difference from scratch, which helps generate sharp video while keeping the model robust to occlusion and lightness change. We also propose a training paradigm that calculates trajectories from video clips, which eliminates the need for annotated training data. Experiments on several standard benchmarks demonstrate that our approach can generate realistic videos comparable to state-of-the-art video generation and video prediction methods, while the motion of the generated videos corresponds well with user input.
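The copy-versus-hallucinate idea can be pictured with a small compositing function: pixels are warped from the first frame along a flow field and blended with a hallucinated residual under a predicted mask. This is a hedged sketch; the flow, the residual, and the mask here are placeholders rather than the paper's actual network outputs.

# Sketch only: fuse pixels warped from the first frame with a hallucinated
# residual, as a stand-in for the copy-vs-hallucinate composition.
import torch
import torch.nn.functional as F

def compose_frame(first_frame, flow, residual, mask):
    """first_frame: (B,3,H,W); flow: (B,2,H,W) backward flow in pixels;
    residual: (B,3,H,W) hallucinated change; mask: (B,1,H,W) in [0,1]."""
    B, _, H, W = first_frame.shape
    # build a normalized sampling grid shifted by the flow
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().unsqueeze(0).expand(B, -1, -1, -1)
    grid = base + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2.0 * grid[..., 0] / (W - 1) - 1.0   # x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (H - 1) - 1.0   # y to [-1, 1]
    warped = F.grid_sample(first_frame, grid, align_corners=True)
    # copy where the mask is high, fall back to warped pixels plus residual elsewhere
    return mask * warped + (1.0 - mask) * (warped + residual)

frame = torch.rand(1, 3, 64, 64)
out = compose_frame(frame, torch.zeros(1, 2, 64, 64),
                    torch.zeros(1, 3, 64, 64), torch.ones(1, 1, 64, 64))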
Pages: 7854-7863
Citations: 78
Empirical Study of the Topology and Geometry of Deep Networks
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00396
Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, P. Frossard, Stefano Soatto
The goal of this paper is to analyze the geometric properties of deep neural network image classifiers in the input space. We specifically study the topology of classification regions created by deep networks, as well as their associated decision boundary. Through a systematic empirical study, we show that state-of-the-art deep nets learn connected classification regions, and that the decision boundary in the vicinity of datapoints is flat along most directions. We further draw an essential connection between two seemingly unrelated properties of deep networks: their sensitivity to additive perturbations of the inputs, and the curvature of their decision boundary. The directions where the decision boundary is curved in fact characterize the directions to which the classifier is the most vulnerable. We finally leverage a fundamental asymmetry in the curvature of the decision boundary of deep nets, and propose a method to discriminate between original images, and images perturbed with small adversarial examples. We show the effectiveness of this purely geometric approach for detecting small adversarial perturbations in images, and for recovering the labels of perturbed images.
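The claim that the decision boundary is flat along most directions can be probed with a simple line search: move along a direction until the predicted class flips and record the distance. The sketch below (toy model, coarse grid search) only illustrates that kind of probing, not the paper's exact measurement protocol.

# Sketch only: distance to the decision boundary along a given direction,
# useful for comparing random directions against gradient directions.
import torch

def distance_to_boundary(model, x, direction, max_dist=10.0, steps=200):
    """Return the smallest step t along `direction` at which the argmax flips."""
    model.eval()
    with torch.no_grad():
        label = model(x).argmax(dim=1)
        d = direction / direction.norm()
        for t in torch.linspace(0.0, max_dist, steps):
            if model(x + t * d).argmax(dim=1) != label:
                return float(t)
    return float("inf")   # no flip found within max_dist

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))  # toy classifier
x = torch.randn(1, 1, 28, 28)
rand_dir = torch.randn_like(x)
print(distance_to_boundary(model, x, rand_dir))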
Pages: 3762-3770
Citations: 116
Eliminating Background-bias for Robust Person Re-identification
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00607
Maoqing Tian, Shuai Yi, Hongsheng Li, Shihua Li, Xuesen Zhang, Jianping Shi, Junjie Yan, Xiaogang Wang
Person re-identification is an important topic in intelligent surveillance and computer vision. It aims to accurately measure visual similarities between person images to determine whether two images correspond to the same person. State-of-the-art methods mainly utilize deep learning based approaches for learning visual features that describe person appearances. However, we observe that existing deep learning models are biased toward capturing too much relevance between the background appearances of person images. We design a series of experiments with newly created datasets to validate the influence of background information. To solve the background bias problem, we propose a person-region guided pooling deep neural network based on human parsing maps to learn more discriminative person-part features, and we propose to augment the training data with person images on random backgrounds. Extensive experiments demonstrate the robustness and effectiveness of our proposed method.
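A minimal sketch of person-region guided pooling, assuming a human parsing map with integer part labels (0 for background): features are average-pooled inside each part mask so background pixels do not contribute to the descriptor. The function is an illustrative reading of that idea, not the authors' released implementation.

# Sketch only: pool CNN features separately inside each body-part region of a
# human parsing map, so the descriptor ignores background pixels.
import torch
import torch.nn.functional as F

def part_guided_pooling(features, parsing, num_parts):
    """features: (B,C,H,W); parsing: (B,h,w) integer part labels, 0 = background.
    Returns (B, num_parts, C) averaged features per part."""
    B, C, H, W = features.shape
    parsing = F.interpolate(parsing.unsqueeze(1).float(), size=(H, W), mode="nearest").long().squeeze(1)
    descriptors = []
    for p in range(1, num_parts + 1):                  # skip background label 0
        mask = (parsing == p).unsqueeze(1).float()     # (B,1,H,W)
        area = mask.sum(dim=(2, 3)).clamp(min=1.0)     # avoid divide-by-zero on missing parts
        descriptors.append((features * mask).sum(dim=(2, 3)) / area)
    return torch.stack(descriptors, dim=1)

feats = torch.randn(2, 256, 24, 12)
parse = torch.randint(0, 6, (2, 96, 48))
desc = part_guided_pooling(feats, parse, num_parts=5)   # (2, 5, 256)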
Pages: 5794-5803
Citations: 138
Learning Visual Knowledge Memory Networks for Visual Question Answering
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00807
Zhou Su, Chen Zhu, Yinpeng Dong, Dongqi Cai, Yurong Chen, Jianguo Li
Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions can't be directly or clearly answered from visual content alone but require reasoning from structured human knowledge with confirmation from visual content. This paper proposes the visual knowledge memory network (VKMN) to address this issue, which seamlessly incorporates structured human knowledge and deep visual features into memory networks in an end-to-end learning framework. Compared to existing methods that leverage external knowledge to support VQA, this paper places more emphasis on two missing mechanisms. First is the mechanism for integrating visual content with knowledge facts. VKMN handles this issue by embedding knowledge triples (subject, relation, target) and deep visual features jointly into the visual knowledge features. Second is the mechanism for handling multiple knowledge facts expanded from question and answer pairs. VKMN stores the joint embeddings using a key-value pair structure in the memory networks so that multiple facts are easy to handle. Experiments show that the proposed method achieves promising results on both the VQA v1.0 and v2.0 benchmarks, while outperforming state-of-the-art methods on knowledge-reasoning related questions.
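The key-value memory read at the core of this kind of model can be sketched as softmax attention over embedded (subject, relation) keys followed by a weighted sum of target-entity values. The code below assumes random placeholder embeddings and omits how the joint question-plus-visual query is built.

# Sketch only: a key-value memory read with keys from embedded (subject, relation)
# pairs and values from the target-entity embeddings.
import torch
import torch.nn.functional as F

def memory_read(query, keys, values):
    """query: (B,D); keys, values: (B,M,D). Softmax attention over M memory slots."""
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)    # (B,M)
    weights = F.softmax(scores, dim=1)
    return torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (B,D)

B, M, D = 4, 16, 128
query = torch.randn(B, D)          # stand-in for the joint question + visual embedding
subj_rel = torch.randn(B, M, D)    # embedded (subject, relation) keys
target = torch.randn(B, M, D)      # embedded target entities as values
answer_feature = memory_read(query, subj_rel, target)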
Pages: 7736-7745
Citations: 55
Self-Supervised Feature Learning by Learning to Spot Artifacts
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00289
S. Jenni, P. Favaro
We introduce a novel self-supervised learning method based on adversarial training. Our objective is to train a discriminator network to distinguish real images from images with synthetic artifacts, and then to extract features from its intermediate layers that can be transferred to other data domains and tasks. To generate images with artifacts, we pre-train a high-capacity autoencoder and then we use a damage and repair strategy: First, we freeze the autoencoder and damage the output of the encoder by randomly dropping its entries. Second, we augment the decoder with a repair network, and train it in an adversarial manner against the discriminator. The repair network helps generate more realistic images by inpainting the dropped feature entries. To make the discriminator focus on the artifacts, we also make it predict what entries in the feature were dropped. We demonstrate experimentally that features learned by creating and spotting artifacts achieve state-of-the-art performance on several benchmarks.
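A small sketch of the "damage" step, under the assumption that damaging means zeroing random spatial entries of the frozen encoder's feature map; the kept-entry mask then serves as the target for the discriminator's auxiliary drop-prediction head. The dropout granularity and rate here are arbitrary choices.

# Sketch only: drop random spatial entries of a frozen encoder's feature map;
# the mask doubles as a prediction target for the discriminator's auxiliary head.
import torch

def damage_features(feat, drop_prob=0.3):
    """feat: (B,C,H,W). Zero out random spatial entries and return the kept-entry mask."""
    keep = (torch.rand(feat.shape[0], 1, feat.shape[2], feat.shape[3]) > drop_prob).float()
    return feat * keep, keep

feat = torch.randn(2, 64, 8, 8)             # output of a frozen pre-trained encoder
damaged, keep_mask = damage_features(feat)
# a repair network would inpaint `damaged`; the discriminator is then trained to tell
# repaired images from real ones and to predict `keep_mask` as an auxiliary task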
Pages: 2733-2742
Citations: 115
Time-Resolved Light Transport Decomposition for Thermal Photometric Stereo
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00505
Kenichiro Tanaka, Nobuhiro Ikeya, T. Takatani, Hiroyuki Kubo, Takuya Funatomi, Y. Mukaigawa
We present a novel time-resolved light transport decomposition method using thermal imaging. Because the speed of heat propagation is much slower than the speed of light propagation, transient transport of far infrared light can be observed at a video frame rate. A key observation is that the thermal image looks similar to the visible light image in an appropriately controlled environment. This implies that conventional computer vision techniques can be straightforwardly applied to the thermal image. We show that the diffuse component in the thermal image can be separated and, therefore, the surface normals of objects can be estimated by the Lambertian photometric stereo. The effectiveness of our method is evaluated by conducting real-world experiments, and its applicability to black body, transparent, and translucent objects is shown.
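Once the diffuse component has been separated, the abstract states that surface normals follow from Lambertian photometric stereo; the textbook least-squares form of that step is sketched below (the separation itself and the thermal-imaging specifics are not modeled here).

# Sketch only: classic Lambertian photometric stereo, recovering per-pixel surface
# normals from intensities observed under known lighting directions.
import numpy as np

def lambertian_normals(intensities, lights):
    """intensities: (K, N) observations of N pixels under K lights;
    lights: (K, 3) unit lighting directions. Returns (N, 3) unit normals."""
    # least-squares solve  L @ (albedo * n) = I  for every pixel at once
    g, _, _, _ = np.linalg.lstsq(lights, intensities, rcond=None)   # (3, N)
    albedo = np.linalg.norm(g, axis=0)
    return (g / np.maximum(albedo, 1e-8)).T

lights = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1]], dtype=float)
lights /= np.linalg.norm(lights, axis=1, keepdims=True)
true_n = np.array([[0.0, 0.0, 1.0]]).T                  # one pixel with normal = +z
I = lights @ (0.8 * true_n)                             # albedo 0.8, Lambertian shading
print(lambertian_normals(I, lights))                    # ~[[0, 0, 1]]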
Pages: 4804-4813
Citations: 5
Label Denoising Adversarial Network (LDAN) for Inverse Lighting of Faces
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00653
Hao Zhou, J. Sun, Y. Yacoob, D. Jacobs
Lighting estimation from faces is an important task and has applications in many areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image. Lacking massive ground truth lighting labels for face images in the wild, we use an existing method to estimate lighting parameters, which are treated as ground truth with noise. To alleviate the effect of such noise, we utilize the idea of Generative Adversarial Networks (GAN) and propose a Label Denoising Adversarial Network (LDAN). LDAN makes use of synthetic data with accurate ground truth to help train a deep CNN for lighting regression on real face images. Experiments show that our network outperforms existing methods in producing consistent lighting parameters of different faces under similar lighting conditions. To further evaluate the proposed method, we also apply it to regress object 2D key points where ground truth labels are available. Our experiments demonstrate its effectiveness on this application.
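A loose sketch of the kind of training signal described here: a trusted regression loss on synthetic data, a down-weighted regression loss on the noisy real labels, and an adversarial term pulling real-image features toward the synthetic feature distribution. Module names, dimensionalities (e.g. 27 lighting coefficients), and loss weights are assumptions for illustration only, and the critic's own update is omitted.

# Sketch only: combine trusted synthetic supervision, noisy real supervision, and
# a feature-space adversarial term; all modules and weights are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_real = nn.Sequential(nn.Linear(512, 128), nn.ReLU())    # real-image feature branch
feat_syn = nn.Sequential(nn.Linear(512, 128), nn.ReLU())     # synthetic-image feature branch
regressor = nn.Linear(128, 27)                                # e.g. lighting coefficients
critic = nn.Linear(128, 1)                                    # feature-space discriminator

x_syn, y_syn = torch.randn(8, 512), torch.randn(8, 27)        # accurate synthetic labels
x_real, y_noisy = torch.randn(8, 512), torch.randn(8, 27)     # noisy labels from an existing estimator

f_syn, f_real = feat_syn(x_syn), feat_real(x_real)
loss_reg_syn = F.mse_loss(regressor(f_syn), y_syn)            # trusted supervision
loss_reg_real = F.mse_loss(regressor(f_real), y_noisy)        # down-weighted noisy supervision
# non-saturating GAN term: real-image features should look "synthetic" to the critic
loss_adv = F.binary_cross_entropy_with_logits(critic(f_real), torch.ones(8, 1))
total = loss_reg_syn + 0.1 * loss_reg_real + 0.1 * loss_adv
total.backward()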
Pages: 6238-6247
Citations: 19
Matching Adversarial Networks
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00837
G. Máttyus, R. Urtasun
Generative Adversarial Nets (GANs) and Conditional GANs (CGANs) show that using a trained network as a loss function (the discriminator) makes it possible to synthesize highly structured outputs (e.g. natural images). However, applying a discriminator network as a universal loss function for common supervised tasks (e.g. semantic segmentation, line detection, depth estimation) is considerably less successful. We argue that the main difficulty of applying CGANs to supervised tasks is that generator training consists of optimizing a loss function that does not depend directly on the ground truth labels. To overcome this, we propose to replace the discriminator with a matching network that takes into account both the ground truth outputs and the generated examples. As a consequence, the generator loss function also depends on the targets of the training examples, thus facilitating learning. We demonstrate on three computer vision tasks that this approach can significantly outperform CGANs, achieving results comparable or superior to task-specific solutions while training stably. Importantly, this is a general approach that does not require the use of task-specific loss functions.
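One way to picture a matching critic is a siamese scorer over (input image, output map) pairs: the critic prefers ground-truth pairs over generated ones, and the generator is penalized when its pair scores differently from the ground-truth pair, so the generator loss now depends on the targets. The hinge and squared-difference losses below are illustrative stand-ins, not the paper's exact objectives.

# Sketch only: a siamese matching critic over (input, output) pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed_in = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))    # input-image branch
embed_out = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, 64))   # output-map branch

def match_score(image, output_map):
    a = F.normalize(embed_in(image), dim=1)
    b = F.normalize(embed_out(output_map), dim=1)
    return (a * b).sum(dim=1)                          # cosine similarity of the pair

image = torch.rand(4, 3, 32, 32)
gt_map = torch.rand(4, 1, 32, 32)                      # e.g. a segmentation target
fake_map = torch.rand(4, 1, 32, 32, requires_grad=True)   # stand-in for a generator output

# critic objective: ground-truth pairs should score higher than generated pairs (hinge)
loss_critic = F.relu(1.0 - match_score(image, gt_map) + match_score(image, fake_map)).mean()
# generator objective: make the generated pair score like the ground-truth pair
loss_gen = (match_score(image, gt_map).detach() - match_score(image, fake_map)).pow(2).mean()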
Pages: 8024-8032
Citations: 22
Multispectral Image Intrinsic Decomposition via Subspace Constraint
Pub Date : 2018-06-01 DOI: 10.1109/CVPR.2018.00673
Qian Huang, Weixin Zhu, Yang Zhao, Linsen Chen, Yao Wang, Tao Yue, Xun Cao
Multispectral images contain many clues about the surface characteristics of objects and can therefore be used in many computer vision tasks, e.g., recolorization and segmentation. However, due to the complex geometric structure of natural scenes, the spectral curves of the same surface can look very different under different illuminations and from different angles. In this paper, a new Multispectral Image Intrinsic Decomposition model (MIID) is presented to decompose the shading and reflectance from a single multispectral image. We extend the Retinex model, originally proposed for RGB image intrinsic decomposition, to the multispectral domain. Based on this, a subspace constraint is introduced on both the shading and reflectance spectral spaces to reduce the ill-posedness of the problem and make it solvable. A dataset of 22 scenes with ground truth shading and reflectance is provided to facilitate objective evaluation. The experiments demonstrate the effectiveness of the proposed method.
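The subspace constraint can be illustrated by restricting per-pixel spectra to the span of a few basis spectra via least squares; the NumPy toy below shows only that projection step, not the full shading/reflectance decomposition or how the basis is chosen.

# Sketch only: per-pixel spectra constrained to a low-dimensional spectral subspace.
import numpy as np

def project_to_subspace(spectra, basis):
    """spectra: (N, C) per-pixel spectra; basis: (k, C) spectral basis vectors.
    Returns the (N, C) reconstruction restricted to the k-dimensional subspace."""
    coeffs, _, _, _ = np.linalg.lstsq(basis.T, spectra.T, rcond=None)   # (k, N)
    return (basis.T @ coeffs).T

C, k, N = 31, 4, 1000                      # 31 spectral bands, 4 basis vectors, 1000 pixels
basis = np.random.rand(k, C)
true_coeffs = np.random.rand(N, k)
reflectance = true_coeffs @ basis          # spectra that truly lie in the subspace
noisy = reflectance + 0.01 * np.random.randn(N, C)
recon = project_to_subspace(noisy, basis)
print(np.abs(recon - reflectance).mean())  # small residual: the off-span noise is removed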
Pages: 6430-6439
Citations: 7