
Latest Publications from the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Duplex Generative Adversarial Network for Unsupervised Domain Adaptation
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00162
Lanqing Hu, Meina Kan, S. Shan, Xilin Chen
Domain adaptation attempts to transfer the knowledge obtained from the source domain to the target domain, i.e., the domain where the testing data are. The main challenge lies in the distribution discrepancy between the source and target domains. Most existing works endeavor to learn a domain-invariant representation, usually by minimizing a distribution distance, e.g., MMD or the discriminator in the recently proposed generative adversarial network (GAN). Following a similar idea to GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve domain-invariant representation and domain transformation. Specifically, our proposed network consists of three parts: an encoder, a generator, and two discriminators. The encoder embeds samples from both domains into the latent representation, and the generator decodes the latent representation into the source or target domain conditioned on a domain code, i.e., achieves domain transformation. The generator is pitted against the duplex discriminators, one for the source domain and the other for the target, to ensure that the domain transformation is realistic, the latent representation is domain invariant, and its category information is preserved. Our proposed work achieves state-of-the-art performance on unsupervised domain adaptation for digit classification and object recognition.
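To make the three-part layout concrete, here is a minimal PyTorch sketch of an encoder, a domain-code-conditioned generator, and duplex discriminators with auxiliary class heads. The layer sizes, the way the domain code is injected, and the class head are illustrative assumptions, not the exact DupGAN design.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds samples from either domain into a shared latent representation."""
    def __init__(self, in_dim=784, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))
    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Decodes a latent code back to image space, conditioned on a domain code
    (0 = source, 1 = target)."""
    def __init__(self, latent_dim=128, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim), nn.Tanh())
    def forward(self, z, domain_code):
        return self.net(torch.cat([z, domain_code], dim=1))

class Discriminator(nn.Module):
    """One per domain: a real/fake head plus a class head to keep category information."""
    def __init__(self, in_dim=784, num_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.adv_head = nn.Linear(256, 1)            # real vs. generated
        self.cls_head = nn.Linear(256, num_classes)  # category prediction
    def forward(self, x):
        h = self.body(x)
        return self.adv_head(h), self.cls_head(h)

E, G = Encoder(), Generator()
D_src, D_tgt = Discriminator(), Discriminator()      # the duplex discriminators

x_src = torch.randn(8, 784)                          # toy source-domain batch
z = E(x_src)                                         # shared latent representation
x_as_tgt = G(z, torch.ones(8, 1))                    # domain transformation: source -> target
adv_score, cls_logits = D_tgt(x_as_tgt)              # judged by the target-domain discriminator
```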
{"title":"Duplex Generative Adversarial Network for Unsupervised Domain Adaptation","authors":"Lanqing Hu, Meina Kan, S. Shan, Xilin Chen","doi":"10.1109/CVPR.2018.00162","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00162","url":null,"abstract":"Domain adaptation attempts to transfer the knowledge obtained from the source domain to the target domain, i.e., the domain where the testing data are. The main challenge lies in the distribution discrepancy between source and target domain. Most existing works endeavor to learn domain invariant representation usually by minimizing a distribution distance, e.g., MMD and the discriminator in the recently proposed generative adversarial network (GAN). Following the similar idea of GAN, this work proposes a novel GAN architecture with duplex adversarial discriminators (referred to as DupGAN), which can achieve domain-invariant representation and domain transformation. Specifically, our proposed network consists of three parts, an encoder, a generator and two discriminators. The encoder embeds samples from both domains into the latent representation, and the generator decodes the latent representation to both source and target domains respectively conditioned on a domain code, i.e., achieves domain transformation. The generator is pitted against duplex discriminators, one for source domain and the other for target, to ensure the reality of domain transformation, the latent representation domain invariant and the category information of it preserved as well. Our proposed work achieves the state-of-the-art performance on unsupervised domain adaptation of digit classification and object recognition.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"32 1","pages":"1498-1507"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78544129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 153
Feature Super-Resolution: Make Machine See More Clearly
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00420
Weimin Tan, Bo Yan, Bahetiyaer Bare
Identifying small-size images or small objects is a notoriously challenging problem, as discriminative representations are difficult to learn from their limited information, poor-quality appearance, and unclear object structure. Existing research works usually increase the resolution of low-resolution images in the pixel space in order to provide better visual quality for human viewing. However, the improvement from such methods is usually limited or even trivial in the case of very small image sizes (we show this explicitly in this paper). In this paper, different from image super-resolution (ISR), we propose a novel super-resolution technique called feature super-resolution (FSR), which aims at enhancing the discriminatory power of small-size images in order to provide high recognition precision for machines. To achieve this goal, we propose a new Feature Super-Resolution Generative Adversarial Network (FSR-GAN) model that transforms the raw, poor features of small-size images into highly discriminative ones by performing super-resolution in the feature space. Our FSR-GAN consists of two subnetworks: a feature generator network G and a feature discriminator network D. By training the G and D networks in an alternating manner, we encourage the G network to discover the latent distribution correlations between small-size and large-size images and then use G to improve the representations of small images. Extensive experimental results on the Oxford5K, Paris, Holidays, and Flickr100k datasets demonstrate that the proposed FSR approach can effectively enhance the discriminatory ability of features. Even when the resolution of query images is reduced greatly, e.g., to 1/64 of the original size, the query feature enhanced by our FSR approach achieves surprisingly high retrieval performance at different image resolutions and increases the retrieval precision by 25% compared to the raw query feature.
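The following sketch illustrates feature-space super-resolution with an adversarial pair: a generator maps raw small-image features to enhanced ones, and a discriminator tries to tell them apart from large-image features. The feature dimension, layer widths, and the extra regression term are assumptions for illustration only, not the paper's exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureGenerator(nn.Module):
    """G: maps the raw feature of a small image to an enhanced, discriminative feature."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, dim))
    def forward(self, f_small):
        return self.net(f_small)

class FeatureDiscriminator(nn.Module):
    """D: distinguishes enhanced small-image features from real large-image features."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, f):
        return self.net(f)

G, D = FeatureGenerator(), FeatureDiscriminator()
bce = nn.BCEWithLogitsLoss()

f_small = torch.randn(16, 512)   # placeholder: features extracted from downscaled images
f_large = torch.randn(16, 512)   # placeholder: features of the original-resolution images

# Alternating updates: D learns to separate the two feature populations,
# G learns to fool D (plus a simple regression term pulling G(f_small) towards f_large).
d_loss = bce(D(f_large), torch.ones(16, 1)) + bce(D(G(f_small).detach()), torch.zeros(16, 1))
g_loss = bce(D(G(f_small)), torch.ones(16, 1)) + F.mse_loss(G(f_small), f_large)
```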
{"title":"Feature Super-Resolution: Make Machine See More Clearly","authors":"Weimin Tan, Bo Yan, Bahetiyaer Bare","doi":"10.1109/CVPR.2018.00420","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00420","url":null,"abstract":"Identifying small size images or small objects is a notoriously challenging problem, as discriminative representations are difficult to learn from the limited information contained in them with poor-quality appearance and unclear object structure. Existing research works usually increase the resolution of low-resolution image in the pixel space in order to provide better visual quality for human viewing. However, the improved performance of such methods is usually limited or even trivial in the case of very small image size (we will show it in this paper explicitly). In this paper, different from image super-resolution (ISR), we propose a novel super-resolution technique called feature super-resolution (FSR), which aims at enhancing the discriminatory power of small size image in order to provide high recognition precision for machine. To achieve this goal, we propose a new Feature Super-Resolution Generative Adversarial Network (FSR-GAN) model that transforms the raw poor features of small size images to highly discriminative ones by performing super-resolution in the feature space. Our FSR-GAN consists of two subnetworks: a feature generator network G and a feature discriminator network D. By training the G and the D networks in an alternative manner, we encourage the G network to discover the latent distribution correlations between small size and large size images and then use G to improve the representations of small images. Extensive experiment results on Oxford5K, Paris, Holidays, and Flick100k datasets demonstrate that the proposed FSR approach can effectively enhance the discriminatory ability of features. Even when the resolution of query images is reduced greatly, e.g., 1/64 original size, the query feature enhanced by our FSR approach achieves surprisingly high retrieval performance at different image resolutions and increases the retrieval precision by 25% compared to the raw query feature.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"143 1","pages":"3994-4002"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78680638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
DenseASPP for Semantic Segmentation in Street Scenes
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00388
Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang
Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high-resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scenes exhibit very large scale changes, which poses great challenges for high-level feature representation in the sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution [14] was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP) [2] was proposed to concatenate multiple atrous-convolved features computed with different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue that the feature resolution along the scale axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes [4] and achieve state-of-the-art performance.
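A compact PyTorch sketch of the dense connectivity idea: each atrous layer receives the concatenation of the base feature and all previous layers' outputs, so the composed receptive fields fill in the scale axis densely. The dilation rates, growth rate, and channel counts below are assumptions and not necessarily the paper's configuration.

```python
import torch
import torch.nn as nn

class DenseASPPBlock(nn.Module):
    """Each atrous layer sees the concatenation of the base feature and all earlier
    outputs, so composed receptive fields cover the scale range densely."""
    def __init__(self, in_channels=256, growth=64, dilations=(3, 6, 12, 18, 24)):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for d in dilations:
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True)))
            channels += growth  # dense connectivity: the next layer also sees this output

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseASPPBlock()
y = block(torch.randn(1, 256, 64, 64))   # -> (1, 256 + 5 * 64, 64, 64)
```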
{"title":"DenseASPP for Semantic Segmentation in Street Scenes","authors":"Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang","doi":"10.1109/CVPR.2018.00388","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00388","url":null,"abstract":"Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scene exhibit very large scale changes, which poses great challenges for high-level feature representation in a sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution[14]was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP)[2] was proposed to concatenate multiple atrous-convolved features using different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue the feature resolution in the scale-axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes[4] and achieve state-of-the-art performance.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"12 1","pages":"3684-3692"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76684713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1036
Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00217
Shizheng Wang, Wenjuan Liao, P. Surman, Zhigang Tu, Yuanjin Zheng, Junsong Yuan
Multi-layer light field displays are a type of computational three-dimensional (3D) display that has recently gained increasing interest for its holographic-like effect and natural compatibility with 2D displays. However, the major shortcoming, depth limitation, still cannot be overcome in traditional light field modeling and reconstruction based on multi-layer liquid crystal displays (LCDs). Considering this disadvantage, our paper incorporates salience-guided depth optimization over a limited display range to calibrate the displayed depth and present the maximum area of the salient region for multi-layer light field display. Different from previously reported cascaded light field displays that use a fixed initialization plane as the depth center of the displayed content, our method automatically calibrates the depth initialization based on the salience results derived from the proposed contrast-enhanced salience detection method. Experiments demonstrate that the proposed method provides a promising advantage in visual perception for compressive light field displays, in both software simulation and a prototype demonstration.
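One way to picture the calibration idea is to center the display's limited depth budget on the salience-weighted depth of the scene, as in this hedged NumPy sketch; the weighting scheme and the coverage measure are illustrative assumptions, not the paper's optimization.

```python
import numpy as np

def calibrate_depth_plane(depth, salience, display_range):
    """Choose the initialization (center) plane of a depth-limited display so that
    salient content receives most of the available depth budget.

    depth:         per-pixel scene depth map
    salience:      per-pixel salience map in [0, 1]
    display_range: total depth range the multi-layer display can reproduce
    """
    w = salience / (salience.sum() + 1e-8)
    center = float((w * depth).sum())            # salience-weighted mean depth
    near = center - display_range / 2.0
    far = center + display_range / 2.0
    # Fraction of salient mass that falls inside the displayable slab.
    covered = float(w[(depth >= near) & (depth <= far)].sum())
    return center, covered

depth = np.random.uniform(1.0, 5.0, size=(240, 320))       # toy depth map
salience = np.random.uniform(0.0, 1.0, size=(240, 320))     # toy salience map
center, covered = calibrate_depth_plane(depth, salience, display_range=1.0)
print(f"depth center = {center:.2f}, salient mass covered = {covered:.2%}")
```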
{"title":"Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display","authors":"Shizheng Wang, Wenjuan Liao, P. Surman, Zhigang Tu, Yuanjin Zheng, Junsong Yuan","doi":"10.1109/CVPR.2018.00217","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00217","url":null,"abstract":"Multi-layer light field displays are a type of computational three-dimensional (3D) display which has recently gained increasing interest for its holographic-like effect and natural compatibility with 2D displays. However, the major shortcoming, depth limitation, still cannot be overcome in the traditional light field modeling and reconstruction based on multi-layer liquid crystal displays (LCDs). Considering this disadvantage, our paper incorporates a salience guided depth optimization over a limited display range to calibrate the displayed depth and present the maximum area of salience region for multi-layer light field display. Different from previously reported cascaded light field displays that use the fixed initialization plane as the depth center of display content, our method automatically calibrates the depth initialization based on the salience results derived from the proposed contrast enhanced salience detection method. Experiments demonstrate that the proposed method provides a promising advantage in visual perception for the compressive light field displays from both software simulation and prototype demonstration.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"62 1","pages":"2031-2040"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77902099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00832
Qingchao Chen, Yang Liu, Zhaowen Wang, I. Wassell, K. Chetty
Unsupervised Domain Adaptation (UDA) aims to transfer domain knowledge from existing well-defined tasks to new ones where labels are unavailable. In real-world applications, as the domain (task) discrepancies are usually uncontrollable, there is strong motivation to match the feature distributions even if the domain discrepancies are disparate. Additionally, as no labels are available in the target domain, how to successfully adapt the classifier from the source to the target domain remains an open question. In this paper, we propose the Re-weighted Adversarial Adaptation Network (RAAN) to reduce the feature distribution divergence and adapt the classifier when domain discrepancies are disparate. Specifically, to alleviate the need for common supports when matching the feature distributions, we choose to minimize the optimal transport (OT) based Earth-Mover (EM) distance and reformulate it as a minimax objective function. Utilizing this, RAAN can be trained in an end-to-end, adversarial manner. To further adapt the classifier, we propose to match the label distribution and embed it into the adversarial training. Finally, after extensive evaluation of our method using UDA datasets of varying difficulty, RAAN achieves state-of-the-art results and outperforms other methods by a large margin when the domain shifts are disparate.
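The minimax reformulation of the OT-based EM distance can be sketched with a WGAN-style critic, as below; the critic architecture is an assumption, and the re-weighting and label-distribution matching described in the abstract are omitted.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))

def em_distance_estimate(f_src, f_tgt):
    """Kantorovich-Rubinstein style estimate of the Earth-Mover distance between two
    feature batches; the critic must be kept (approximately) 1-Lipschitz, e.g. via
    weight clipping or a gradient penalty (omitted here)."""
    return critic(f_src).mean() - critic(f_tgt).mean()

f_src = torch.randn(32, 256)   # placeholder features of labelled source samples
f_tgt = torch.randn(32, 256)   # placeholder features of unlabelled target samples

# Minimax: the critic maximizes the estimate, the feature encoder minimizes it.
critic_loss = -em_distance_estimate(f_src, f_tgt)
encoder_loss = em_distance_estimate(f_src, f_tgt)
```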
{"title":"Re-weighted Adversarial Adaptation Network for Unsupervised Domain Adaptation","authors":"Qingchao Chen, Yang Liu, Zhaowen Wang, I. Wassell, K. Chetty","doi":"10.1109/CVPR.2018.00832","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00832","url":null,"abstract":"Unsupervised Domain Adaptation (UDA) aims to transfer domain knowledge from existing well-defined tasks to new ones where labels are unavailable. In the real-world applications, as the domain (task) discrepancies are usually uncontrollable, it is significantly motivated to match the feature distributions even if the domain discrepancies are disparate. Additionally, as no label is available in the target domain, how to successfully adapt the classifier from the source to the target domain still remains an open question. In this paper, we propose the Re-weighted Adversarial Adaptation Network (RAAN) to reduce the feature distribution divergence and adapt the classifier when domain discrepancies are disparate. Specifically, to alleviate the need of common supports in matching the feature distribution, we choose to minimize optimal transport (OT) based Earth-Mover (EM) distance and reformulate it to a minimax objective function. Utilizing this, RAAN can be trained in an end-to-end and adversarial manner. To further adapt the classifier, we propose to match the label distribution and embed it into the adversarial training. Finally, after extensive evaluation of our method using UDA datasets of varying difficulty, RAAN achieved the state-of-the-art results and outperformed other methods by a large margin when the domain shifts are disparate.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"41 1","pages":"7976-7985"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78125182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 113
Mesoscopic Facial Geometry Inference Using Deep Neural Networks
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00877
Loc Huynh, Weikai Chen, Shunsuke Saito, Jun Xing, Koki Nagano, Andrew Jones, P. Debevec, Hao Li
We present a learning-based approach for synthesizing facial geometry at medium and fine scales from diffusely-lit facial texture maps. When applied to an image sequence, the synthesized detail is temporally coherent. Unlike current state-of-the-art methods [17, 5], which assume "dark is deep", our model is trained with measured facial detail collected using polarized gradient illumination in a Light Stage [20]. This enables us to produce plausible facial detail across the entire face, including where previous approaches may incorrectly interpret dark features as concavities, such as at moles, hair stubble, and occluded pores. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement maps, which are learned through a hybrid network adopting the state-of-the-art image-to-image translation network [29] and super-resolution network [43]. To effectively capture geometric detail at both mid and high frequencies, we factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably with a high-quality active facial scanning technique, while requiring only a single passive lighting condition and no complex scanning setup.
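As a rough illustration of factorizing detail into mid- and high-frequency components, the sketch below splits a displacement map into a smooth band and a fine residual with a Gaussian filter; the cutoff sigma and the band-splitting scheme are assumptions, not the paper's sub-network design.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def split_displacement_bands(disp, sigma=4.0):
    """Split a displacement map into a mid-frequency band and a high-frequency residual;
    a two-branch network could regress each band separately. sigma is an assumed
    cutoff, not a value from the paper."""
    mid = gaussian_filter(disp, sigma=sigma)    # smooth, mesoscopic component
    high = disp - mid                           # fine residual (pores, stubble, moles)
    return mid, high

disp = np.random.randn(512, 512).astype(np.float32)   # placeholder displacement map
mid, high = split_displacement_bands(disp)
assert np.allclose(mid + high, disp, atol=1e-4)        # the two bands reconstruct the input
```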
{"title":"Mesoscopic Facial Geometry Inference Using Deep Neural Networks","authors":"Loc Huynh, Weikai Chen, Shunsuke Saito, Jun Xing, Koki Nagano, Andrew Jones, P. Debevec, Hao Li","doi":"10.1109/CVPR.2018.00877","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00877","url":null,"abstract":"We present a learning-based approach for synthesizing facial geometry at medium and fine scales from diffusely-lit facial texture maps. When applied to an image sequence, the synthesized detail is temporally coherent. Unlike current state-of-the-art methods [17, 5], which assume \"dark is deep\", our model is trained with measured facial detail collected using polarized gradient illumination in a Light Stage [20]. This enables us to produce plausible facial detail across the entire face, including where previous approaches may incorrectly interpret dark features as concavities such as at moles, hair stubble, and occluded pores. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement maps which are learned through a hybrid network adopting the state-of-the-art image-to-image translation network [29] and super resolution network [43]. To effectively capture geometric detail at both mid- and high frequencies, we factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably with a high-quality active facial scanhening technique, and require only a single passive lighting condition without a complex scanning setup.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"52 1","pages":"8407-8416"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76900806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
PointGrid: A Deep Network for 3D Shape Understanding
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00959
Truc Le, Y. Duan
The volumetric grid is widely used for 3D deep learning due to its regularity. However, the use of relatively low-order local approximation functions, such as the piece-wise constant function (occupancy grid) or the piece-wise linear function (distance field), to approximate 3D shape means that a very high-resolution grid is needed to represent finer geometry details, which can be memory- and computation-inefficient. In this work, we propose the PointGrid, a 3D convolutional network that incorporates a constant number of points within each grid cell, thus allowing the network to learn higher-order local approximation functions that can better represent local geometric shape details. With experiments on popular shape recognition benchmarks, PointGrid demonstrates state-of-the-art performance over existing deep learning methods on both classification and segmentation.
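A small NumPy sketch of the "constant number of points per grid cell" idea: a point cloud is bucketed into a regular grid, each cell keeping at most k points (zero-padded otherwise), which yields a tensor a 3D convolutional network can consume. The grid size, k, and the coordinate convention are illustrative assumptions.

```python
import numpy as np

def pointgrid_quantize(points, grid_size=16, k=4):
    """Bucket a point cloud (M, 3) in [0, 1]^3 into a grid_size^3 grid, keeping a
    constant k points per cell (randomly subsampled, zero-padded when fewer).
    The output shape (grid_size, grid_size, grid_size, k, 3) suits a 3D conv network."""
    grid = np.zeros((grid_size, grid_size, grid_size, k, 3), dtype=np.float32)
    cells = np.clip((points * grid_size).astype(int), 0, grid_size - 1)
    for idx in {tuple(c) for c in cells}:
        members = points[np.all(cells == idx, axis=1)]
        keep = np.random.choice(len(members), size=min(k, len(members)), replace=False)
        chosen = members[keep]
        grid[idx][:len(chosen)] = chosen   # absolute coordinates for simplicity
    return grid

points = np.random.rand(2048, 3)            # toy point cloud in the unit cube
grid = pointgrid_quantize(points)
print(grid.shape)                            # (16, 16, 16, 4, 3)
```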
{"title":"PointGrid: A Deep Network for 3D Shape Understanding","authors":"Truc Le, Y. Duan","doi":"10.1109/CVPR.2018.00959","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00959","url":null,"abstract":"Volumetric grid is widely used for 3D deep learning due to its regularity. However the use of relatively lower order local approximation functions such as piece-wise constant function (occupancy grid) or piece-wise linear function (distance field) to approximate 3D shape means that it needs a very high-resolution grid to represent finer geometry details, which could be memory and computationally inefficient. In this work, we propose the PointGrid, a 3D convolutional network that incorporates a constant number of points within each grid cell thus allowing the network to learn higher order local approximation functions that could better represent the local geometry shape details. With experiments on popular shape recognition benchmarks, PointGrid demonstrates state-of-the-art performance over existing deep learning methods on both classification and segmentation.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"96 1","pages":"9204-9214"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74373639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 290
Towards Pose Invariant Face Recognition in the Wild
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00235
Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, F. Zhao, J. Karlekar, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng
Pose variation is one key challenge in face recognition. As opposed to current techniques for pose-invariant face recognition, which either directly extract pose-invariant features for recognition or first normalize profile face images to a frontal pose before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can benefit from each other. To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture containing a Face Frontalization sub-Net (FFN) and a Discriminative Learning sub-Net (DLN), which are jointly learned end to end. Second, the FFN is a well-designed dual-path Generative Adversarial Network (GAN) that simultaneously perceives global structures and local details, incorporating unsupervised cross-domain adversarial training and a "learning to learn" strategy for high-fidelity and identity-preserving frontal view synthesis. Third, the DLN is a generic Convolutional Neural Network (CNN) for face recognition with our enforced cross-entropy optimization strategy for learning discriminative yet generalized feature representations. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed model over the state of the art.
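A toy PyTorch sketch of the joint FFN-plus-DLN layout: a frontalization network feeds a recognition network and the identity loss back-propagates through both, so they are learned end to end. The tiny architectures and the single cross-entropy loss are placeholders; the paper's adversarial, pixel, and "learning to learn" components are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ffn = nn.Sequential(                       # stand-in frontalization sub-net: profile -> frontal
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
dln = nn.Sequential(                       # stand-in recognition sub-net: frontal -> identity logits
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1000))

profile = torch.randn(4, 3, 128, 128)      # toy profile-pose faces
identity = torch.randint(0, 1000, (4,))    # toy ground-truth identities

frontal = ffn(profile)                     # frontal view synthesis
logits = dln(frontal)                      # recognition on the synthesized frontal face
loss = F.cross_entropy(logits, identity)   # plus adversarial / reconstruction losses in the paper
loss.backward()                            # gradients reach both sub-nets: joint end-to-end training
```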
{"title":"Towards Pose Invariant Face Recognition in the Wild","authors":"Jian Zhao, Yu Cheng, Yan Xu, Lin Xiong, Jianshu Li, F. Zhao, J. Karlekar, Sugiri Pranata, Shengmei Shen, Junliang Xing, Shuicheng Yan, Jiashi Feng","doi":"10.1109/CVPR.2018.00235","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00235","url":null,"abstract":"Pose variation is one key challenge in face recognition. As opposed to current techniques for pose invariant face recognition, which either directly extract pose invariant features for recognition, or first normalize profile face images to frontal pose before feature extraction, we argue that it is more desirable to perform both tasks jointly to allow them to benefit from each other. To this end, we propose a Pose Invariant Model (PIM) for face recognition in the wild, with three distinct novelties. First, PIM is a novel and unified deep architecture, containing a Face Frontalization sub-Net (FFN) and a Discriminative Learning sub-Net (DLN), which are jointly learned from end to end. Second, FFN is a well-designed dual-path Generative Adversarial Network (GAN) which simultaneously perceives global structures and local details, incorporated with an unsupervised cross-domain adversarial training and a \"learning to learn\" strategy for high-fidelity and identity-preserving frontal view synthesis. Third, DLN is a generic Convolutional Neural Network (CNN) for face recognition with our enforced cross-entropy optimization strategy for learning discriminative yet generalized feature representation. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks demonstrate the superiority of the proposed model over the state-of-the-arts.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"27 1","pages":"2207-2216"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75221520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 185
Camera Pose Estimation with Unknown Principal Point
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00315
Viktor Larsson, Z. Kukelova, Yinqiang Zheng
Estimating the 6-DoF extrinsic pose of a pinhole camera with partially unknown intrinsic parameters is a critical sub-problem in structure-from-motion and camera localization. In most existing camera pose estimation solvers, the principal point is assumed to be at the image center. Unfortunately, this assumption is not always true, especially for asymmetrically cropped images. In this paper, we develop the first exactly minimal solver for the case of unknown principal point and focal length by using four and a half point correspondences (P4.5Pfuv). We also present an extremely fast solver for the case of unknown aspect ratio (P5Pfuva). The new solvers outperform the previous state of the art in terms of stability and speed. Finally, we explore the extremely challenging case of both unknown principal point and radial distortion, and develop the first practical non-minimal solver by using seven point correspondences (P7Pfruv). Experimental results on both simulated data and real Internet images demonstrate the usefulness of our new solvers.
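The unknowns here are the focal length, the principal point, and the 6-DoF pose, nine scalars in total, which is why 4.5 point correspondences (nine equations, two per point) make the problem exactly minimal. The NumPy sketch below only writes out the corresponding reprojection residual, the kind of objective used for nonlinear refinement; it is not the paper's algebraic P4.5Pfuv solver.

```python
import numpy as np

def reprojection_residuals(params, X, uv):
    """Residuals of the pinhole model with unknown focal length f and principal point
    (cx, cy); params = [f, cx, cy, rx, ry, rz, tx, ty, tz] with an angle-axis rotation."""
    f, cx, cy = params[:3]
    r, t = params[3:6], params[6:9]
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = r / theta
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)  # Rodrigues formula
    Xc = X @ R.T + t                                   # world -> camera coordinates
    proj = np.stack([f * Xc[:, 0] / Xc[:, 2] + cx,
                     f * Xc[:, 1] / Xc[:, 2] + cy], axis=1)
    return (proj - uv).ravel()

X = np.random.rand(5, 3) + np.array([0.0, 0.0, 4.0])   # toy 3D points in front of the camera
uv = np.random.rand(5, 2) * 640                         # toy 2D observations
res = reprojection_residuals(np.array([500, 320, 240, 0.1, 0.0, 0.0, 0.0, 0.0, 0.5]), X, uv)
```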
{"title":"Camera Pose Estimation with Unknown Principal Point","authors":"Viktor Larsson, Z. Kukelova, Yinqiang Zheng","doi":"10.1109/CVPR.2018.00315","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00315","url":null,"abstract":"To estimate the 6-DoF extrinsic pose of a pinhole camera with partially unknown intrinsic parameters is a critical sub-problem in structure-from-motion and camera localization. In most of existing camera pose estimation solvers, the principal point is assumed to be in the image center. Unfortunately, this assumption is not always true, especially for asymmetrically cropped images. In this paper, we develop the first exactly minimal solver for the case of unknown principal point and focal length by using four and a half point correspondences (P4.5Pfuv). We also present an extremely fast solver for the case of unknown aspect ratio (P5Pfuva). The new solvers outperform the previous state-of-the-art in terms of stability and speed. Finally, we explore the extremely challenging case of both unknown principal point and radial distortion, and develop the first practical non-minimal solver by using seven point correspondences (P7Pfruv). Experimental results on both simulated data and real Internet images demonstrate the usefulness of our new solvers.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"26 1","pages":"2984-2992"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74223184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Statistical Tomography of Microscopic Life
Pub Date: 2018-06-01 DOI: 10.1109/CVPR.2018.00671
Aviad Levis, Y. Schechner, R. Talmon
We achieve tomography of 3D volumetric natural objects, where each projected 2D image corresponds to a different specimen. Each specimen has an unknown random 3D orientation, location, and scale. This imaging scenario is relevant to microscopic and mesoscopic organisms, aerosols, and hydrosols viewed naturally by a microscope. In-class scale variation inhibits prior single-particle reconstruction methods. We thus generalize tomographic recovery to account for all degrees of freedom of a similarity transformation. This enables geometric self-calibration in the imaging of transparent objects. We make the computational load manageable and reach good-quality reconstruction in a short time. This enables extraction of statistics that are important for a scientific study of specimen populations, specifically size distribution parameters. We apply the method to the study of plankton.
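The imaging scenario can be pictured as a forward model in which a shared volume undergoes an unknown per-specimen similarity transform (rotation, isotropic scale, shift) before being projected to 2D; the SciPy sketch below is a toy illustration of that model only, not the recovery algorithm, and the in-plane-only rotation is a simplifying assumption.

```python
import numpy as np
from scipy.ndimage import affine_transform

def project_specimen(volume, angle_deg, scale, shift):
    """Forward model for one specimen: apply an (unknown, per-specimen) similarity
    transform to the shared 3D volume and integrate along the viewing axis."""
    c = (np.array(volume.shape) - 1) / 2.0
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a), 0],
                  [np.sin(a),  np.cos(a), 0],
                  [0,          0,         1]])
    A = R / scale                                  # inverse map: rotate and scale about the center
    offset = c - A @ (c + np.asarray(shift))
    transformed = affine_transform(volume, A, offset=offset, order=1)
    return transformed.sum(axis=2)                 # 2D projection of this specimen

volume = np.zeros((64, 64, 64))
volume[24:40, 24:40, 24:40] = 1.0                  # toy specimen: a bright cube
image = project_specimen(volume, angle_deg=30.0, scale=1.2, shift=(2.0, -3.0, 0.0))
print(image.shape)                                 # (64, 64)
```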
{"title":"Statistical Tomography of Microscopic Life","authors":"Aviad Levis, Y. Schechner, R. Talmon","doi":"10.1109/CVPR.2018.00671","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00671","url":null,"abstract":"We achieve tomography of 3D volumetric natural objects, where each projected 2D image corresponds to a different specimen. Each specimen has unknown random 3D orientation, location, and scale. This imaging scenario is relevant to microscopic and mesoscopic organisms, aerosols and hydrosols viewed naturally by a microscope. In-class scale variation inhibits prior single-particle reconstruction methods. We thus generalize tomographic recovery to account for all degrees of freedom of a similarity transformation. This enables geometric self-calibration in imaging of transparent objects. We make the computational load manageable and reach good quality reconstruction in a short time. This enables extraction of statistics that are important for a scientific study of specimen populations, specifically size distribution parameters. We apply the method to study of plankton.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"86 1","pages":"6411-6420"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74607350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9