
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.248
Liangqiong Qu, Jiandong Tian, Shengfeng He, Yandong Tang, Rynson W. H. Lau
Shadow removal is a challenging task as it requires the detection/annotation of shadows as well as semantic understanding of the scene. In this paper, we propose an automatic and end-to-end deep neural network (DeshadowNet) to tackle these problems in a unified manner. DeshadowNet is designed with a multi-context architecture, where the output shadow matte is predicted by embedding information from three different perspectives. The first global network extracts shadow features from a global view. Two levels of features are derived from the global network and transferred to two parallel networks. While one extracts the appearance of the input image, the other one involves semantic understanding for final prediction. These two complementary networks generate multi-context features to obtain the shadow matte with fine local details. To evaluate the performance of the proposed method, we construct the first large scale benchmark with 3088 image pairs. Extensive experiments on two publicly available benchmarks and our large-scale benchmark show that the proposed method performs favorably against several state-of-the-art methods.
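As a rough illustration of the multi-context design described in the abstract, the sketch below feeds two levels of global features into parallel appearance and semantic branches and fuses them into a one-channel shadow matte. It is not the authors' released network; the layer sizes, branch names, and fusion layer are placeholder assumptions.

```python
# A minimal sketch (not the authors' code): a multi-context network in the
# spirit of DeshadowNet.  Layer sizes, branch names, and the fusion scheme
# are illustrative assumptions.
import torch
import torch.nn as nn

class MultiContextShadowNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Global branch: extracts shadow features at two depths.
        self.g1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.g2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # Appearance branch, conditioned on the shallow global features.
        self.a = nn.Sequential(nn.Conv2d(3 + 32, 64, 3, padding=1), nn.ReLU())
        # Semantic branch, conditioned on the deeper global features.
        self.s = nn.Sequential(nn.Conv2d(3 + 64, 64, 3, padding=1), nn.ReLU())
        # Fusion of the two complementary contexts into a 1-channel matte.
        self.fuse = nn.Conv2d(128, 1, 1)

    def forward(self, x):
        f1 = self.g1(x)                               # shallow global features
        f2 = self.g2(f1)                              # deeper global features
        a = self.a(torch.cat([x, f1], dim=1))         # appearance context
        s = self.s(torch.cat([x, f2], dim=1))         # semantic context
        return torch.sigmoid(self.fuse(torch.cat([a, s], dim=1)))  # matte in [0, 1]

matte = MultiContextShadowNet()(torch.rand(1, 3, 64, 64))
print(matte.shape)  # torch.Size([1, 1, 64, 64])
```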
{"title":"DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal","authors":"Liangqiong Qu, Jiandong Tian, Shengfeng He, Yandong Tang, Rynson W. H. Lau","doi":"10.1109/CVPR.2017.248","DOIUrl":"https://doi.org/10.1109/CVPR.2017.248","url":null,"abstract":"Shadow removal is a challenging task as it requires the detection/annotation of shadows as well as semantic understanding of the scene. In this paper, we propose an automatic and end-to-end deep neural network (DeshadowNet) to tackle these problems in a unified manner. DeshadowNet is designed with a multi-context architecture, where the output shadow matte is predicted by embedding information from three different perspectives. The first global network extracts shadow features from a global view. Two levels of features are derived from the global network and transferred to two parallel networks. While one extracts the appearance of the input image, the other one involves semantic understanding for final prediction. These two complementary networks generate multi-context features to obtain the shadow matte with fine local details. To evaluate the performance of the proposed method, we construct the first large scale benchmark with 3088 image pairs. Extensive experiments on two publicly available benchmarks and our large-scale benchmark show that the proposed method performs favorably against several state-of-the-art methods.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"60 1","pages":"2308-2316"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81049382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 195
Hyper-Laplacian Regularized Unidirectional Low-Rank Tensor Recovery for Multispectral Image Denoising
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.625
Yi Chang, Luxin Yan, Sheng Zhong
Recent low-rank based matrix/tensor recovery methods have been widely explored in multispectral image (MSI) denoising. These methods, however, ignore the differences in the intrinsic structure correlation along the spatial sparsity, spectral correlation, and non-local self-similarity modes. In this paper, we go further by giving a detailed analysis of the rank properties in both the matrix and tensor cases, and find that non-local self-similarity is the key ingredient, while the low-rank assumption for the other modes may not hold. This motivates us to design a simple yet effective unidirectional low-rank tensor recovery model that is capable of faithfully capturing the intrinsic structure correlation with reduced computational burden. However, low-rank models suffer from ringing artifacts due to the aggregation of overlapped patches/cubes. While previous methods resort to spatial information, we offer a new perspective by utilizing exclusively the spectral information in MSIs to address the issue. An analysis-based hyper-Laplacian prior is introduced to model the global spectral structures, so as to indirectly alleviate the ringing artifacts in the spatial domain. The advantages of the proposed method over existing ones are multi-fold: more reasonable structure correlation representability, less processing time, and fewer artifacts in the overlapped regions. The proposed method is extensively evaluated on several benchmarks, and significantly outperforms state-of-the-art MSI denoising methods.
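The sketch below is only meant to make the two ingredients of the abstract concrete: a singular-value-thresholding step applied to a single tensor unfolding (the unidirectional low-rank part) and a hyper-Laplacian energy on spectral differences. The choice of unfolding mode, the threshold, and the exponent p are assumptions, not the paper's algorithm or parameters.

```python
# Illustrative sketch only: singular-value thresholding along one tensor
# unfolding plus a hyper-Laplacian energy on spectral differences.  Mode,
# threshold tau, and exponent p are assumed values, not the paper's.
import numpy as np

def svt_along_mode(tensor, mode, tau):
    """Soft-threshold singular values of the mode-`mode` unfolding."""
    X = np.moveaxis(tensor, mode, 0)
    shape = X.shape
    U, s, Vt = np.linalg.svd(X.reshape(shape[0], -1), full_matrices=False)
    s = np.maximum(s - tau, 0.0)                   # shrink singular values
    Xr = (U * s) @ Vt
    return np.moveaxis(Xr.reshape(shape), 0, mode)

def hyper_laplacian_energy(tensor, p=0.5, axis=2):
    """Sum of |spectral gradient|^p, a sparsity prior on spectral structure."""
    d = np.diff(tensor, axis=axis)
    return np.sum(np.abs(d) ** p)

msi = np.random.rand(64, 64, 31)                   # H x W x spectral bands
denoised = svt_along_mode(msi, mode=2, tau=1.0)
print(denoised.shape, hyper_laplacian_energy(denoised))
```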
{"title":"Hyper-Laplacian Regularized Unidirectional Low-Rank Tensor Recovery for Multispectral Image Denoising","authors":"Yi Chang, Luxin Yan, Sheng Zhong","doi":"10.1109/CVPR.2017.625","DOIUrl":"https://doi.org/10.1109/CVPR.2017.625","url":null,"abstract":"Recent low-rank based matrix/tensor recovery methods have been widely explored in multispectral images (MSI) denoising. These methods, however, ignore the difference of the intrinsic structure correlation along spatial sparsity, spectral correlation and non-local self-similarity mode. In this paper, we go further by giving a detailed analysis about the rank properties both in matrix and tensor cases, and figure out the non-local self-similarity is the key ingredient, while the low-rank assumption of others may not hold. This motivates us to design a simple yet effective unidirectional low-rank tensor recovery model that is capable of truthfully capturing the intrinsic structure correlation with reduced computational burden. However, the low-rank models suffer from the ringing artifacts, due to the aggregation of overlapped patches/cubics. While previous methods resort to spatial information, we offer a new perspective by utilizing the exclusively spectral information in MSIs to address the issue. The analysis-based hyper-Laplacian prior is introduced to model the global spectral structures, so as to indirectly alleviate the ringing artifacts in spatial domain. The advantages of the proposed method over the existing ones are multi-fold: more reasonably structure correlation representability, less processing time, and less artifacts in the overlapped regions. The proposed method is extensively evaluated on several benchmarks, and significantly outperforms state-of-the-art MSI denoising methods.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"32 1","pages":"5901-5909"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84796285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 141
G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.689
Qilong Wang, P. Li, Lei Zhang
Recently, plugging trainable structural layers into deep convolutional neural networks (CNNs) as image representations has made promising progress. However, there has been little work on inserting parametric probability distributions, which can effectively model feature statistics, into deep CNNs in an end-to-end manner. This paper proposes a Global Gaussian Distribution embedding Network (G2DeNet) to take a step towards addressing this problem. The core of G2DeNet is a novel trainable layer of a global Gaussian as an image representation plugged into deep CNNs for end-to-end learning. The challenge is that the proposed layer involves Gaussian distributions whose space is not a linear space, which makes its forward and backward propagations be non-intuitive and non-trivial. To tackle this issue, we employ a Gaussian embedding strategy which respects the structures of both Riemannian manifold and smooth group of Gaussians. Based on this strategy, we construct the proposed global Gaussian embedding layer and decompose it into two sub-layers: the matrix partition sub-layer decoupling the mean vector and covariance matrix entangled in the embedding matrix, and the square-rooted, symmetric positive definite matrix sub-layer. In this way, we can derive the partial derivatives associated with the proposed structural layer and thus allow backpropagation of gradients. Experimental results on large scale region classification and fine-grained recognition tasks show that G2DeNet is superior to its counterparts, capable of achieving state-of-the-art performance.
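For readers unfamiliar with Gaussian embeddings, the following numerical sketch shows the general idea the abstract builds on: local features are summarized by their mean and covariance, packed into a symmetric positive definite matrix, and square-rooted. The exact matrix partition and the differentiable end-to-end formulation used in G2DeNet may differ from this plain NumPy/SciPy version.

```python
# A numerical sketch of a global Gaussian embedding: mean and covariance of
# local features packed into an SPD matrix, then square-rooted.  This follows
# the general idea in the abstract, not G2DeNet's exact layer decomposition.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_embedding(features, eps=1e-3):
    """features: (N, d) local descriptors -> (d+1, d+1) square-rooted SPD matrix."""
    mu = features.mean(axis=0)                                  # mean vector
    sigma = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    top = np.hstack([sigma + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    M = np.vstack([top, bottom])                                # SPD embedding of N(mu, sigma)
    return np.real(sqrtm(M))                                    # square-rooted representation

feats = np.random.randn(196, 8)                                 # e.g. 14x14 positions, 8-dim features
print(gaussian_embedding(feats).shape)                          # (9, 9)
```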
{"title":"G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition","authors":"Qilong Wang, P. Li, Lei Zhang","doi":"10.1109/CVPR.2017.689","DOIUrl":"https://doi.org/10.1109/CVPR.2017.689","url":null,"abstract":"Recently, plugging trainable structural layers into deep convolutional neural networks (CNNs) as image representations has made promising progress. However, there has been little work on inserting parametric probability distributions, which can effectively model feature statistics, into deep CNNs in an end-to-end manner. This paper proposes a Global Gaussian Distribution embedding Network (G2DeNet) to take a step towards addressing this problem. The core of G2DeNet is a novel trainable layer of a global Gaussian as an image representation plugged into deep CNNs for end-to-end learning. The challenge is that the proposed layer involves Gaussian distributions whose space is not a linear space, which makes its forward and backward propagations be non-intuitive and non-trivial. To tackle this issue, we employ a Gaussian embedding strategy which respects the structures of both Riemannian manifold and smooth group of Gaussians. Based on this strategy, we construct the proposed global Gaussian embedding layer and decompose it into two sub-layers: the matrix partition sub-layer decoupling the mean vector and covariance matrix entangled in the embedding matrix, and the square-rooted, symmetric positive definite matrix sub-layer. In this way, we can derive the partial derivatives associated with the proposed structural layer and thus allow backpropagation of gradients. Experimental results on large scale region classification and fine-grained recognition tasks show that G2DeNet is superior to its counterparts, capable of achieving state-of-the-art performance.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"198 1","pages":"6507-6516"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90359214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 99
Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.476
Jianlong Fu, Heliang Zheng, Tao Mei
Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose a novel recurrent attention convolutional neural network (RA-CNN) which recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutual reinforced way. The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN). The APN starts from full images, and iteratively generates region attention from coarse to fine by taking previous prediction as a reference, while the finer scale network takes as input an amplified attended region from previous scale in a recurrent way. The proposed RA-CNN is optimized by an intra-scale classification loss and an inter-scale ranking loss, to mutually learn accurate region attention and fine-grained representation. RA-CNN does not need bounding box/part annotations and can be trained end-to-end. We conduct comprehensive experiments and show that RA-CNN achieves the best performance in three fine-grained tasks, with relative accuracy gains of 3.3%, 3.7%, 3.8%, on CUB Birds, Stanford Dogs and Stanford Cars, respectively.
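Two ingredients of the abstract lend themselves to a short sketch: the inter-scale pairwise ranking loss that pushes the finer scale to be more confident on the true class, and the crop-and-zoom step that feeds the attended region to the next scale. The margin value and the pixel-box crop parameterization below are assumptions rather than the paper's exact choices.

```python
# A small sketch of two RA-CNN ingredients as read from the abstract:
# (1) the inter-scale ranking loss and (2) cropping + upsampling the attended
# region for the next scale.  Margin and box format are assumptions.
import torch
import torch.nn.functional as F

def ranking_loss(p_coarse, p_fine, margin=0.05):
    """p_*: (B,) probabilities of the ground-truth class at two scales."""
    return torch.clamp(p_coarse - p_fine + margin, min=0).mean()

def crop_and_zoom(images, boxes, out_size=224):
    """Crop attended regions (x0, y0, x1, y1 in pixels) and upsample them."""
    crops = []
    for img, (x0, y0, x1, y1) in zip(images, boxes):
        patch = img[:, y0:y1, x0:x1].unsqueeze(0)
        crops.append(F.interpolate(patch, size=(out_size, out_size),
                                   mode='bilinear', align_corners=False))
    return torch.cat(crops, dim=0)

imgs = torch.rand(2, 3, 448, 448)
zoomed = crop_and_zoom(imgs, [(100, 100, 300, 300), (50, 80, 250, 280)])
print(zoomed.shape, ranking_loss(torch.tensor([0.6, 0.7]), torch.tensor([0.8, 0.75])))
```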
{"title":"Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition","authors":"Jianlong Fu, Heliang Zheng, Tao Mei","doi":"10.1109/CVPR.2017.476","DOIUrl":"https://doi.org/10.1109/CVPR.2017.476","url":null,"abstract":"Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose a novel recurrent attention convolutional neural network (RA-CNN) which recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutual reinforced way. The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN). The APN starts from full images, and iteratively generates region attention from coarse to fine by taking previous prediction as a reference, while the finer scale network takes as input an amplified attended region from previous scale in a recurrent way. The proposed RA-CNN is optimized by an intra-scale classification loss and an inter-scale ranking loss, to mutually learn accurate region attention and fine-grained representation. RA-CNN does not need bounding box/part annotations and can be trained end-to-end. We conduct comprehensive experiments and show that RA-CNN achieves the best performance in three fine-grained tasks, with relative accuracy gains of 3.3%, 3.7%, 3.8%, on CUB Birds, Stanford Dogs and Stanford Cars, respectively.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"15 1","pages":"4476-4484"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73415819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1037
Image Deblurring via Extreme Channels Prior
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.738
Yanyang Yan, Wenqi Ren, Yuanfang Guo, Rui Wang, Xiaochun Cao
Camera motion introduces motion blur, affecting many computer vision tasks. The Dark Channel Prior (DCP) helps blind deblurring on scenes including natural, face, text, and low-illumination images. However, it has limitations and is less likely to support the kernel estimation when bright pixels dominate the input image. We observe that the bright pixels in clear images are not likely to remain bright after the blur process. Based on this observation, we first illustrate this phenomenon mathematically and define it as the Bright Channel Prior (BCP). Then, we propose a technique for deblurring such images which elevates the performance of existing motion deblurring algorithms. The proposed method takes advantage of both the Bright and Dark Channel Priors. This joint prior is named the extreme channels prior and is crucial for achieving efficient restorations by leveraging both the bright and dark information. Extensive experimental results demonstrate that the proposed method is more robust and performs favorably against state-of-the-art image deblurring methods on both synthesized and natural images.
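The dark and bright channels themselves are easy to compute, as the following sketch illustrates: a per-pixel minimum (respectively maximum) over color channels followed by a minimum (maximum) filter over a local patch. The patch size is an arbitrary choice here, and the sketch does not include the deblurring optimization that uses these priors.

```python
# Sketch of the dark and bright channel computations underlying the extreme
# channels prior.  Patch size is an arbitrary choice; the kernel estimation
# and deblurring steps of the paper are not shown.
import numpy as np
from scipy.ndimage import minimum_filter, maximum_filter

def dark_channel(img, patch=15):
    """img: (H, W, 3) in [0, 1] -> per-pixel minimum over channels and patch."""
    return minimum_filter(img.min(axis=2), size=patch)

def bright_channel(img, patch=15):
    """Per-pixel maximum over channels and patch (the Bright Channel Prior)."""
    return maximum_filter(img.max(axis=2), size=patch)

img = np.random.rand(120, 160, 3)
print(dark_channel(img).shape, bright_channel(img).shape)  # (120, 160) twice
```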
{"title":"Image Deblurring via Extreme Channels Prior","authors":"Yanyang Yan, Wenqi Ren, Yuanfang Guo, Rui Wang, Xiaochun Cao","doi":"10.1109/CVPR.2017.738","DOIUrl":"https://doi.org/10.1109/CVPR.2017.738","url":null,"abstract":"Camera motion introduces motion blur, affecting many computer vision tasks. Dark Channel Prior (DCP) helps the blind deblurring on scenes including natural, face, text, and low-illumination images. However, it has limitations and is less likely to support the kernel estimation while bright pixels dominate the input image. We observe that the bright pixels in the clear images are not likely to be bright after the blur process. Based on this observation, we first illustrate this phenomenon mathematically and define it as the Bright Channel Prior (BCP). Then, we propose a technique for deblurring such images which elevates the performance of existing motion deblurring algorithms. The proposed method takes advantage of both Bright and Dark Channel Prior. This joint prior is named as extreme channels prior and is crucial for achieving efficient restorations by leveraging both the bright and dark information. Extensive experimental results demonstrate that the proposed method is more robust and performs favorably against the state-of-the-art image deblurring methods on both synthesized and natural images.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"5 1","pages":"6978-6986"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79269453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 265
SST: Single-Stream Temporal Action Proposals
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.675
S. Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles
Our paper presents a new approach for temporal detection of human actions in long, untrimmed video sequences. We introduce Single-Stream Temporal Action Proposals (SST), a new effective and efficient deep architecture for the generation of temporal action proposals. Our network can run continuously in a single stream over very long input video sequences, without the need to divide input into short overlapping clips or temporal windows for batch processing. We demonstrate empirically that our model outperforms the state-of-the-art on the task of temporal action proposal generation, while achieving some of the fastest processing speeds in the literature. Finally, we demonstrate that using SST proposals in conjunction with existing action classifiers results in improved state-of-the-art temporal action detection performance.
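As a minimal reading of the abstract, the proposal head below runs a recurrent encoder once over the whole feature sequence and, at every time step, emits confidences for K proposals ending there (one per anchor length). The feature dimension, hidden size, and number of anchors are placeholder values.

```python
# A minimal single-stream proposal head in the spirit of SST: one recurrent
# pass over the untrimmed video features, K proposal scores per time step.
# Sizes are placeholders, not the paper's configuration.
import torch
import torch.nn as nn

class SingleStreamProposals(nn.Module):
    def __init__(self, feat_dim=500, hidden=256, k_anchors=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)  # one pass, no clip batching
        self.head = nn.Linear(hidden, k_anchors)                # K confidences per step

    def forward(self, clip_feats):
        """clip_feats: (B, T, feat_dim) visual features of an untrimmed video."""
        h, _ = self.rnn(clip_feats)
        return torch.sigmoid(self.head(h))                      # (B, T, K) proposal scores

scores = SingleStreamProposals()(torch.rand(1, 1000, 500))
print(scores.shape)  # torch.Size([1, 1000, 32])
```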
{"title":"SST: Single-Stream Temporal Action Proposals","authors":"S. Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, Juan Carlos Niebles","doi":"10.1109/CVPR.2017.675","DOIUrl":"https://doi.org/10.1109/CVPR.2017.675","url":null,"abstract":"Our paper presents a new approach for temporal detection of human actions in long, untrimmed video sequences. We introduce Single-Stream Temporal Action Proposals (SST), a new effective and efficient deep architecture for the generation of temporal action proposals. Our network can run continuously in a single stream over very long input video sequences, without the need to divide input into short overlapping clips or temporal windows for batch processing. We demonstrate empirically that our model outperforms the state-of-the-art on the task of temporal action proposal generation, while achieving some of the fastest processing speeds in the literature. Finally, we demonstrate that using SST proposals in conjunction with existing action classifiers results in improved state-of-the-art temporal action detection performance.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"6373-6382"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81896170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 376
Face Normals "In-the-Wild" Using Fully Convolutional Networks
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.44
George Trigeorgis, Patrick Snape, Iasonas Kokkinos, S. Zafeiriou
In this work we pursue a data-driven approach to the problem of estimating surface normals from a single intensity image, focusing in particular on human faces. We introduce new methods to exploit the currently available facial databases for dataset construction and tailor a deep convolutional neural network to the task of estimating facial surface normals in-the-wild. We train a fully convolutional network that can accurately recover facial normals from images including a challenging variety of expressions and facial poses. We compare against state-of-the-art face Shape-from-Shading and 3D reconstruction techniques and show that the proposed network can recover substantially more accurate and realistic normals. Furthermore, in contrast to other existing face-specific surface recovery methods, we do not require the solving of an explicit alignment step due to the fully convolutional nature of our network.
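A bare-bones version of the idea, a fully convolutional network whose three output channels are normalized to unit-length normals and trained with a cosine-style loss, might look like the sketch below; the depth and width of the network are arbitrary and not the architecture used in the paper.

```python
# A toy fully convolutional normal estimator: per-pixel (nx, ny, nz) output,
# normalized to unit length, with a cosine-distance loss.  The architecture
# here is an arbitrary stand-in, not the paper's network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1))           # per-pixel (nx, ny, nz)

    def forward(self, x):
        return F.normalize(self.body(x), dim=1)       # unit-length normals

def cosine_loss(pred, gt):
    """1 - cosine similarity between predicted and ground-truth normals."""
    return (1 - (pred * gt).sum(dim=1)).mean()

pred = NormalFCN()(torch.rand(1, 3, 128, 128))
gt = F.normalize(torch.randn(1, 3, 128, 128), dim=1)
print(pred.shape, cosine_loss(pred, gt).item())
```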
{"title":"Face Normals \"In-the-Wild\" Using Fully Convolutional Networks","authors":"George Trigeorgis, Patrick Snape, Iasonas Kokkinos, S. Zafeiriou","doi":"10.1109/CVPR.2017.44","DOIUrl":"https://doi.org/10.1109/CVPR.2017.44","url":null,"abstract":"In this work we pursue a data-driven approach to the problem of estimating surface normals from a single intensity image, focusing in particular on human faces. We introduce new methods to exploit the currently available facial databases for dataset construction and tailor a deep convolutional neural network to the task of estimating facial surface normals in-the-wild. We train a fully convolutional network that can accurately recover facial normals from images including a challenging variety of expressions and facial poses. We compare against state-of-the-art face Shape-from-Shading and 3D reconstruction techniques and show that the proposed network can recover substantially more accurate and realistic normals. Furthermore, in contrast to other existing face-specific surface recovery methods, we do not require the solving of an explicit alignment step due to the fully convolutional nature of our network.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"7 1","pages":"340-349"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84185172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Object Co-skeletonization with Co-segmentation
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.413
Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan
Recent advances in the joint processing of images have certainly shown its advantages over individual processing. Different from existing works geared towards co-segmentation or co-localization, in this paper we explore a new joint processing topic: co-skeletonization, which is defined as joint skeleton extraction of common objects in a set of semantically similar images. Object skeletonization in real-world images is a challenging problem, because there is no prior knowledge of the object's shape if we consider only a single image. This motivates us to resort to the idea of object co-skeletonization, hoping that the commonness prior existing across similar images may help, just as it does for other joint processing problems such as co-segmentation. Noting that a skeleton can provide good scribbles for segmentation, and skeletonization, in turn, needs good segmentation, we propose a coupled framework for the co-skeletonization and co-segmentation tasks so that they are well informed by each other and benefit each other synergistically. Since this is a new problem, we also construct a benchmark dataset for the co-skeletonization task. Extensive experiments demonstrate that the proposed method achieves very competitive results.
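The coupling the abstract describes can be illustrated with standard morphological operations: a segmentation mask yields a skeleton, and the (dilated) skeleton can in turn seed the next segmentation round as a foreground scribble. The sketch below uses scikit-image for both steps; the joint energy and alternation scheme of the paper are not reproduced.

```python
# Only an illustration of the coupling between skeletonization and
# segmentation: mask -> skeleton, dilated skeleton -> scribble for the next
# segmentation round.  The alternation schedule is a made-up placeholder.
import numpy as np
from skimage.morphology import skeletonize, dilation, disk

def skeleton_from_mask(mask):
    """mask: (H, W) boolean foreground -> boolean skeleton."""
    return skeletonize(mask)

def scribble_from_skeleton(skel, radius=3):
    """Dilate the skeleton so it can seed the next co-segmentation step."""
    return dilation(skel, disk(radius))

mask = np.zeros((64, 64), dtype=bool)
mask[20:44, 10:54] = True                       # a toy rectangular "object"
skel = skeleton_from_mask(mask)
print(skel.sum(), scribble_from_skeleton(skel).sum())
```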
{"title":"Object Co-skeletonization with Co-segmentation","authors":"Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan","doi":"10.1109/CVPR.2017.413","DOIUrl":"https://doi.org/10.1109/CVPR.2017.413","url":null,"abstract":"Recent advances in the joint processing of images have certainly shown its advantages over the individual processing. Different from the existing works geared towards co-segmentation or co-localization, in this paper, we explore a new joint processing topic: co-skeletonization, which is defined as joint skeleton extraction of common objects in a set of semantically similar images. Object skeletonization in real world images is a challenging problem, because there is no prior knowledge of the objects shape if we consider only a single image. This motivates us to resort to the idea of object co-skeletonization hoping that the commonness prior existing across the similar images may help, just as it does for other joint processing problems such as co-segmentation. Noting that skeleton can provide good scribbles for segmentation, and skeletonization, in turn, needs good segmentation, we propose a coupled framework for co-skeletonization and co-segmentation tasks so that they are well informed by each other, and benefit each other synergistically. Since it is a new problem, we also construct a benchmark dataset for the co-skeletonization task. Extensive experiments demonstrate that proposed method achieves very competitive results.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"41 1","pages":"3881-3889"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89260881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
StyleNet: Generating Attractive Visual Captions with Styles
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.108
Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, L. Deng
We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors in the monolingual text corpus. Then at runtime, we can explicitly control the style in the caption generation process so as to produce attractive visual captions with the desired style. Our approach achieves this goal by leveraging two sets of data: 1) factual image/video-caption paired data, and 2) stylized monolingual text data (e.g., romantic and humorous sentences). We show experimentally that StyleNet outperforms existing approaches for generating visual captions with different styles, measured in both automatic and human evaluation metrics on the newly collected FlickrStyle10K image caption dataset, which contains 10K Flickr images with corresponding humorous and romantic captions.
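A compact way to picture the factored LSTM is to factor the input-to-hidden transform into shared matrices U and V with a style-specific factor S in between, so that swapping S swaps the style while the rest of the cell is unchanged. The gate layout and dimensions in the sketch below are simplifications, not the paper's exact design.

```python
# A compact sketch of the "factored LSTM" idea from the abstract: shared U, V
# around a style-specific factor S in the input-to-hidden transform.  Gate
# layout and sizes are simplifications of the paper's design.
import torch
import torch.nn as nn

class FactoredLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim, factor_dim, num_styles):
        super().__init__()
        self.U = nn.Linear(factor_dim, 4 * hidden_dim, bias=False)   # shared
        self.V = nn.Linear(input_dim, factor_dim, bias=False)        # shared
        self.S = nn.Parameter(torch.ones(num_styles, factor_dim))    # style-specific
        self.Wh = nn.Linear(hidden_dim, 4 * hidden_dim)              # recurrent weights

    def forward(self, x, h, c, style):
        gates = self.U(self.S[style] * self.V(x)) + self.Wh(h)
        i, f, g, o = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = FactoredLSTMCell(input_dim=300, hidden_dim=512, factor_dim=256, num_styles=3)
h = c = torch.zeros(1, 512)
h, c = cell(torch.rand(1, 300), h, c, style=1)   # the style index selects S
print(h.shape)  # torch.Size([1, 512])
```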
{"title":"StyleNet: Generating Attractive Visual Captions with Styles","authors":"Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, L. Deng","doi":"10.1109/CVPR.2017.108","DOIUrl":"https://doi.org/10.1109/CVPR.2017.108","url":null,"abstract":"We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors in the monolingual text corpus. Then at runtime, we can explicitly control the style in the caption generation process so as to produce attractive visual captions with the desired style. Our approach achieves this goal by leveraging two sets of data: 1) factual image/video-caption paired data, and 2) stylized monolingual text data (e.g., romantic and humorous sentences). We show experimentally that StyleNet outperforms existing approaches for generating visual captions with different styles, measured in both automatic and human evaluation metrics on the newly collected FlickrStyle10K image caption dataset, which contains 10K Flickr images with corresponding humorous and romantic captions.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"11 1","pages":"955-964"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86353250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 243
SGM-Nets: Semi-Global Matching with Neural Networks
Pub Date : 2017-07-21 DOI: 10.1109/CVPR.2017.703
A. Seki, M. Pollefeys
This paper deals with deep neural networks for predicting accurate dense disparity maps with Semi-Global Matching (SGM). SGM is a widely used regularization method for real scenes because of its high accuracy and fast computation speed. Even though SGM can obtain accurate results, tuning its penalty parameters, which control the smoothness and discontinuity of a disparity map, is not easy, and empirical methods have been proposed. We propose a learning-based penalty estimation method, which we call SGM-Nets, consisting of convolutional neural networks. A small image patch and its position are input into SGM-Nets to predict the penalties for the 3D object structures. In order to train the networks, we introduce a novel loss function which is able to use sparsely annotated disparity maps such as those captured by a LiDAR sensor in real environments. Moreover, we propose a novel SGM parameterization, which deploys different penalties depending on either positive or negative disparity changes in order to represent the object structures more discriminatively. Our SGM-Nets outperformed state-of-the-art accuracy on the KITTI benchmark datasets.
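For context, the quantities SGM-Nets learns to predict are the smoothness penalties used in the classic SGM aggregation recurrence. The sketch below implements that recurrence along one scan direction with the penalties exposed as per-pixel maps; here they are filled with constants, whereas SGM-Nets would predict them from image patches.

```python
# The standard SGM cost aggregation along one scan direction, with the
# smoothness penalties P1/P2 exposed as per-pixel arrays; these penalties are
# what SGM-Nets learns to predict.  Constant penalty maps are used here.
import numpy as np

def sgm_aggregate_left_to_right(cost, P1, P2):
    """cost: (H, W, D) matching cost volume; P1, P2: (H, W) penalty maps."""
    H, W, D = cost.shape
    L = np.empty_like(cost)
    L[:, 0] = cost[:, 0]
    for x in range(1, W):
        prev = L[:, x - 1]                                   # (H, D)
        prev_min = prev.min(axis=1, keepdims=True)
        # Neighbors at disparity d-1 and d+1 (inf-padded at the borders).
        up = np.pad(prev, ((0, 0), (1, 0)), constant_values=np.inf)[:, :D]
        down = np.pad(prev, ((0, 0), (0, 1)), constant_values=np.inf)[:, 1:]
        candidates = np.stack([prev,
                               up + P1[:, x, None],
                               down + P1[:, x, None],
                               prev_min + P2[:, x, None]], axis=0)
        L[:, x] = cost[:, x] + candidates.min(axis=0) - prev_min
    return L

cost = np.random.rand(4, 8, 16)                              # tiny H x W x disparity volume
P1 = np.full((4, 8), 0.1); P2 = np.full((4, 8), 1.0)
print(sgm_aggregate_left_to_right(cost, P1, P2).shape)       # (4, 8, 16)
```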
{"title":"SGM-Nets: Semi-Global Matching with Neural Networks","authors":"A. Seki, M. Pollefeys","doi":"10.1109/CVPR.2017.703","DOIUrl":"https://doi.org/10.1109/CVPR.2017.703","url":null,"abstract":"This paper deals with deep neural networks for predicting accurate dense disparity map with Semi-global matching (SGM). SGM is a widely used regularization method for real scenes because of its high accuracy and fast computation speed. Even though SGM can obtain accurate results, tuning of SGMs penalty-parameters, which control a smoothness and discontinuity of a disparity map, is uneasy and empirical methods have been proposed. We propose a learning based penalties estimation method, which we call SGM-Nets that consist of Convolutional Neural Networks. A small image patch and its position are input into SGMNets to predict the penalties for the 3D object structures. In order to train the networks, we introduce a novel loss function which is able to use sparsely annotated disparity maps such as captured by a LiDAR sensor in real environments. Moreover, we propose a novel SGM parameterization, which deploys different penalties depending on either positive or negative disparity changes in order to represent the object structures more discriminatively. Our SGM-Nets outperformed state of the art accuracy on KITTI benchmark datasets.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"39 1","pages":"6640-6649"},"PeriodicalIF":0.0,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79004005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 220