
Latest Publications: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00567
Xibin Song, Yuchao Dai, Dingfu Zhou, Liu Liu, Wei Li, H. Li, Ruigang Yang
Despite the remarkable progress made in deep-learning-based depth map super-resolution (DSR), how to handle real-world degradation in low-resolution (LR) depth maps remains a major challenge. Existing DSR models are generally trained and tested on synthetic datasets, which differ substantially from the data produced by real depth sensors. In this paper, we argue that DSR models trained under this setting are restrictive and ineffective for real-world DSR tasks. We make two contributions toward tackling the real-world degradation of different depth sensors. First, we propose to classify the generation of LR depth maps into two types, non-linear downsampling with noise and interval downsampling, and learn a corresponding DSR model for each. Second, we propose a new framework for real-world DSR consisting of four modules: 1) an iterative residual learning module with deep supervision that learns effective high-frequency components of depth maps in a coarse-to-fine manner; 2) a channel attention strategy that enhances channels rich in high-frequency components; 3) a multi-stage fusion module that effectively re-exploits the intermediate results of the coarse-to-fine process; and 4) a depth refinement module that improves the depth map through TGV regularization and an input loss. Extensive experiments on benchmark datasets demonstrate the superiority of our method over current state-of-the-art DSR methods.
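The paper ships no code here; as a rough PyTorch sketch of the two mechanisms named above, the following shows a squeeze-and-excitation-style channel-attention gate inside an iterative residual update. Layer widths, the reduction ratio, and the two-step iteration count are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate: channels whose responses carry
    more high-frequency content can be up-weighted."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # global average pool -> (B, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                     # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)               # rescale each channel

class ResidualStep(nn.Module):
    """One coarse-to-fine step: predict a residual correction to the
    current depth estimate."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            ChannelAttention(channels),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return depth + self.body(depth)       # residual update

depth_lr = torch.rand(1, 1, 64, 64)
depth_up = nn.functional.interpolate(depth_lr, scale_factor=4,
                                     mode="bilinear", align_corners=False)
refined = depth_up
for step in [ResidualStep(), ResidualStep()]:  # two illustrative iterations
    refined = step(refined)
print(refined.shape)                           # torch.Size([1, 1, 256, 256])
```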
{"title":"Channel Attention Based Iterative Residual Learning for Depth Map Super-Resolution","authors":"Xibin Song, Yuchao Dai, Dingfu Zhou, Liu Liu, Wei Li, H. Li, Ruigang Yang","doi":"10.1109/cvpr42600.2020.00567","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00567","url":null,"abstract":"Despite the remarkable progresses made in deep learning based depth map super-resolution (DSR), how to tackle real-world degradation in low-resolution (LR) depth maps remains a major challenge. Existing DSR model is generally trained and tested on synthetic dataset, which is very different from what would get from a real depth sensor. In this paper, we argue that DSR models trained under this setting are restrictive and not effective in dealing with realworld DSR tasks. We make two contributions in tackling real-world degradation of different depth sensors. First, we propose to classify the generation of LR depth maps into two types: non-linear downsampling with noise and interval downsampling, for which DSR models are learned correspondingly. Second, we propose a new framework for real-world DSR, which consists of four modules : 1) An iterative residual learning module with deep supervision to learn effective high-frequency components of depth maps in a coarse-to-fine manner; 2) A channel attention strategy to enhance channels with abundant high-frequency components; 3) A multi-stage fusion module to effectively reexploit the results in the coarse-to-fine process; and 4) A depth refinement module to improve the depth map by TGV regularization and input loss. Extensive experiments on benchmarking datasets demonstrate the superiority of our method over current state-of-the-art DSR methods.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"257 1","pages":"5630-5639"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77047520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 63
Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01036
Kaiyue Pang, Yongxin Yang, Timothy M. Hospedales, T. Xiang, Yi-Zhe Song
ImageNet pre-training has long been considered crucial by the fine-grained sketch-based image retrieval (FG-SBIR) community, due to the lack of large sketch-photo paired datasets for FG-SBIR training. In this paper, we propose a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design required for effective FG-SBIR pre-training. The first is formulating the puzzle in a mixed-modality fashion. Second, we show that framing the optimisation as permutation-matrix inference via Sinkhorn iterations is more effective than the common classifier formulation of jigsaw self-supervision. Experiments show that this self-supervised pre-training strategy significantly outperforms the standard ImageNet-based pipeline across all four product-level FG-SBIR benchmarks. Interestingly, it also improves cross-category generalisation across both the pre-train/fine-tune and fine-tune/testing stages.
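The Sinkhorn formulation mentioned above can be made concrete in a few lines: alternately normalising the rows and columns of a score matrix (in log-space for stability) yields an approximately doubly-stochastic relaxation of a permutation matrix. The iteration count and the toy 3x3-jigsaw shapes below are illustrative assumptions.

```python
import torch

def sinkhorn(log_scores: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Project a square score matrix onto (a relaxation of) the set of
    permutation matrices via alternating row/column normalisation."""
    for _ in range(n_iters):
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-1, keepdim=True)  # rows sum to 1
        log_scores = log_scores - torch.logsumexp(log_scores, dim=-2, keepdim=True)  # cols sum to 1
    return log_scores.exp()   # approximately doubly-stochastic

# Toy usage: affinities between 9 shuffled patches and 9 slots (3x3 jigsaw).
scores = torch.randn(9, 9)
perm = sinkhorn(scores)
print(perm.sum(dim=0), perm.sum(dim=1))  # both ~1 everywhere
```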
{"title":"Solving Mixed-Modal Jigsaw Puzzle for Fine-Grained Sketch-Based Image Retrieval","authors":"Kaiyue Pang, Yongxin Yang, Timothy M. Hospedales, T. Xiang, Yi-Zhe Song","doi":"10.1109/cvpr42600.2020.01036","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.01036","url":null,"abstract":"ImageNet pre-training has long been considered crucial by the fine-grained sketch-based image retrieval (FG-SBIR) community due to the lack of large sketch-photo paired datasets for FG-SBIR training. In this paper, we propose a self-supervised alternative for representation pre-training. Specifically, we consider the jigsaw puzzle game of recomposing images from shuffled parts. We identify two key facets of jigsaw task design that are required for effective FG-SBIR pre-training. The first is formulating the puzzle in a mixed-modality fashion. Second we show that framing the optimisation as permutation matrix inference via Sinkhorn iterations is more effective than the common classifier formulation of Jigsaw self-supervision. Experiments show that this self-supervised pre-training strategy significantly outperforms the standard ImageNet-based pipeline across all four product-level FG-SBIR benchmarks. Interestingly it also leads to improved cross-category generalisation across both pre-train/fine-tune and fine-tune/testing stages.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"33 1","pages":"10344-10352"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77307333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
Progressive Adversarial Networks for Fine-Grained Domain Adaptation
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00923
Sinan Wang, Xinyang Chen, Yunbo Wang, Mingsheng Long, Jianmin Wang
Fine-grained visual categorization has long been considered an important problem; however, its real-world application remains restricted, since precisely annotating a large fine-grained image dataset is laborious and requires expert-level human knowledge. A solution is to apply domain adaptation approaches to fine-grained scenarios, where the key idea is to discover the commonality between existing fine-grained image datasets and massive unlabeled data in the wild. The main technical bottleneck is that large inter-domain variation deteriorates the subtle boundaries of small inter-class variation during domain alignment. This paper presents Progressive Adversarial Networks (PAN) to align fine-grained categories across domains within a curriculum-based adversarial learning framework. In particular, throughout the learning process, domain adaptation is carried out over all multi-grained features, progressively exploiting the label hierarchy from coarse to fine. The progressive learning is applied to both category classification and domain alignment, boosting both the discriminability and the transferability of the fine-grained features. Our method is evaluated on three benchmarks, two of which we propose, and it outperforms state-of-the-art domain adaptation methods.
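The abstract does not spell out the adversarial mechanics; a standard building block in this family of methods (and a plausible component here, though PAN's curriculum and multi-granularity scheduling are omitted) is the gradient reversal layer, sketched below in PyTorch.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -lambda on
    the backward pass, so the feature extractor learns to *fool* the
    domain discriminator."""
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lam)

# Toy check: the gradient through the layer is negated and scaled.
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, lam=0.5).sum()
y.backward()
print(x.grad)   # tensor([-0.5000, -0.5000, -0.5000])
```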
{"title":"Progressive Adversarial Networks for Fine-Grained Domain Adaptation","authors":"Sinan Wang, Xinyang Chen, Yunbo Wang, Mingsheng Long, Jianmin Wang","doi":"10.1109/cvpr42600.2020.00923","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00923","url":null,"abstract":"Fine-grained visual categorization has long been considered as an important problem, however, its real application is still restricted, since precisely annotating a large fine-grained image dataset is a laborious task and requires expert-level human knowledge. A solution to this problem is applying domain adaptation approaches to fine-grained scenarios, where the key idea is to discover the commonality between existing fine-grained image datasets and massive unlabeled data in the wild. The main technical bottleneck lies in that the large inter-domain variation will deteriorate the subtle boundaries of small inter-class variation during domain alignment. This paper presents the Progressive Adversarial Networks (PAN) to align fine-grained categories across domains with a curriculum-based adversarial learning framework. In particular, throughout the learning process, domain adaptation is carried out through all multi-grained features, progressively exploiting the label hierarchy from coarse to fine. The progressive learning is applied upon both category classification and domain alignment, boosting both the discriminability and the transferability of the fine-grained features. Our method is evaluated on three benchmarks, two of which are proposed by us, and it outperforms the state-of-the-art domain adaptation methods.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"35 1","pages":"9210-9219"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76554915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
TESA: Tensor Element Self-Attention via Matricization
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01396
F. Babiloni, Ioannis Marras, G. Slabaugh, S. Zafeiriou
Representation learning is a fundamental part of modern computer vision, where abstract representations of data are encoded as tensors optimized to solve problems like image segmentation and inpainting. Recently, self-attention in the form of the Non-Local Block has emerged as a powerful technique for enriching features by capturing complex interdependencies in feature tensors. However, standard self-attention approaches leverage only spatial relationships, drawing similarities between vectors while overlooking correlations between channels. In this paper, we introduce a new method, called Tensor Element Self-Attention (TESA), which generalizes such work to capture interdependencies along all dimensions of the tensor using matricization. An order-R tensor produces R results, one for each dimension. The results are then fused to produce an enriched output that encapsulates similarity among tensor elements. Additionally, we analyze self-attention mathematically, providing new perspectives on how it adjusts the singular values of the input feature tensor. With these new insights, we present experimental results demonstrating how TESA can benefit diverse problems including classification and instance segmentation. By simply adding a TESA module to existing networks, we substantially improve competitive baselines and set new state-of-the-art results for image inpainting on Celeb and for low-light raw-to-RGB image translation on SID.
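Matricization itself is a one-liner worth seeing. The sketch below unfolds a tensor along one mode and applies a toy dot-product self-attention over that unfolding; the mode-0 re-fold and the scaling are illustrative simplifications, not the authors' full TESA module.

```python
import torch

def matricize(x: torch.Tensor, mode: int) -> torch.Tensor:
    """Mode-k unfolding: move dimension `mode` to the front and flatten
    all remaining dimensions into the columns."""
    dims = [mode] + [d for d in range(x.dim()) if d != mode]
    return x.permute(*dims).reshape(x.shape[mode], -1)

# Toy self-attention over one unfolding (order-3 tensor, mode 0):
feat = torch.randn(8, 16, 16)          # e.g. (channels, height, width)
m = matricize(feat, mode=0)            # (8, 256)
attn = torch.softmax(m @ m.t() / m.shape[1] ** 0.5, dim=-1)  # (8, 8) similarities
out = (attn @ m).reshape(feat.shape)   # re-fold (trivial for mode 0)
print(out.shape)                       # torch.Size([8, 16, 16])
```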
{"title":"TESA: Tensor Element Self-Attention via Matricization","authors":"F. Babiloni, Ioannis Marras, G. Slabaugh, S. Zafeiriou","doi":"10.1109/cvpr42600.2020.01396","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.01396","url":null,"abstract":"Representation learning is a fundamental part of modern computer vision, where abstract representations of data are encoded as tensors optimized to solve problems like image segmentation and inpainting. Recently, self-attention in the form of Non-Local Block has emerged as a powerful technique to enrich features, by capturing complex interdependencies in feature tensors. However, standard self-attention approaches leverage only spatial relationships, drawing similarities between vectors and overlooking correlations between channels. In this paper, we introduce a new method, called Tensor Element Self-Attention (TESA) that generalizes such work to capture interdependencies along all dimensions of the tensor using matricization. An order R tensor produces R results, one for each dimension. The results are then fused to produce an enriched output which encapsulates similarity among tensor elements. Additionally, we analyze self-attention mathematically, providing new perspectives on how it adjusts the singular values of the input feature tensor. With these new insights, we present experimental results demonstrating how TESA can benefit diverse problems including classification and instance segmentation. By simply adding a TESA module to existing networks, we substantially improve competitive baselines and set new state-of-the-art results for image inpainting on Celeb and low light raw-to-rgb image translation on SID.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"268 1","pages":"13942-13951"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77801357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01278
Jiamin Wu, Tianzhu Zhang, Zhengjun Zha, Jiebo Luo, Yongdong Zhang, Feng Wu
Generalized Zero-Shot Learning (GZSL) aims to recognize both seen and unseen classes by constructing a correspondence between visual and semantic embeddings. However, existing methods suffer severely from a strong bias problem: unseen instances in the target domain tend to be recognized as seen classes from the source domain. To address this issue, we propose an end-to-end Self-supervised Domain-aware Generative Network (SDGN) that integrates self-supervised learning into a feature-generating model for unbiased GZSL. The proposed SDGN model has several merits. First, we design a cross-domain feature-generating module that synthesizes high-fidelity samples from class embeddings and includes a novel target-domain discriminator to preserve domain consistency. Second, we propose a self-supervised learning module to investigate inter-domain relationships, where a set of anchors is introduced as a bridge between seen and unseen categories. In the shared space, we pull the distribution of the target domain away from the source domain and obtain domain-aware features with high discriminative power for both seen and unseen classes. To the best of our knowledge, this is the first work to introduce self-supervised learning into GZSL as learning guidance. Extensive experimental results on five standard benchmarks demonstrate that our model performs favorably against state-of-the-art GZSL methods.
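As a hedged sketch of the feature-generating idea (not SDGN's actual architecture), the following conditional generator maps a class (semantic) embedding plus noise to a synthetic visual feature, so unseen-class features can be hallucinated for classifier training. All layer sizes and the attribute/feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Conditional generator: semantic embedding + noise -> visual feature."""
    def __init__(self, attr_dim: int = 85, noise_dim: int = 64, feat_dim: int = 2048):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, feat_dim),
            nn.ReLU(inplace=True),   # CNN features (e.g. ResNet) are non-negative
        )

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        z = torch.randn(attrs.size(0), self.noise_dim, device=attrs.device)
        return self.net(torch.cat([attrs, z], dim=1))

gen = FeatureGenerator()
unseen_attrs = torch.rand(5, 85)   # semantic embeddings of unseen classes
fake_feats = gen(unseen_attrs)     # (5, 2048) synthetic visual features
print(fake_feats.shape)
```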
{"title":"Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning","authors":"Jiamin Wu, Tianzhu Zhang, Zhengjun Zha, Jiebo Luo, Yongdong Zhang, Feng Wu","doi":"10.1109/CVPR42600.2020.01278","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01278","url":null,"abstract":"Generalized Zero-Shot Learning (GZSL) aims at recognizing both seen and unseen classes by constructing correspondence between visual and semantic embedding. However, existing methods have severely suffered from the strong bias problem, where unseen instances in target domain tend to be recognized as seen classes in source domain. To address this issue, we propose an end-to-end Self-supervised Domain-aware Generative Network (SDGN) by integrating self-supervised learning into feature generating model for unbiased GZSL. The proposed SDGN model enjoys several merits. First, we design a cross-domain feature generating module to synthesize samples with high fidelity based on class embeddings, which involves a novel target domain discriminator to preserve the domain consistency. Second, we propose a self-supervised learning module to investigate inter-domain relationships, where a set of anchors are introduced as a bridge between seen and unseen categories. In the shared space, we pull the distribution of target domain away from source domain, and obtain domain-aware features with high discriminative power for both seen and unseen classes. To our best knowledge, this is the first work to introduce self-supervised learning into GZSL as a learning guidance. Extensive experimental results on five standard benchmarks demonstrate that our model performs favorably against state-of-the-art GZSL methods.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"22 1","pages":"12764-12773"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77863495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Attention-Guided Hierarchical Structure Aggregation for Image Matting
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01369
Y. Qiao, Yuhao Liu, Xin Yang, D. Zhou, Mingliang Xu, Qiang Zhang, Xiaopeng Wei
Existing deep-learning-based matting algorithms primarily resort to high-level semantic features to improve the overall structure of alpha mattes. However, we argue that the advanced semantics extracted from CNNs contribute unequally to alpha perception, and that advanced semantic information should be reconciled with low-level appearance cues to refine the foreground details. In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which predicts better-structured alpha mattes from single RGB images without additional input. Specifically, we employ spatial and channel-wise attention to integrate appearance cues and pyramidal features in a novel fashion. This blended attention mechanism can perceive alpha mattes from refined boundaries and adaptive semantics. We also introduce a hybrid loss function fusing Structural SIMilarity (SSIM), Mean Squared Error (MSE), and an adversarial loss to guide the network to further improve the overall foreground structure. In addition, we construct a large-scale image matting dataset comprising 59,600 training images and 1,000 test images (646 distinct foreground alpha mattes in total), which further improves the robustness of our hierarchical structure aggregation model. Extensive experiments demonstrate that the proposed HAttMatting captures sophisticated foreground structure and achieves state-of-the-art performance with single RGB images as input.
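The SSIM + MSE part of the hybrid loss can be sketched directly (the adversarial term is omitted). Note the SSIM below uses global rather than windowed statistics, a deliberate simplification, and the weights w_ssim/w_mse are placeholders rather than the paper's values.

```python
import torch
import torch.nn.functional as F

def ssim_global(x: torch.Tensor, y: torch.Tensor,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Simplified *global* SSIM: statistics over the whole map instead of
    local windows - enough to illustrate the structural term."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def hybrid_matting_loss(pred, target, w_ssim=0.5, w_mse=0.5):
    # structural term is 1 - SSIM so that lower is better, like MSE
    return w_ssim * (1 - ssim_global(pred, target)) + w_mse * F.mse_loss(pred, target)

pred = torch.rand(1, 1, 64, 64, requires_grad=True)
target = torch.rand(1, 1, 64, 64)
loss = hybrid_matting_loss(pred, target)
loss.backward()          # gradients flow through both terms
print(float(loss))
```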
{"title":"Attention-Guided Hierarchical Structure Aggregation for Image Matting","authors":"Y. Qiao, Yuhao Liu, Xin Yang, D. Zhou, Mingliang Xu, Qiang Zhang, Xiaopeng Wei","doi":"10.1109/CVPR42600.2020.01369","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01369","url":null,"abstract":"Existing deep learning based matting algorithms primarily resort to high-level semantic features to improve the overall structure of alpha mattes. However, we argue that advanced semantics extracted from CNNs contribute unequally for alpha perception and we are supposed to reconcile advanced semantic information with low-level appearance cues to refine the foreground details. In this paper, we propose an end-to-end Hierarchical Attention Matting Network (HAttMatting), which can predict the better structure of alpha mattes from single RGB images without additional input. Specifically, we employ spatial and channel-wise attention to integrate appearance cues and pyramidal features in a novel fashion. This blended attention mechanism can perceive alpha mattes from refined boundaries and adaptive semantics. We also introduce a hybrid loss function fusing Structural SIMilarity (SSIM), Mean Square Error (MSE) and Adversarial loss to guide the network to further improve the overall foreground structure. Besides, we construct a large-scale image matting dataset comprised of 59,600 training images and 1000 test images (total 646 distinct foreground alpha mattes), which can further improve the robustness of our hierarchical structure aggregation model. Extensive experiments demonstrate that the proposed HAttMatting can capture sophisticated foreground structure and achieve state-of-the-art performance with single RGB images as input.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"10 1","pages":"13673-13682"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78183209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 109
BFBox: Searching Face-Appropriate Backbone and Feature Pyramid Network for Face Detector
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.01358
Yang Liu, Xu Tang
Popular backbones designed for image classification have demonstrated considerable compatibility with the task of general object detection. However, the same does not hold for face detection, largely because the average scale of the ground truth in the WiderFace dataset is far smaller than that of generic objects in the COCO dataset. To resolve this, the success of Neural Architecture Search (NAS) inspires us to search for a face-appropriate backbone and feature pyramid network (FPN) architecture. First, we design the search space for the backbone and FPN by comparing the performance of feature maps from different backbones and strong FPN architectures on face detection. Second, we propose an FPN-attention module to jointly search the architectures of the backbone and the FPN. Finally, we conduct comprehensive experiments on popular benchmarks, including Wider Face, FDDB, AFW, and PASCAL Face, demonstrating the superiority of our proposed method.
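For reference, the generic FPN baseline that such a search space builds on looks like the following minimal top-down pathway; this is the textbook structure, not the searched BFBox architecture, and channel counts are toy values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Minimal top-down FPN: lateral 1x1 convs align channel counts, then
    each level is the lateral feature plus the upsampled coarser level."""
    def __init__(self, in_channels=(64, 128, 256), out_channels: int = 64):
        super().__init__()
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, feats):                        # feats ordered fine -> coarse
        laterals = [l(f) for l, f in zip(self.laterals, feats)]
        for i in range(len(laterals) - 2, -1, -1):   # top-down pathway
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return laterals

fpn = TinyFPN()
c3 = torch.randn(1, 64, 80, 80)
c4 = torch.randn(1, 128, 40, 40)
c5 = torch.randn(1, 256, 20, 20)
p3, p4, p5 = fpn([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)  # all 64-channel, at 80/40/20 resolution
```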
{"title":"BFBox: Searching Face-Appropriate Backbone and Feature Pyramid Network for Face Detector","authors":"Yang Liu, Xu Tang","doi":"10.1109/cvpr42600.2020.01358","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.01358","url":null,"abstract":"Popular backbones designed on image classification have demonstrated their considerable compatibility on the task of general object detection. However, the same phenomenon does not appear on the face detection. This is largely due to the average scale of ground-truth in the WiderFace dataset is far smaller than that of generic objects in theCOCO one. To resolve this, the success of Neural Archi-tecture Search (NAS) inspires us to search face-appropriate backbone and featrue pyramid network (FPN) architecture.Firstly, we design the search space for backbone and FPN by comparing performance of feature maps with different backbones and excellent FPN architectures on the face detection. Second, we propose a FPN-attention module to joint search the architecture of backbone and FPN. Finally,we conduct comprehensive experiments on popular bench-marks, including Wider Face, FDDB, AFW and PASCALFace, display the superiority of our proposed method.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"26 1","pages":"13565-13574"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77073294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00968
Jinkyu Kim, Suhong Moon, Anna Rohrbach, Trevor Darrell, J. Canny
Humans learn to drive through both practice and theory, e.g. by studying the rules, while most self-driving systems are limited to the former. Being able to incorporate human knowledge of typical causal driving behaviour should benefit autonomous systems. We propose a new approach that learns vehicle control with the help of human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls accordingly. Moreover, to enhance the interpretability of our system, we introduce a fine-grained attention mechanism that relies on semantic segmentation and object-centric RoI pooling. We show that our approach of training the autonomous system with human advice, grounded in a rich semantic representation, matches or outperforms prior work in terms of control prediction and explanation generation. Our approach also yields more interpretable visual explanations by visualizing object-centric attention maps. Code is available at https://github.com/JinkyuKimUCB/advisable-driving.
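Object-centric RoI pooling of the kind mentioned above is available off the shelf; the snippet below uses torchvision's roi_align with toy feature-map and box values (the stride, box coordinates, and output size are illustrative assumptions, not values from the paper).

```python
import torch
from torchvision.ops import roi_align

# Feature map from a perception backbone, plus two object boxes given as
# (batch_index, x1, y1, x2, y2) in *input-image* coordinates.
feats = torch.randn(1, 256, 32, 32)                 # stride-8 features of a 256x256 image
boxes = torch.tensor([[0, 16.0, 16.0, 96.0, 96.0],
                      [0, 128.0, 64.0, 224.0, 160.0]])
pooled = roi_align(feats, boxes, output_size=(7, 7), spatial_scale=1 / 8)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]) - one descriptor per object
```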
{"title":"Advisable Learning for Self-Driving Vehicles by Internalizing Observation-to-Action Rules","authors":"Jinkyu Kim, Suhong Moon, Anna Rohrbach, Trevor Darrell, J. Canny","doi":"10.1109/cvpr42600.2020.00968","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00968","url":null,"abstract":"Humans learn to drive through both practice and theory, e.g. by studying the rules, while most self-driving systems are limited to the former. Being able to incorporate human knowledge of typical causal driving behaviour should benefit autonomous systems. We propose a new approach that learns vehicle control with the help of human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. \"I see a pedestrian crossing, so I stop\"), and predict the controls, accordingly. Moreover, to enhance interpretability of our system, we introduce a fine-grained attention mechanism which relies on semantic segmentation and object-centric RoI pooling. We show that our approach of training the autonomous system with human advice, grounded in a rich semantic representation, matches or outperforms prior work in terms of control prediction and explanation generation. Our approach also results in more interpretable visual explanations by visualizing object-centric attention maps. Code is available at https://github.com/JinkyuKimUCB/advisable-driving.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"139 1","pages":"9658-9667"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75942882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Moving in the Right Direction: A Regularization for Deep Metric Learning
Pub Date : 2020-06-01 DOI: 10.1109/CVPR42600.2020.01460
D. Mohan, Nishant Sankaran, Dennis Fedorishin, S. Setlur, V. Govindaraju
Deep metric learning leverages carefully designed sampling strategies and loss functions that aid in optimizing the generation of a discriminable embedding space. While effective sampling of pairs is critical for shaping the metric space during training, the relative interactions between pairs, and consequently the forces exerted on these pairs that direct their displacement in the embedding space, can significantly impact the formation of well-separated clusters. In this work, we identify a shortcoming of existing loss formulations: they fail to consider more optimal directions of pair displacement as an additional criterion for optimization. We propose a novel direction regularization that explicitly accounts for the layout of sampled pairs and attempts to introduce orthogonality into the representations. The proposed regularization is easily integrated into existing loss functions and provides considerable performance improvements. We experimentally validate our hypothesis on the Cars-196, CUB-200, and InShop datasets and outperform existing methods, yielding state-of-the-art results on these datasets.
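The abstract leaves the exact regulariser unspecified; one plausible reading, sketched below purely as an assumption, penalises the cosine between the anchor-to-positive and anchor-to-negative displacement directions so that negatives are pushed sideways rather than along the line joining anchor and positive.

```python
import torch
import torch.nn.functional as F

def direction_orthogonality_reg(anchors, positives, negatives):
    """Hedged sketch of a direction regulariser: encourage the two pair
    displacement directions at each anchor to be orthogonal."""
    d_pos = F.normalize(positives - anchors, dim=1)  # unit displacement directions
    d_neg = F.normalize(negatives - anchors, dim=1)
    cos = (d_pos * d_neg).sum(dim=1)                 # cosine of the angle between them
    return (cos ** 2).mean()                         # 0 when exactly orthogonal

a, p, n = (torch.randn(32, 128, requires_grad=True) for _ in range(3))
reg = direction_orthogonality_reg(a, p, n)
reg.backward()   # add this (suitably weighted) to a triplet/contrastive loss
print(float(reg))
```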
{"title":"Moving in the Right Direction: A Regularization for Deep Metric Learning","authors":"D. Mohan, Nishant Sankaran, Dennis Fedorishin, S. Setlur, V. Govindaraju","doi":"10.1109/CVPR42600.2020.01460","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01460","url":null,"abstract":"Deep metric learning leverages carefully designed sampling strategies and loss functions that aid in optimizing the generation of a discriminable embedding space. While effective sampling of pairs is critical for shaping the metric space during training, the relative interactions between pairs, and consequently the forces exerted on these pairs that direct their displacement in the embedding space can significantly impact the formation of well separated clusters. In this work, we identify a shortcoming of existing loss formulations which fail to consider more optimal directions of pair displacements as another criterion for optimization. We propose a novel direction regularization to explicitly account for the layout of sampled pairs and attempt to introduce orthogonality in the representations. The proposed regularization is easily integrated into existing loss functions providing considerable performance improvements. We experimentally validate our hypothesis on the Cars-196, CUB-200 and InShop datasets and outperform existing methods to yield state-of-the-art results on these datasets.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"61 1","pages":"14579-14587"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80283128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Boundary-Aware 3D Building Reconstruction From a Single Overhead Image
Pub Date : 2020-06-01 DOI: 10.1109/cvpr42600.2020.00052
Jisan Mahmud, True Price, Akash Bapat, Jan-Michael Frahm
We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image. Unlike most existing techniques which rely on multiple images for 3D scene modeling, we seek to model the buildings in the scene from a single overhead image by jointly learning a modified signed distance function (SDF) from the building boundaries, a dense heightmap of the scene, and scene semantics. To jointly train for these tasks, we leverage pixel-wise semantic segmentation and normalized digital surface maps (nDSM) as supervision, in addition to labeled building outlines. At test time, buildings in the scene are automatically modeled in 3D using only an input overhead image. We demonstrate an increase in building modeling performance using a multi-feature network architecture that improves building outline detection by considering network features learned for the other jointly learned tasks. We also introduce a novel mechanism for robustly refining instance-specific building outlines using the learned modified SDF. We verify the effectiveness of our method on multiple large-scale satellite and aerial imagery datasets, where we obtain state-of-the-art performance in the 3D building reconstruction task.
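The joint supervision described above amounts to a weighted multi-task loss; here is a minimal sketch, assuming L1 regression for the modified SDF and the heightmap and cross-entropy for per-pixel semantics (the loss choices and weights are our assumptions, not the paper's).

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_sdf, gt_sdf, pred_height, gt_height, pred_sem, gt_sem,
                   w_sdf=1.0, w_h=1.0, w_sem=1.0):
    """Joint supervision over three heads sharing one backbone."""
    l_sdf = F.l1_loss(pred_sdf, gt_sdf)       # boundary-aware SDF regression
    l_h = F.l1_loss(pred_height, gt_height)   # dense nDSM height regression
    l_sem = F.cross_entropy(pred_sem, gt_sem) # pixel-wise semantic classification
    return w_sdf * l_sdf + w_h * l_h + w_sem * l_sem

B, H, W, n_classes = 2, 64, 64, 3
loss = multitask_loss(
    torch.randn(B, 1, H, W), torch.randn(B, 1, H, W),
    torch.randn(B, 1, H, W), torch.randn(B, 1, H, W),
    torch.randn(B, n_classes, H, W), torch.randint(0, n_classes, (B, H, W)),
)
print(float(loss))
```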
{"title":"Boundary-Aware 3D Building Reconstruction From a Single Overhead Image","authors":"Jisan Mahmud, True Price, Akash Bapat, Jan-Michael Frahm","doi":"10.1109/cvpr42600.2020.00052","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00052","url":null,"abstract":"We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image. Unlike most existing techniques which rely on multiple images for 3D scene modeling, we seek to model the buildings in the scene from a single overhead image by jointly learning a modified signed distance function (SDF) from the building boundaries, a dense heightmap of the scene, and scene semantics. To jointly train for these tasks, we leverage pixel-wise semantic segmentation and normalized digital surface maps (nDSM) as supervision, in addition to labeled building outlines. At test time, buildings in the scene are automatically modeled in 3D using only an input overhead image. We demonstrate an increase in building modeling performance using a multi-feature network architecture that improves building outline detection by considering network features learned for the other jointly learned tasks. We also introduce a novel mechanism for robustly refining instance-specific building outlines using the learned modified SDF. We verify the effectiveness of our method on multiple large-scale satellite and aerial imagery datasets, where we obtain state-of-the-art performance in the 3D building reconstruction task.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"22 1","pages":"438-448"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81471331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36