
Latest publications from the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Recorrupted-to-Recorrupted: Unsupervised Deep Learning for Image Denoising
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00208
T. Pang, Huan Zheng, Yuhui Quan, Hui Ji
Deep denoisers, i.e., deep networks for denoising, have been the focus of recent developments in image denoising. In the last few years, there has been increasing interest in developing unsupervised deep denoisers that require only unorganized noisy images, without ground truth, for training. Nevertheless, the performance of these unsupervised deep denoisers is not competitive with their supervised counterparts. Aiming at a more powerful unsupervised deep denoiser, this paper proposes a data augmentation technique, called recorrupted-to-recorrupted (R2R), to address the overfitting caused by the absence of truth images. For each noisy image, we show that the cost function defined on the noisy/noisy image pairs constructed by the R2R method is statistically equivalent to its supervised counterpart defined on the noisy/truth image pairs. Extensive experiments show that the proposed R2R method noticeably outperforms existing unsupervised deep denoisers and is competitive with representative supervised deep denoisers.
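The key operation in R2R is constructing two re-corrupted versions of each noisy image whose noise components are statistically independent, so that a network trained to map one to the other behaves like a supervised denoiser. Below is a minimal sketch for the additive white Gaussian noise case; the recorruption scale alpha, the toy network, and the single training step are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def r2r_pair(y: torch.Tensor, sigma: float, alpha: float = 0.5):
    """Build a recorrupted training pair from a single noisy image y = x + n,
    n ~ N(0, sigma^2 I).  The perturbations are scaled so that the noise in
    y_hat and y_tilde is uncorrelated (hence independent for Gaussian noise)."""
    z = torch.randn_like(y)
    y_hat = y + alpha * sigma * z          # network input
    y_tilde = y - (sigma / alpha) * z      # regression target
    return y_hat, y_tilde

# Toy denoiser and one unsupervised training step (illustrative only).
denoiser = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

noisy = torch.randn(8, 1, 64, 64)          # stand-in for a batch of noisy images
y_hat, y_tilde = r2r_pair(noisy, sigma=0.1)
loss = ((denoiser(y_hat) - y_tilde) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The independence follows from Cov(n + ασz, n − σz/α) = σ²I − σ²I = 0, which is why the noisy/noisy loss mimics a noisy/clean loss in expectation.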
Citations: 85
Shape from Sky: Polarimetric Normal Recovery Under The Sky
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01459
Tomoki Ichikawa, Matthew Purri, Ryo Kawahara, S. Nobuhara, Kristin J. Dana, K. Nishino
The sky exhibits a unique spatial polarization pattern by scattering unpolarized sunlight. Just as insects use this unique angular pattern to navigate, we use it to map pixels to directions on the sky. That is, we show that the unique polarization pattern encoded in the polarimetric appearance of an object captured under the sky can be decoded to reveal the surface normal at each pixel. We derive a polarimetric reflection model of a diffuse-plus-mirror surface lit by the sun and a clear sky. This model is used to recover the per-pixel surface normal of an object from a single polarimetric image or from multiple polarimetric images captured under the sky at different times of the day. We experimentally evaluate the accuracy of our shape-from-sky method on a number of real objects of different surface compositions. The results clearly show that this passive approach to fine-geometry recovery, which fully leverages the unique illumination provided by nature, is a viable option for 3D sensing. With the advent of quad-Bayer polarization chips, we believe the implications of our method span a wide range of domains.
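As background to how polarimetric appearance constrains geometry, the sketch below computes per-pixel Stokes parameters, angle of linear polarization (AoLP), and degree of linear polarization (DoLP) from the four intensity channels of a quad-Bayer polarization sensor (0°, 45°, 90°, 135°). This is standard polarimetric bookkeeping, not the paper's sun-and-sky reflection model; the variable names and random test data are assumptions.

```python
import numpy as np

def linear_stokes(i0, i45, i90, i135):
    """Per-pixel linear Stokes parameters from four polarizer-angle images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def aolp_dolp(s0, s1, s2, eps=1e-8):
    aolp = 0.5 * np.arctan2(s2, s1)              # angle of linear polarization
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)   # degree of linear polarization
    return aolp, dolp

# Random data standing in for the four polarization channels of one capture.
h, w = 480, 640
i0, i45, i90, i135 = (np.random.rand(h, w) for _ in range(4))
s0, s1, s2 = linear_stokes(i0, i45, i90, i135)
aolp, dolp = aolp_dolp(s0, s1, s2)
```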
Citations: 10
LAU-Net: Latitude Adaptive Upscaling Network for Omnidirectional Image Super-resolution
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00907
Xin Deng, Hao Wang, Mai Xu, Yichen Guo, Yuhang Song, Li Yang
Omnidirectional images (ODIs) are usually at low resolution due to constraints on collection, storage, and transmission. Traditional two-dimensional (2D) image super-resolution methods are not effective for spherical ODIs, because ODIs tend to have non-uniformly distributed pixel density and varying texture complexity across latitudes. In this work, we propose a novel latitude adaptive upscaling network (LAU-Net) for ODI super-resolution, which allows pixels at different latitudes to adopt distinct upscaling factors. Specifically, we introduce a Laplacian multi-level separation architecture to split an ODI into different latitude bands and hierarchically upscale them with different factors. In addition, we propose a deep reinforcement learning scheme with a latitude adaptive reward in order to automatically select optimal upscaling factors for different latitude bands. To the best of our knowledge, LAU-Net is the first attempt to consider the latitude difference for ODI super-resolution. Extensive results demonstrate that our LAU-Net significantly advances super-resolution performance for ODIs. Code is available at https://github.com/wangh-allen/LAU-Net.
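To make the idea of latitude-dependent upscaling concrete, the sketch below splits an equirectangular image into latitude bands, upscales each band with its own factor, and resizes every band to a common target resolution before stitching. The band boundaries and per-band factors here are hand-picked assumptions purely for illustration; in LAU-Net the bands come from a Laplacian multi-level architecture and the factors are selected by a learned reinforcement-learning scheme.

```python
import torch
import torch.nn.functional as F

def latitude_adaptive_upscale(odi, band_edges, factors, target_scale=4):
    """odi: (1, C, H, W) equirectangular image.
    band_edges: row indices splitting the image into latitude bands.
    factors: upscaling factor applied inside each band (<= target_scale)."""
    _, _, H, W = odi.shape
    rows = [0] + list(band_edges) + [H]
    out_bands = []
    for (r0, r1), f in zip(zip(rows[:-1], rows[1:]), factors):
        band = odi[:, :, r0:r1, :]
        up = F.interpolate(band, scale_factor=f, mode='bicubic', align_corners=False)
        # Bring every band to the same target resolution so they can be stitched.
        out = F.interpolate(up, size=((r1 - r0) * target_scale, W * target_scale),
                            mode='bicubic', align_corners=False)
        out_bands.append(out)
    return torch.cat(out_bands, dim=2)

odi = torch.rand(1, 3, 64, 128)                      # toy low-resolution ODI
sr = latitude_adaptive_upscale(odi, band_edges=[16, 48], factors=[2, 4, 2])
print(sr.shape)  # torch.Size([1, 3, 256, 512])
```

The polar bands are given smaller native factors because their pixel density on the sphere is lower, which mirrors the motivation stated in the abstract.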
Citations: 27
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01130
Xingran Zhou, Bo Zhang, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen
We present full-resolution correspondence learning for cross-domain images, which aids image translation. We adopt a hierarchical strategy that uses the correspondence from the coarse level to guide the finer levels. At each hierarchy level, the correspondence can be efficiently computed via PatchMatch, which iteratively leverages the matchings from the neighborhood. Within each PatchMatch iteration, a ConvGRU module is employed to refine the current correspondence, considering not only the matchings of a larger context but also the historic estimates. The proposed CoCosNet v2, a GRU-assisted PatchMatch approach, is fully differentiable and highly efficient. When jointly trained with image translation, full-resolution semantic correspondence can be established in an unsupervised manner, which in turn facilitates exemplar-based image translation. Experiments on diverse translation tasks show that CoCosNet v2 performs considerably better than state-of-the-art literature in producing high-resolution images.
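A core operation behind exemplar-based translation of this kind is warping the exemplar toward the input via a dense feature correspondence. The sketch below shows a much-simplified, single-level version: a softmax-normalized cosine correlation between flattened feature maps is used to softly warp the exemplar. CoCosNet v2 instead refines the correspondence hierarchically with PatchMatch and ConvGRU up to full resolution; the temperature, feature shapes, and random inputs here are assumptions.

```python
import torch
import torch.nn.functional as F

def correspondence_warp(feat_a, feat_b, exemplar, tau=0.01):
    """feat_a, feat_b: (B, C, H, W) features of the input and the exemplar.
    exemplar: (B, 3, H, W) exemplar image at the same spatial size as the features.
    Returns the exemplar warped to align with the input via soft correspondence."""
    B, C, H, W = feat_a.shape
    a = F.normalize(feat_a.flatten(2), dim=1)        # (B, C, HW)
    b = F.normalize(feat_b.flatten(2), dim=1)        # (B, C, HW)
    corr = torch.bmm(a.transpose(1, 2), b)           # (B, HW, HW) cosine similarities
    attn = F.softmax(corr / tau, dim=-1)             # soft matching weights
    ex = exemplar.flatten(2).transpose(1, 2)         # (B, HW, 3)
    warped = torch.bmm(attn, ex).transpose(1, 2)     # (B, 3, HW)
    return warped.reshape(B, 3, H, W)

feat_a = torch.randn(2, 64, 32, 32)
feat_b = torch.randn(2, 64, 32, 32)
exemplar = torch.rand(2, 3, 32, 32)
warped = correspondence_warp(feat_a, feat_b, exemplar)
print(warped.shape)  # torch.Size([2, 3, 32, 32])
```

The quadratic cost of this dense correlation at high resolution is exactly the bottleneck that the paper's hierarchical PatchMatch strategy is designed to avoid.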
Citations: 65
StruMonoNet: Structure-Aware Monocular 3D Prediction
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00733
Zhenpei Yang, Erran L. Li, Qi-Xing Huang
Monocular 3D prediction is one of the fundamental problems in 3D vision. Recent deep learning-based approaches have brought exciting progress on this problem. However, existing approaches have predominantly focused on end-to-end depth and normal predictions, which do not fully utilize the geometric structure of the underlying 3D environment. This paper introduces StruMonoNet, which detects and enforces a planar structure to enhance pixel-wise predictions. StruMonoNet innovates in leveraging a hybrid representation that combines visual features and a surfel representation for plane prediction. This formulation allows us to combine the power of visual feature learning and the flexibility of geometric representations in incorporating geometric relations. As a result, StruMonoNet can detect relations between planes, such as adjacent planes, perpendicular planes, and parallel planes, all of which are beneficial for dense 3D prediction. Experimental results show that StruMonoNet considerably outperforms state-of-the-art approaches on NYUv2 and ScanNet.
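To make the geometric intuition concrete, the sketch below fits a plane to a set of back-projected 3D points belonging to one detected planar region and returns its unit normal (the direction of smallest variance). This is a generic least-squares plane fit used only to illustrate how a planar structure constrains per-pixel normals; it is not StruMonoNet's learned surfel representation, and the toy plane coefficients are assumptions.

```python
import numpy as np

def fit_plane_normal(points):
    """points: (N, 3) array of 3D points assumed to lie near a common plane.
    Returns (normal, offset) with the plane written as normal . x = offset."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The plane normal is the right-singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    normal /= np.linalg.norm(normal)
    return normal, float(normal @ centroid)

# Noisy samples from the plane z = 0.5x + 0.2y + 1 as a toy planar region.
xy = np.random.rand(500, 2)
z = 0.5 * xy[:, 0] + 0.2 * xy[:, 1] + 1.0 + 0.01 * np.random.randn(500)
pts = np.column_stack([xy, z])
n, d = fit_plane_normal(pts)
print(n, d)   # n is (up to sign) proportional to (-0.5, -0.2, 1)
```

Once pixels are grouped onto such planes, their normals are shared and relations like parallelism or perpendicularity between planes become simple dot-product constraints.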
Citations: 4
Multi-Target Domain Adaptation with Collaborative Consistency Learning
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00809
Takashi Isobe, Xu Jia, Shuaijun Chen, Jianzhong He, Yongjie Shi, Jian-zhuo Liu, Huchuan Lu, Shengjin Wang
Recently, unsupervised domain adaptation for semantic segmentation has become more and more popular due to the high cost of pixel-level annotation on real-world images. However, most domain adaptation methods are restricted to a single source-target pair and cannot be directly extended to multiple target domains. In this work, we propose a collaborative learning framework to achieve unsupervised multi-target domain adaptation. An unsupervised domain adaptation expert model is first trained for each source-target pair, and the experts are further encouraged to collaborate with each other through a bridge built between the different target domains. These expert models are further improved by adding a regularization that encourages consistent pixel-wise predictions for each sample under the same structured context. To obtain a single model that works across multiple target domains, we propose to simultaneously learn a student model which is trained not only to imitate the output of each expert on the corresponding target domain, but also to pull the different experts close to each other with regularization on their weights. Extensive experiments demonstrate that the proposed method can effectively exploit the rich structured information contained in both the labeled source domain and multiple unlabeled target domains. It not only performs well across multiple target domains but also performs favorably against state-of-the-art unsupervised domain adaptation methods specially trained on a single source-target pair. Code is available at https://github.com/junpan19/MTDA.
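The sketch below illustrates two of the ingredients above in toy form: a distillation term that asks the student to imitate each expert on its own target domain, and a regularizer that pulls the experts' weights toward each other. The tiny segmentation networks, loss weight, and per-domain batches are assumptions for illustration, not the authors' training code.

```python
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_seg_net(num_classes=19):
    # Tiny stand-in for a segmentation network (illustrative only).
    return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(32, num_classes, 1))

experts = [make_seg_net(), make_seg_net()]   # one expert per target domain
student = make_seg_net()

def distill_and_regularize(batches, lambda_w=1e-4):
    """batches: list of unlabeled image batches, one per target domain."""
    loss = 0.0
    for expert, imgs in zip(experts, batches):
        with torch.no_grad():
            teacher_logits = expert(imgs)
        student_log_prob = F.log_softmax(student(imgs), dim=1)
        teacher_prob = F.softmax(teacher_logits, dim=1)
        # Student imitates the expert of the corresponding target domain.
        loss = loss + F.kl_div(student_log_prob, teacher_prob, reduction='batchmean')
    # Regularization pulling the experts' weights toward each other.
    for e1, e2 in itertools.combinations(experts, 2):
        for p1, p2 in zip(e1.parameters(), e2.parameters()):
            loss = loss + lambda_w * (p1 - p2).pow(2).sum()
    return loss

batches = [torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)]
print(distill_and_regularize(batches).item())
```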
Citations: 53
Deep Compositional Metric Learning 深度作曲度量学习
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00920
Wenzhao Zheng, Chengkun Wang, Jiwen Lu, Jie Zhou
In this paper, we propose a deep compositional metric learning (DCML) framework for effective and generalizable similarity measurement between images. Conventional deep metric learning methods minimize a discriminative loss to enlarge interclass distances while suppressing intraclass variations, which might lead to inferior generalization performance, since samples even from the same class may present diverse characteristics. This motivates the adoption of the ensemble technique to learn a number of sub-embeddings using different and diverse subtasks. However, most subtasks impose weaker or contradictory constraints, which essentially sacrifices the discrimination ability of each sub-embedding to improve the generalization ability of their combination. To achieve better generalization without this compromise, we propose to separate the sub-embeddings from direct supervision by the subtasks and apply the losses to different composites of the sub-embeddings. We employ a set of learnable compositors to combine the sub-embeddings and use a self-reinforced loss to train the compositors, which serve as relays that distribute the diverse training signals so as to avoid destroying the discrimination ability. Experimental results on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate the superior performance of our framework.
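The sketch below illustrates the compositional idea in isolation: an embedding is split into sub-embeddings, a set of learnable compositors (here simple softmax-weighted mixtures) combine them, and a margin-based pairwise loss is applied to each composite rather than to each sub-embedding directly. The dimensions, the number of compositors, and the particular loss are illustrative assumptions, not the paper's self-reinforced formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Compositors(nn.Module):
    """Combine K sub-embeddings into M learnable composites."""
    def __init__(self, num_sub=4, num_comp=3):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(num_comp, num_sub))

    def forward(self, sub_embeddings):            # (B, K, D)
        w = F.softmax(self.weights, dim=1)        # (M, K) convex combination weights
        return torch.einsum('mk,bkd->bmd', w, sub_embeddings)   # (B, M, D)

def composite_contrastive_loss(composites, labels, margin=0.5):
    """A simple pairwise margin loss applied independently to every composite."""
    comp = F.normalize(composites, dim=-1)                 # (B, M, D)
    sim = torch.einsum('bmd,cmd->mbc', comp, comp)         # (M, B, B) cosine similarity
    same = (labels[:, None] == labels[None, :]).float()    # (B, B)
    pos = (1.0 - sim) * same                               # pull same-class pairs together
    neg = F.relu(sim - margin) * (1.0 - same)              # push different-class pairs apart
    return (pos + neg).mean()

B, K, D = 32, 4, 64
sub = torch.randn(B, K, D)                 # sub-embeddings from a backbone head
labels = torch.randint(0, 8, (B,))
comp = Compositors()(sub)
print(composite_contrastive_loss(comp, labels).item())
```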
Citations: 32
Self-SAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01650
Xu Yang, Cheng Deng, Zhiyuan Dang, Kun-Juan Wei, Junchi Yan
Graph convolution networks (GCNs) are a powerful deep learning approach and have been successfully applied to representation learning on graphs in a variety of real-world applications. Despite their success, two fundamental weaknesses of GCNs limit their ability to represent graph-structured data: poor performance when labeled data are severely scarce, and indistinguishable features when more layers are stacked. In this paper, we propose a simple yet effective Self-Supervised Semantic Alignment Graph Convolution Network (SelfSAGCN), which consists of two core techniques, Identity Aggregation and Semantic Alignment, to overcome these weaknesses. The basic idea behind it is that node features of the same class, though learned from the semantic and the graph-structural aspects respectively, are expected to be mapped close to each other. Specifically, Identity Aggregation is applied to extract semantic features from labeled nodes, and Semantic Alignment is utilized to align node features obtained from different aspects using the class-center similarity. In this way, the over-smoothing phenomenon is alleviated, while the similarities between unlabeled features and labeled ones from the same class are enhanced. Experimental results on five popular datasets show that the proposed SelfSAGCN outperforms state-of-the-art methods on various classification tasks.
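The semantic-alignment idea can be illustrated with a small helper that computes class centers for node features produced by two branches (e.g., a semantic/identity-aggregation branch and a graph-aggregation branch) and penalizes the distance between corresponding centers. The two-branch setup and the plain squared-distance penalty below are simplifying assumptions, not the exact SelfSAGCN losses.

```python
import torch

def class_centers(features, labels, num_classes):
    """features: (N, D) node features, labels: (N,) integer class labels.
    Returns (num_classes, D) mean feature per class (zeros for empty classes)."""
    D = features.size(1)
    centers = torch.zeros(num_classes, D, dtype=features.dtype)
    counts = torch.zeros(num_classes, dtype=features.dtype)
    centers.index_add_(0, labels, features)
    counts.index_add_(0, labels, torch.ones_like(labels, dtype=features.dtype))
    return centers / counts.clamp(min=1).unsqueeze(1)

def semantic_alignment_loss(feat_semantic, feat_graph, labels, num_classes):
    """Pull the per-class centers of the two branches toward each other."""
    c_sem = class_centers(feat_semantic, labels, num_classes)
    c_graph = class_centers(feat_graph, labels, num_classes)
    return ((c_sem - c_graph) ** 2).sum(dim=1).mean()

N, D, C = 100, 16, 7
labels = torch.randint(0, C, (N,))
feat_semantic = torch.randn(N, D)      # e.g. identity-aggregation (MLP) branch
feat_graph = torch.randn(N, D)         # e.g. graph-convolution branch
print(semantic_alignment_loss(feat_semantic, feat_graph, labels, C).item())
```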
Citations: 13
Unsupervised Hyperbolic Metric Learning 无监督双曲度量学习
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01228
Jiexi Yan, Lei Luo, Cheng Deng, Heng Huang
Learning feature embeddings directly from images without any human supervision is a very challenging and essential task in the fields of computer vision and machine learning. Following the supervised paradigm, most existing unsupervised metric learning approaches mainly focus on binary similarity in Euclidean space. However, these methods cannot achieve promising performance in many practical applications where manual information is lacking and the data exhibit a non-Euclidean latent anatomy. To address this limitation, we propose an Unsupervised Hyperbolic Metric Learning method with Hierarchical Similarity. It considers the natural hierarchies of data by taking advantage of hyperbolic metric learning and hierarchical clustering, which can effectively excavate richer similarity information beyond binary similarity in modeling. More importantly, we design a new loss function to capture the hierarchical similarity among samples and enhance the stability of the proposed method. Extensive experimental results on benchmark datasets demonstrate that our method achieves state-of-the-art performance compared with current unsupervised deep metric learning approaches.
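Hyperbolic metric learning typically embeds samples in the Poincaré ball, whose distance grows rapidly near the boundary and therefore represents hierarchies naturally. The helper below implements the standard Poincaré-ball distance used in such settings; the clipping constant is an assumption for numerical stability, and this is generic hyperbolic machinery rather than the paper's full hierarchical-clustering pipeline.

```python
import torch

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between points u, v (..., D) inside the unit Poincare ball:
    d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    sq_u = (u * u).sum(dim=-1).clamp(max=1 - eps)
    sq_v = (v * v).sum(dim=-1).clamp(max=1 - eps)
    sq_diff = ((u - v) ** 2).sum(dim=-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x.clamp(min=1 + eps))

# Points near the origin stay close; points pushed toward the boundary drift apart.
a = torch.tensor([[0.1, 0.0], [0.9, 0.0]])
b = torch.tensor([[0.0, 0.1], [0.0, 0.9]])
print(poincare_distance(a, b))   # the second pair is much farther apart
```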
Citations: 24
Zillow Indoor Dataset: Annotated Floor Plans With 360° Panoramas and 3D Room Layouts
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00217
S. Cruz, Will Hutchcroft, Yuguang Li, Naji Khosravan, Ivaylo Boyadzhiev, S. B. Kang
We present the Zillow Indoor Dataset (ZInD): a large indoor dataset with 71,474 panoramas from 1,524 real unfurnished homes. ZInD provides annotations of 3D room layouts, 2D and 3D floor plans, panorama locations in the floor plan, and locations of windows and doors. The ground-truth construction took over 1,500 hours of annotation work. To the best of our knowledge, ZInD is the largest real dataset with layout annotations. A unique property is the room layout data, which follows a real-world distribution (cuboid, more general Manhattan, and non-Manhattan layouts), as opposed to the mostly cuboid or Manhattan layouts in currently available public datasets. Also, the scale and annotations provided are valuable for effective research related to room layout and floor plan analysis. To demonstrate ZInD's benefits, we benchmark room layout estimation from single panoramas and multi-view registration.
Citations: 39