
Latest Publications: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Maintaining Reasoning Consistency in Compositional Visual Question Answering
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00504
Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu Liu, Qi Wu
A compositional question refers to a question that contains multiple visual concepts (e.g., objects, attributes, and relationships) and requires compositional reasoning to answer. Existing VQA models can answer a compositional question well, but often fail to maintain reasoning consistency between the compositional question and its sub-questions. For example, a compositional question for an image is: “Are there any elephants to the right of the white bird?” and one of its sub-questions is “Is any bird visible in the scene?”. The models may answer “yes” to the compositional question, but “no” to the sub-question. This paper presents a dialog-like reasoning method for maintaining reasoning consistency in answering a compositional question and its sub-questions. Our method integrates the reasoning processes for the sub-questions into the reasoning process for the compositional question, like a dialog task, and uses a consistency constraint to penalize inconsistent answer predictions. To enable quantitative evaluation of reasoning consistency, we construct a GQA-Sub dataset based on the well-organized GQA dataset. Experimental results on the GQA dataset and the GQA-Sub dataset demonstrate the effectiveness of our method.
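A minimal sketch of how such a consistency constraint could look in practice (the hinge-style penalty and all names here are illustrative assumptions, not the authors' exact formulation): it penalizes cases where the model answers "yes" to the compositional question while assigning low probability to a sub-question it entails.

```python
import torch
import torch.nn.functional as F

def consistency_loss(p_comp: torch.Tensor, p_sub: torch.Tensor) -> torch.Tensor:
    """Penalize predictions where a 'yes' on the compositional question is not
    supported by a 'yes' on a sub-question that it logically entails.

    p_comp: (B,) predicted probability of answering 'yes' to the compositional question.
    p_sub:  (B,) predicted probability of answering 'yes' to the entailed sub-question.
    """
    # If the model is confident the compositional answer is 'yes', the entailed
    # sub-question should be at least as likely to be 'yes'; penalize the gap.
    return F.relu(p_comp - p_sub).mean()

# Toy usage: 0.9 for "elephants right of the white bird?" but only 0.2 for
# "is any bird visible?" is an inconsistent pair and gets a large penalty.
print(consistency_loss(torch.tensor([0.9, 0.3]), torch.tensor([0.2, 0.8])))
```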
Citations: 6
HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01451
Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bohan Ren, Shaohui Lin, Lizhuang Ma
To address the huge labeling cost in large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework in a weakly-supervised setting, which obtains competitive performance compared to its fully-supervised counterpart. Specifically, HybridCR is the first framework to leverage both point consistency and contrastive regularization with pseudo labeling in an end-to-end manner. Fundamentally, HybridCR explicitly and effectively considers the semantic similarity between local neighboring points and the global characteristics of 3D classes. We further design a dynamic point cloud augmentor to generate diverse and robust sample views, whose transformation parameters are jointly optimized with model training. Through extensive experiments, HybridCR achieves significant performance improvements over SOTA methods on both indoor and outdoor datasets, e.g., S3DIS, ScanNet-V2, Semantic3D, and SemanticKITTI.
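As a rough illustration of contrastive regularization driven by pseudo labels, the sketch below implements a supervised-contrastive-style loss over point embeddings, treating points that share a pseudo label as positives; the grouping rule and temperature are assumptions rather than the exact HybridCR objective.

```python
import torch
import torch.nn.functional as F

def pseudo_label_contrastive(feats: torch.Tensor, pseudo_labels: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """Supervised-contrastive-style regularizer over point embeddings.

    feats:         (N, D) per-point embeddings.
    pseudo_labels: (N,) pseudo class ids, e.g. propagated from sparse annotations.
    """
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / temperature                    # (N, N) similarities
    self_mask = torch.eye(len(feats), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float('-inf'))          # drop self-pairs

    pos = (pseudo_labels[None, :] == pseudo_labels[:, None]) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    has_pos = pos.any(dim=1)                                 # anchors with >= 1 positive
    pos = pos.float()
    loss = -(log_prob * pos).sum(dim=1)[has_pos] / pos.sum(dim=1)[has_pos]
    return loss.mean()

feats = torch.randn(16, 32, requires_grad=True)
labels = torch.randint(0, 4, (16,))
print(pseudo_label_contrastive(feats, labels))
```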
Citations: 28
Occlusion-robust Face Alignment using A Viewpoint-invariant Hierarchical Network Architecture
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01083
Congcong Zhu, Xintong Wan, Shaorong Xie, Xiaoqiang Li, Yinzheng Gu
The occlusion problem heavily degrades the localization performance of face alignment. Most current solutions to this problem focus on annotating new occlusion data, introducing boundary estimation, and stacking deeper models to improve the robustness of neural networks. However, model performance still degrades under extreme occlusion (i.e., average occlusion of over 50%) because a large amount of facial context information is missing. We argue that exploring neural networks to model facial hierarchies is a more promising way to deal with extreme occlusion. Surprisingly, in recent studies, little effort has been devoted to representing facial hierarchies using neural networks. This paper proposes a new network architecture called GlomFace to model facial hierarchies under various occlusions, drawing inspiration from the viewpoint-invariant hierarchy of facial structure. Specifically, GlomFace is functionally divided into two modules: the part-whole hierarchical module and the whole-part hierarchical module. The former captures the part-whole hierarchical dependencies of facial parts to suppress multi-scale occlusion information, whereas the latter injects structural reasoning into neural networks by building whole-part hierarchical relations among facial parts. As a result, GlomFace has a clear topological interpretation due to its correspondence to the facial hierarchies. Extensive experimental results indicate that the proposed GlomFace performs comparably to existing state-of-the-art methods, especially in cases of extreme occlusion. Models are available at https://github.com/zhuccly/GlomFace-Face-Alignment.
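One plausible reading of the part-whole/whole-part interaction is cross-attention between part features and a pooled whole-face summary. The toy module below sketches that reading only; its structure, names, and shapes are illustrative assumptions, not the GlomFace architecture.

```python
import torch
import torch.nn as nn

class PartWholeBlock(nn.Module):
    """Toy part-whole interaction: part tokens exchange information with a
    pooled 'whole face' token via multi-head attention (illustrative only)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.part_to_whole = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.whole_to_part = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: (B, P, D) features of facial parts (eyes, nose, mouth, ...).
        whole = parts.mean(dim=1, keepdim=True)             # (B, 1, D) whole-face summary
        whole, _ = self.part_to_whole(whole, parts, parts)  # the whole queries its parts
        parts, _ = self.whole_to_part(parts, whole, whole)  # each part queries the whole
        return parts

x = torch.randn(2, 5, 64)
print(PartWholeBlock()(x).shape)  # torch.Size([2, 5, 64])
```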
Citations: 4
PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00168
Naiyu Gao, Fei He, Jian Jia, Yanhu Shan, Haoyang Zhang, Xin Zhao, Kaiqi Huang
This paper presents a unified framework for depth-aware panoptic segmentation (DPS), which aims to reconstruct a 3D scene with instance-level semantics from a single image. Prior works address this problem by simply adding a dense depth regression head to panoptic segmentation (PS) networks, resulting in two independent task branches. This neglects the mutually beneficial relations between the two tasks, thus failing to exploit handy instance-level semantic cues to boost depth accuracy while also producing sub-optimal depth maps. To overcome these limitations, we propose a unified framework for the DPS task by applying a dynamic convolution technique to both the PS and depth prediction tasks. Specifically, instead of predicting depth for all pixels at once, we generate instance-specific kernels to predict the depth and segmentation mask for each instance. Moreover, leveraging the instance-wise depth estimation scheme, we add instance-level depth cues to assist in supervising the depth learning via a new depth loss. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show the effectiveness and promise of our method. We hope our unified solution to DPS can lead to a new paradigm in this area. Code is available at https://github.com/NaiyuGao/PanopticDepth.
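The dynamic-convolution idea can be sketched as follows: each instance embedding is mapped to its own 1x1 kernels, which are applied to a shared feature map to produce that instance's mask logits and depth. The kernel generator, shapes, and the softplus on depth are assumptions for illustration, not the paper's exact head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceKernelHead(nn.Module):
    """Turn each instance embedding into its own 1x1 kernels for mask and depth."""

    def __init__(self, embed_dim: int = 256, feat_dim: int = 64):
        super().__init__()
        self.feat_dim = feat_dim
        # Per instance: two 1x1 kernels (mask, depth), each with feat_dim weights + 1 bias.
        self.kernel_gen = nn.Linear(embed_dim, 2 * (feat_dim + 1))

    def forward(self, inst_embed: torch.Tensor, feats: torch.Tensor):
        # inst_embed: (N, embed_dim), one embedding per detected instance.
        # feats:      (1, feat_dim, H, W), shared decoder features.
        n, c = inst_embed.shape[0], self.feat_dim
        params = self.kernel_gen(inst_embed)                   # (N, 2*(C+1))
        w = params[:, :2 * c].reshape(2 * n, c, 1, 1)          # dynamic 1x1 conv weights
        b = params[:, 2 * c:].reshape(2 * n)                   # dynamic biases
        out = F.conv2d(feats, w, b).reshape(n, 2, *feats.shape[-2:])
        mask_logits, depth = out[:, 0], F.softplus(out[:, 1])  # keep depth positive
        return mask_logits, depth

head = InstanceKernelHead()
masks, depths = head(torch.randn(3, 256), torch.randn(1, 64, 32, 32))
print(masks.shape, depths.shape)  # torch.Size([3, 32, 32]) torch.Size([3, 32, 32])
```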
Citations: 11
Graph-based Spatial Transformer with Memory Replay for Multi-future Pedestrian Trajectory Prediction
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00227
Lihuan Li, M. Pagnucco, Yang Song
Pedestrian trajectory prediction is an essential and challenging task for a variety of real-life applications, such as autonomous driving and robotic motion planning. Besides generating a single future path, predicting multiple plausible future paths is becoming popular in recent work on trajectory prediction. However, existing methods typically emphasize spatial interactions between pedestrians and surrounding areas but ignore the smoothness and temporal consistency of predictions. Our model forecasts multiple paths based on a historical trajectory by combining multi-scale graph-based spatial transformers with a trajectory smoothing algorithm named “Memory Replay” that utilizes a memory graph. Our method can comprehensively exploit the spatial information as well as correct temporally inconsistent trajectories (e.g., sharp turns). We also propose a new evaluation metric named “Percentage of Trajectory Usage” to evaluate the comprehensiveness of diverse multi-future predictions. Our extensive experiments show that the proposed model achieves state-of-the-art performance on multi-future prediction and competitive results on single-future prediction. Code released at https://github.com/Jacobieee/ST-MR.
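The paper's Memory Replay operates on a memory graph and is not reproduced here; as a much simpler stand-in that only illustrates the temporal-consistency goal (attenuating sharp, implausible jumps in a predicted path), the sketch below applies a moving-average smoother to a 2D trajectory.

```python
import numpy as np

def smooth_trajectory(traj: np.ndarray, window: int = 3) -> np.ndarray:
    """Moving-average smoothing of a predicted 2D trajectory.

    traj: (T, 2) array of predicted (x, y) positions.
    Returns a trajectory of the same shape with abrupt jumps attenuated.
    """
    pad = window // 2
    padded = np.pad(traj, ((pad, pad), (0, 0)), mode='edge')
    kernel = np.ones(window) / window
    return np.stack([np.convolve(padded[:, d], kernel, mode='valid')
                     for d in range(traj.shape[1])], axis=1)

pred = np.array([[0, 0], [1, 0], [2, 3], [3, 0], [4, 0]], dtype=float)  # a sharp spike
print(smooth_trajectory(pred))
```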
Citations: 22
Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00013
Jiexi Yan, Lei Luo, Chenghao Xu, Cheng Deng, Heng Huang
How to effectively handle label noise has been one of the most practical but challenging tasks in Deep Neural Networks (DNNs). Recent popular methods for training DNNs with noisy labels mainly focus on directly filtering out samples with low confidence or repeatedly mining valuable information from low-confidence samples. However, they cannot guarantee the robust generalization of models because they ignore useful information hidden in the noisy data. To address this issue, we propose a new effective method named LaCoL (Latent Contrastive Learning) to leverage the negative correlations from the noisy data. Specifically, in label space, we exploit the weakly-augmented data to filter samples and adopt a classification loss on strong augmentations of the selected sample set, which preserves the training diversity. In metric space, we utilize weakly-supervised contrastive learning to excavate these negative correlations hidden in noisy data. Moreover, a cross-space similarity consistency regularization is provided to constrain the gap between label space and metric space. Extensive experiments have validated the superiority of our approach over existing state-of-the-art methods.
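A hedged sketch of the label-space step: select samples with their weakly-augmented views, then apply the classification loss to strong augmentations of the kept samples. The confidence-threshold filter and all names are assumptions, not the exact LaCoL selection rule.

```python
import torch
import torch.nn.functional as F

def selected_strong_aug_loss(model, weak_imgs, strong_imgs, noisy_labels,
                             conf_thresh: float = 0.8):
    """Filter samples using weak views, then train on strong views of the kept ones.

    weak_imgs / strong_imgs: two augmented views of the same batch, (B, C, H, W).
    noisy_labels:            (B,) possibly-noisy class labels.
    """
    with torch.no_grad():
        probs = F.softmax(model(weak_imgs), dim=1)                 # weak-view predictions
        keep = probs.gather(1, noisy_labels[:, None]).squeeze(1) > conf_thresh

    if not keep.any():                                             # nothing passed the filter
        return weak_imgs.new_zeros(())
    logits = model(strong_imgs[keep])                              # classify strong views only
    return F.cross_entropy(logits, noisy_labels[keep])

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.randn(8, 3, 32, 32)
loss = selected_strong_aug_loss(model, x, x + 0.1 * torch.randn_like(x),
                                torch.randint(0, 10, (8,)))
print(loss)
```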
Citations: 7
Instance Segmentation with Mask-supervised Polygonal Boundary Transformers
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00434
Justin Lazarow, Weijian Xu, Z. Tu
In this paper, we present an end-to-end instance segmentation method that regresses a polygonal boundary for each object instance. This sparse, vectorized boundary representation for objects, while attractive in many downstream computer vision tasks, quickly runs into issues of parity that need to be addressed: parity in supervision and parity in performance when compared to existing pixel-based methods. This is due in part to object instances being annotated with ground-truth in the form of polygonal boundaries or segmentation masks, yet being evaluated in a convenient manner using only segmentation masks. Our method, BoundaryFormer, is a Transformer based architecture that directly predicts polygons yet uses instance mask segmentations as the ground-truth supervision for computing the loss. We achieve this by developing an end-to-end differentiable model that solely relies on supervision within the mask space through differentiable rasterization. BoundaryFormer matches or surpasses the Mask R-CNN method in terms of instance segmentation quality on both COCO and Cityscapes while exhibiting significantly better transferability across datasets.
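BoundaryFormer's rasterizer is not reproduced here; as a toy illustration of how mask supervision can reach polygon vertices through gradients, the sketch below softly rasterizes a convex polygon as a product of sigmoid half-plane tests and backpropagates a mask loss to the vertices. The convexity restriction, sharpness constant, and names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_rasterize_convex(verts: torch.Tensor, size: int, sharpness: float = 20.0):
    """Differentiably rasterize a *convex* polygon given in counter-clockwise order.

    verts: (K, 2) vertex coordinates in [0, 1] x [0, 1]; gradients flow to them.
    Returns a (size, size) soft occupancy map with values in (0, 1).
    """
    ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                            torch.linspace(0, 1, size), indexing='ij')
    pts = torch.stack([xs, ys], dim=-1).reshape(-1, 2)           # (size*size, 2)
    a, b = verts, torch.roll(verts, -1, dims=0)                  # polygon edges a -> b
    # Cross product is positive when a point lies to the left of an edge (CCW interior).
    cross = ((b[:, 0] - a[:, 0])[None] * (pts[:, 1, None] - a[:, 1][None])
             - (b[:, 1] - a[:, 1])[None] * (pts[:, 0, None] - a[:, 0][None]))
    inside = torch.sigmoid(sharpness * cross).prod(dim=1)        # soft AND over all edges
    return inside.reshape(size, size)

verts = torch.tensor([[0.2, 0.2], [0.8, 0.2], [0.8, 0.8], [0.2, 0.8]],
                     requires_grad=True)
gt_mask = torch.zeros(32, 32)
gt_mask[8:26, 8:26] = 1.0
loss = F.binary_cross_entropy(soft_rasterize_convex(verts, 32), gt_mask)
loss.backward()                       # gradients reach the polygon vertices
print(verts.grad.shape)               # torch.Size([4, 2])
```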
Citations: 18
Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01143
Yifa Wang, Wenbo Zhang, Lijun Wang, Tinglong Liu, Huchuan Lu
Deep learning-based image salient object detection (SOD) heavily relies on large-scale training data with pixel-wise labeling. High-quality labels involve intensive labor and are expensive to acquire. In this paper, we propose a novel multi-source uncertainty mining method to facilitate unsupervised deep learning from multiple noisy labels generated by traditional handcrafted SOD methods. We design an Uncertainty Mining Network (UMNet), which consists of multiple Merge-and-Split (MS) modules, to recursively analyze the commonality and difference among multiple noisy labels and infer a pixel-wise uncertainty map for each label. Meanwhile, we model the noisy labels using a Gibbs distribution and propose a weighted uncertainty loss to jointly train the UMNet with the SOD network. As a consequence, our UMNet can adaptively select reliable labels for SOD network learning. Extensive experiments on benchmark datasets demonstrate that our method not only outperforms existing unsupervised methods, but is also on par with fully-supervised state-of-the-art models.
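A minimal sketch of an uncertainty-weighted objective: per-pixel losses against each noisy label source are down-weighted by the estimated uncertainty of that source. The weighting scheme and tensor shapes are assumptions, not the exact UMNet loss.

```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_loss(pred: torch.Tensor, noisy_labels: torch.Tensor,
                              uncertainty: torch.Tensor) -> torch.Tensor:
    """Down-weight pixels whose noisy label is estimated to be unreliable.

    pred:         (B, 1, H, W) predicted saliency logits.
    noisy_labels: (B, M, H, W) binary maps from M handcrafted SOD methods.
    uncertainty:  (B, M, H, W) per-pixel uncertainty in [0, 1] for each label source.
    """
    bce = F.binary_cross_entropy_with_logits(
        pred.expand_as(noisy_labels), noisy_labels, reduction='none')
    weights = 1.0 - uncertainty                      # trust low-uncertainty pixels more
    return (weights * bce).sum() / weights.sum().clamp(min=1e-6)

pred = torch.randn(2, 1, 16, 16)
labels = (torch.rand(2, 3, 16, 16) > 0.5).float()
unc = torch.rand(2, 3, 16, 16)
print(uncertainty_weighted_loss(pred, labels, unc))
```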
Citations: 12
Contrastive Learning for Unsupervised Video Highlight Detection
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01365
Taivanbat Badamdorj, Mrigank Rochan, Yang Wang, Li-na Cheng
Video highlight detection can greatly simplify video browsing, potentially paving the way for a wide range of applications. Existing efforts are mostly fully-supervised, requiring humans to manually identify and label the interesting moments (called highlights) in a video. Recent weakly supervised methods forgo the use of highlight annotations, but typically require extensive efforts in collecting external data, such as web-crawled videos, for model learning. This observation has inspired us to consider unsupervised highlight detection, where neither frame-level nor video-level annotations are available in training. We propose a simple contrastive learning framework for unsupervised highlight detection. Our framework encodes a video into a vector representation by learning to pick video clips that help to distinguish it from other videos via a contrastive objective using dropout noise. This inherently allows our framework to identify video clips corresponding to the highlights of the video. Extensive empirical evaluations on three highlight detection benchmarks demonstrate the superior performance of our approach.
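The "contrastive objective using dropout noise" can be read in the SimCSE spirit: two forward passes of the same clip features through a dropout-bearing encoder give two views, and an InfoNCE loss pulls the paired views together. The sketch below follows that reading; the encoder design and temperature are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipEncoder(nn.Module):
    """Toy clip-feature encoder whose dropout provides the 'noise' for two views."""

    def __init__(self, in_dim: int = 512, out_dim: int = 128, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Dropout(p), nn.Linear(256, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def dropout_nce(encoder: nn.Module, clips: torch.Tensor, tau: float = 0.07):
    # Two passes over the *same* clips differ only in their dropout masks.
    z1, z2 = encoder(clips), encoder(clips)
    logits = z1 @ z2.t() / tau                       # (B, B) similarities
    targets = torch.arange(len(clips))               # the positive pair is the diagonal
    return F.cross_entropy(logits, targets)

enc = ClipEncoder()
clips = torch.randn(8, 512)                          # e.g., pooled features of 8 clips
print(dropout_nce(enc, clips))
```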
Citations: 9
A Unified Model for Line Projections in Catadioptric Cameras with Rotationally Symmetric Mirrors
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01534
Pedro Miraldo, J. Iglesias
Lines are among the most used computer vision features, in applications ranging from camera calibration to object detection. Catadioptric cameras with rotationally symmetric mirrors are omnidirectional imaging devices, capturing up to a 360-degree field of view. They are used in many applications, from robotics to panoramic vision. Although known for some specific configurations, the modeling of line projection has never been fully solved for general central and non-central catadioptric cameras. We start from some general point reflection assumptions and derive a line reflection constraint. This constraint is then used to define a line projection into the image. Next, we compare our model with previous methods, showing that our general approach yields the same polynomial degrees as previous configuration-specific systems. We run several experiments using synthetic and real-world data, validating our line projection model. Lastly, we show an application of our methods to an absolute camera pose problem.
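The paper's line reflection constraint is not reproduced here; as a small grounded ingredient that any such point-reflection model builds on, the snippet below computes the mirror reflection of a ray direction about a surface normal, d' = d - 2 (d·n) n.

```python
import numpy as np

def reflect(d: np.ndarray, n: np.ndarray) -> np.ndarray:
    """Reflect an incoming ray direction d about the unit surface normal n:
    d' = d - 2 (d . n) n, the standard law of reflection."""
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n

d = np.array([1.0, 0.0, -1.0])   # ray travelling down towards the mirror
n = np.array([0.0, 0.0, 1.0])    # mirror normal pointing up
print(reflect(d, n))             # [1. 0. 1.] -- the z component flips sign
```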
Citations: 1