
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00163
Andra Petrovai, S. Nedevschi
We present a novel self-distillation based self-supervised monocular depth estimation (SD-SSMDE) learning framework. In the first step, our network is trained in a self-supervised regime on high-resolution images with the photometric loss. The network is further used to generate pseudo depth labels for all the images in the training set. To improve the performance of our estimates, in the second step, we re-train the network with the scale invariant logarithmic loss supervised by pseudo labels. We resolve scale ambiguity and inter-frame scale consistency by introducing an automatically computed scale in our depth labels. To filter out noisy depth values, we devise a filtering scheme based on the 3D consistency between consecutive views. Extensive experiments demonstrate that each proposed component and the self-supervised learning framework improve the quality of the depth estimation over the baseline and achieve state-of-the-art results on the KITTI and Cityscapes datasets.
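For concreteness, below is a minimal PyTorch sketch of a scale-invariant logarithmic loss computed against pseudo depth labels; the lambda weighting and the validity mask produced by a 3D-consistency filter are assumptions here, not necessarily the paper's exact formulation.

```python
import torch

def silog_loss(pred_depth, pseudo_depth, valid_mask, lam=0.85):
    """Scale-invariant logarithmic loss against pseudo depth labels.

    pred_depth, pseudo_depth: (B, 1, H, W) positive depth maps.
    valid_mask: boolean mask of pixels that survived 3D-consistency filtering.
    """
    d = torch.log(pred_depth[valid_mask]) - torch.log(pseudo_depth[valid_mask])
    # variance-like term: penalize per-pixel log errors while partially
    # discounting a global scale offset (controlled by lam)
    return torch.sqrt((d ** 2).mean() - lam * d.mean() ** 2)
```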
Citations: 25
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01597
Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu
Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules. On one hand, the shared task-agnostic modules aim to learn precise locations of objects, fine-grained attribute features to characterize different objects, and complex relations between objects, which benefit both captioning and visual grounding. On the other hand, by casting each of the two tasks as a proxy task of the other, the lightweight task-specific modules solve the captioning task and the grounding task respectively. Extensive experiments and ablation studies on three 3D vision and language datasets demonstrate that our joint training framework achieves significant performance gains for each individual task and finally improves the state-of-the-art performance for both captioning and grounding tasks.
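To make the shared/task-specific split concrete, here is a hedged PyTorch sketch of a single task-agnostic encoder over 3D proposal features feeding two lightweight heads; every module name, dimension, and fusion choice below is illustrative rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class JointCaptionGroundModel(nn.Module):
    """Schematic shared-backbone / light-head design: one task-agnostic
    encoder over 3D object proposals plus small task-specific heads."""
    def __init__(self, feat_dim=256, vocab_size=3000, lang_dim=256):
        super().__init__()
        # shared, task-agnostic: refine proposal features with self-attention
        self.shared_attn = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True),
            num_layers=2)
        # task-specific: captioning head predicts word logits per proposal
        self.caption_head = nn.Linear(feat_dim + lang_dim, vocab_size)
        # task-specific: grounding head scores proposals against a sentence feature
        self.ground_head = nn.Sequential(
            nn.Linear(feat_dim + lang_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, proposal_feats, lang_feat):
        # proposal_feats: (B, N, feat_dim); lang_feat: (B, lang_dim)
        shared = self.shared_attn(proposal_feats)
        lang = lang_feat.unsqueeze(1).expand(-1, shared.size(1), -1)
        fused = torch.cat([shared, lang], dim=-1)
        word_logits = self.caption_head(fused)               # (B, N, vocab)
        ground_scores = self.ground_head(fused).squeeze(-1)  # (B, N)
        return word_logits, ground_scores
```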
Citations: 38
Whose Track Is It Anyway? Improving Robustness to Tracking Errors with Affinity-based Trajectory Prediction
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00646
Xinshuo Weng, B. Ivanovic, Kris Kitani, M. Pavone
Multi-agent trajectory prediction is critical for planning and decision-making in human-interactive autonomous systems, such as self-driving cars. However, most prediction models are developed separately from their upstream perception (detection and tracking) modules, assuming ground truth past trajectories as inputs. As a result, their performance degrades significantly when using real-world noisy tracking results as inputs. This is typically caused by the propagation of errors from tracking to prediction, such as noisy tracks, fragments and identity switches. To alleviate this propagation of errors, we propose a new prediction paradigm that uses detections and their affinity matrices across frames as inputs, removing the need for error-prone data association during tracking. Since affinity matrices contain “soft” information about the similarity and identity of detections across frames, making prediction directly from affinity matrices retains strictly more information than making prediction from the tracklets generated by data association. Experiments on large-scale, real-world autonomous driving datasets show that our affinity-based prediction scheme (project website: https://www.xinshuoweng.com/projects/Affinipred) reduces overall prediction errors by up to 57.9%, in comparison to standard prediction pipelines that use tracklets as inputs, with even more significant error reduction (up to 88.6%) if restricting the evaluation to challenging scenarios with tracking errors.
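A minimal sketch of the kind of cross-frame affinity matrix such a predictor can consume, here built from cosine similarity of appearance embeddings; this similarity choice is an assumption, since the paper takes its affinities from the detection and tracking front-end.

```python
import numpy as np

def affinity_matrix(feats_t, feats_t1):
    """Soft affinity between detections of consecutive frames.

    feats_t: (N, D) embeddings at frame t; feats_t1: (M, D) at frame t+1.
    Returns an (N, M) matrix in [0, 1] that can be fed to the predictor
    instead of hard tracklet associations.
    """
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    return 0.5 * (a @ b.T + 1.0)  # map cosine similarity from [-1, 1] to [0, 1]
```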
Citations: 12
Learning to Learn and Remember Super Long Multi-Domain Task Sequence
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00782
Zhenyi Wang, Li Shen, Tiehang Duan, Donglin Zhan, Le Fang, Mingchen Gao
Catastrophic forgetting (CF) frequently occurs when learning with non-stationary data distribution. The CF issue remains nearly unexplored and is more challenging when meta-learning on a sequence of domains (datasets), called sequential domain meta-learning (SDML). In this work, we propose a simple yet effective learning to learn approach, i.e., meta optimizer, to mitigate the CF problem in SDML. We first apply the proposed meta optimizer to the simplified setting of SDML, domain-aware meta-learning, where the domain labels and boundaries are known during the learning process. We propose dynamically freezing the network and incorporating it with the proposed meta optimizer by considering the domain nature during meta training. In addition, we extend the meta optimizer to the more general setting of SDML, domain-agnostic meta-learning, where domain labels and boundaries are unknown during the learning process. We propose a domain shift detection technique to capture latent domain change and equip the meta optimizer with it to work in this setting. The proposed meta optimizer is versatile and can be easily integrated with several existing meta-learning algorithms. Finally, we construct a challenging and large-scale benchmark consisting of 10 heterogeneous domains with a super long task sequence consisting of 100K tasks. We perform extensive experiments on the proposed benchmark for both settings and demonstrate the effectiveness of our proposed method, outperforming current strong baselines by a large margin.
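As a rough illustration of statistics-based domain shift detection in the domain-agnostic setting, the toy detector below flags a shift when batch feature statistics drift from a running estimate; the distance measure, threshold, and reset rule are assumptions standing in for the paper's technique.

```python
import numpy as np

class DomainShiftDetector:
    """Toy domain-shift detector: flag a shift when the current batch's
    feature statistics drift far from a running per-domain estimate."""
    def __init__(self, dim, threshold=2.0, momentum=0.9):
        self.mu = np.zeros(dim)
        self.var = np.ones(dim)
        self.threshold = threshold
        self.momentum = momentum

    def update(self, feats):
        # feats: (B, dim) task/batch embedding features
        batch_mu = feats.mean(axis=0)
        batch_var = feats.var(axis=0) + 1e-8
        # normalized distance between batch mean and running mean
        dist = np.linalg.norm((batch_mu - self.mu) / np.sqrt(self.var + 1e-8))
        dist /= np.sqrt(len(self.mu))
        shifted = dist > self.threshold
        if shifted:   # reset statistics for the new domain
            self.mu, self.var = batch_mu, batch_var
        else:         # exponential moving average within the current domain
            self.mu = self.momentum * self.mu + (1 - self.momentum) * batch_mu
            self.var = self.momentum * self.var + (1 - self.momentum) * batch_var
        return shifted
```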
Citations: 11
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00284
Rui Zhu, Zhengqin Li, J. Matai, F. Porikli, Manmohan Chandraker
Indoor scenes exhibit significant appearance variations due to myriad interactions between arbitrarily diverse object shapes, spatially-changing materials, and complex lighting. Shadows, highlights, and inter-reflections caused by visible and invisible light sources require reasoning about long-range interactions for inverse rendering, which seeks to recover the components of image formation, namely, shape, material, and lighting. In this work, our intuition is that the long-range attention learned by transformer architectures is ideally suited to solve longstanding challenges in single-image inverse rendering. We demonstrate with a specific instantiation of a dense vision transformer, IRISformer, that excels at both single-task and multi-task reasoning required for inverse rendering. Specifically, we propose a transformer architecture to simultaneously estimate depths, normals, spatially-varying albedo, roughness and lighting from a single image of an indoor scene. Our extensive evaluations on benchmark datasets demonstrate state-of-the-art results on each of the above tasks, enabling applications like object insertion and material editing in a single unconstrained real image, with greater photorealism than prior works. Code and data are publicly released at https://github.com/ViLab-UCSD/IRISformer.
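A hedged sketch of per-quantity decoder heads over shared dense features, showing how depth, normals, albedo, and roughness can be estimated simultaneously; the head design and channel counts are illustrative and not IRISformer's actual decoders.

```python
import torch
import torch.nn as nn

class MultiTaskDenseHeads(nn.Module):
    """Per-quantity prediction heads on top of shared dense features
    (channel counts: depth 1, normals 3, albedo 3, roughness 1)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        def head(out_ch):
            return nn.Sequential(nn.Conv2d(feat_dim, 128, 3, padding=1),
                                 nn.ReLU(), nn.Conv2d(128, out_ch, 1))
        self.depth, self.normal = head(1), head(3)
        self.albedo, self.rough = head(3), head(1)

    def forward(self, feats):  # feats: (B, feat_dim, H, W) shared dense features
        return {"depth": self.depth(feats), "normal": self.normal(feats),
                "albedo": self.albedo(feats), "roughness": self.rough(feats)}
```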
Citations: 21
Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01194
Jian Meng, Li Yang, Jinwoo Shin, Deliang Fan, J.-s. Seo
Contrastive learning (or its variants) has recently become a promising direction in the self-supervised learning domain, achieving similar performance as supervised learning with minimum fine-tuning. Despite the labeling efficiency, wide and large networks are required to achieve high accuracy, which incurs a high amount of computation and hinders the pragmatic merit of self-supervised learning. To effectively reduce the computation of insignificant features or channels, recent dynamic pruning algorithms for supervised learning employed auxiliary salience predictors. However, we found that such salience predictors cannot be easily trained when they are naïvely applied to contrastive learning from scratch. To address this issue, we propose contrastive dual gating (CDG), a novel dynamic pruning algorithm that skips the uninformative features during contrastive learning without hurting the trainability of the networks. We demonstrate the superiority of CDG with ResNet models for CIFAR-10, CIFAR-100, and ImageNet-100 datasets. Compared to our implementations of state-of-the-art dynamic pruning algorithms for self-supervised learning, CDG achieves up to 15% accuracy improvement for CIFAR-10 dataset with higher computation reduction.
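An illustrative sketch of salience-based feature gating that zeroes low-magnitude locations so their downstream computation could be skipped; CDG's gates are learned inside the two contrastive branches, so this magnitude-based stand-in is only a rough analogue.

```python
import torch

def salience_gate(features, keep_ratio=0.5):
    """Keep the highest-magnitude spatial locations of a feature map and zero
    the rest, producing a sparse feature tensor.

    features: (B, C, H, W). In a contrastive setup this would be applied to
    the features of both augmented views (hence "dual" gating).
    """
    salience = features.abs().mean(dim=1, keepdim=True)          # (B, 1, H, W)
    b = features.size(0)
    flat = salience.view(b, -1)
    k = max(1, int(keep_ratio * flat.size(1)))
    thresh = flat.topk(k, dim=1).values[:, -1].view(b, 1, 1, 1)  # per-sample cutoff
    mask = (salience >= thresh).float()
    return features * mask                                       # gated features
```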
Citations: 3
Spiking Transformers for Event-based Single Object Tracking
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00860
Jiqing Zhang, B. Dong, Haiwei Zhang, Jianchuan Ding, Felix Heide, Baocai Yin, Xin Yang
Event-based cameras bring a unique capability to tracking, being able to function in challenging real-world conditions as a direct result of their high temporal resolution and high dynamic range. These imagers capture events asynchronously that encode rich temporal and spatial information. However, effectively extracting this information from events remains an open challenge. In this work, we propose a spiking transformer network, STNet, for single object tracking. STNet dynamically extracts and fuses information from both temporal and spatial domains. In particular, the proposed architecture features a transformer module to provide global spatial information and a spiking neural network (SNN) module for extracting temporal cues. The spiking threshold of the SNN module is dynamically adjusted based on the statistical cues of the spatial information, which we find essential in providing robust SNN features. We fuse both feature branches dynamically with a novel cross-domain attention fusion algorithm. Extensive experiments on three event-based datasets, FE240hz, EED and VisEvent validate that the proposed STNet outperforms existing state-of-the-art methods in both tracking accuracy and speed with a significant margin. The code and pretrained models are at https://github.com/Jee-King/CVPR2022_STNet.
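A toy leaky integrate-and-fire loop with a statistics-adjusted firing threshold, included only to make the idea of a dynamically adjusted spiking threshold concrete; the adjustment rule below is an assumption, not STNet's.

```python
import torch

def lif_spikes(inputs, base_threshold=1.0, decay=0.5, alpha=0.1):
    """Leaky integrate-and-fire over a temporal sequence of event features,
    with the firing threshold nudged by per-step input statistics.

    inputs: (T, B, C) temporal sequence of features.
    Returns (T, B, C) binary spike trains.
    """
    mem = torch.zeros_like(inputs[0])
    spikes = []
    for x in inputs:
        threshold = base_threshold + alpha * x.std()  # statistics-aware threshold
        mem = decay * mem + x                         # leaky integration
        s = (mem >= threshold).float()                # fire
        mem = mem - s * threshold                     # soft reset
        spikes.append(s)
    return torch.stack(spikes)
```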
Citations: 61
Geometry-Aware Guided Loss for Deep Crack Recognition
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00466
Zhuangzhuang Chen, Jin Zhang, Zhuo-Jin Lai, Jie-Min Chen, Zun Liu, Jianqiang
Despite the substantial progress of deep models for crack recognition, cracks that vary in size and shape and appear against noisy background textures mean that deeply learned features still lack discriminative power when supervised by the cross-entropy loss alone. In this paper, we propose the geometry-aware guided loss (GAGL), which enhances discrimination ability and is applied only in the training stage, adding no extra computation or memory during inference. The GAGL consists of the feature-based geometry-aware projected gradient descent method (FGA-PGD), which approximates the geometric distances of the features to the class boundaries, and a geometry-aware update rule that learns an anchor for each class as an approximation of the feature expected to have the largest geometric distance to the corresponding class boundary. The discriminative power is then enhanced by minimizing the distances between features and their corresponding class anchors in the feature space. To address the limited availability of related benchmarks, we collect a fully annotated dataset, NPP2021, which involves inconsistent cracks and noisy backgrounds in real-world nuclear power plants. The proposed GAGL outperforms the state of the art on various benchmark datasets including CRACK2019, SDNET2018, and our NPP2021.
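A minimal sketch of the anchor-pulling term, assuming the class anchors are maintained elsewhere as a (num_classes, D) parameter; the geometry-aware FGA-PGD update of those anchors is omitted.

```python
import torch
import torch.nn.functional as F

def anchor_pull_loss(features, labels, anchors):
    """Pull each feature toward the anchor of its class.

    features: (B, D) deep features; labels: (B,) int64 class ids;
    anchors: (num_classes, D) class anchors maintained by the training loop.
    """
    return F.mse_loss(features, anchors[labels])

# usage sketch (lambda_gagl is a hypothetical weighting hyperparameter):
# total_loss = ce_loss + lambda_gagl * anchor_pull_loss(features, labels, anchors)
```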
Citations: 6
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01830
Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, Jun Xiao
Unbiased scene graph generation (SGG) has achieved significant progress over recent years. However, almost all existing SGG models have overlooked the ground-truth annotation quality of prevailing SGG datasets, i.e., they always assume: 1) all the manually annotated positive samples are equally correct; 2) all the un-annotated negative samples are absolutely background. In this paper, we argue that both assumptions are inapplicable to SGG: there are numerous “noisy” ground-truth predicate labels that break these two assumptions, and these noisy samples actually harm the training of unbiased SGG models. To this end, we propose a novel model-agnostic NoIsy label CorrEction strategy for SGG: NICE. NICE can not only detect noisy samples but also reassign more high-quality predicate labels to them. After the NICE training, we can obtain a cleaner version of the SGG dataset for model training. Specifically, NICE consists of three components: negative Noisy Sample Detection (Neg-NSD), positive NSD (Pos-NSD), and Noisy Sample Correction (NSC). First, in Neg-NSD, we formulate this task as an out-of-distribution detection problem and assign pseudo labels to all detected noisy negative samples. Then, in Pos-NSD, we use a clustering-based algorithm to divide all positive samples into multiple sets and treat the samples in the noisiest set as noisy positive samples. Lastly, in NSC, we use a simple but effective weighted KNN to reassign new predicate labels to noisy positive samples. Extensive results on different backbones and tasks attest to the effectiveness and generalization ability of each component of NICE.
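A small sketch of weighted-KNN relabeling in the spirit of the NSC step; the inverse-distance vote weighting is an assumption rather than the paper's exact choice.

```python
import numpy as np

def weighted_knn_relabel(query_feat, neighbor_feats, neighbor_labels, num_classes, k=5):
    """Reassign a predicate label to a noisy positive sample by a weighted
    vote among its k nearest clean samples in feature space.

    query_feat: (D,); neighbor_feats: (N, D); neighbor_labels: (N,) int labels.
    """
    d = np.linalg.norm(neighbor_feats - query_feat, axis=1)
    idx = np.argsort(d)[:k]
    votes = np.zeros(num_classes)
    for i in idx:
        votes[neighbor_labels[i]] += 1.0 / (d[i] + 1e-8)  # inverse-distance weight
    return int(votes.argmax())  # new predicate label for the noisy sample
```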
Citations: 40
PlanarRecon: Realtime 3D Plane Detection and Reconstruction from Posed Monocular Videos
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00612
Yiming Xie, Matheus Gadelha, Fengting Yang, Xiaowei Zhou, Huaizu Jiang
We present PlanarRecon, a novel framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video. Unlike previous works that detect planes in 2D from a single image, PlanarRecon incrementally detects planes in 3D for each video fragment, which consists of a set of key frames, from a volumetric representation of the scene using neural networks. A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction. Such a design allows PlanarRecon to integrate observations from multiple views within each fragment and temporal information across different ones, resulting in an accurate and coherent reconstruction of the scene abstraction with low-polygonal geometry. Experiments show that the proposed approach achieves state-of-the-art performance on the ScanNet dataset while being real-time. Code is available at the project page: https://neu-vi.github.io/planarrecon/.
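For reference, the geometric primitive such a pipeline ultimately outputs can also be fit classically; the SVD plane fit below is purely illustrative, since PlanarRecon predicts and fuses planes with learned modules rather than fitting them this way.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit to a set of 3D points via SVD.

    points: (N, 3). Returns (normal, d) for the plane n.x + d = 0, where the
    normal is the direction of least variance of the centered point cloud.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]          # right singular vector with smallest singular value
    d = -normal @ centroid
    return normal, d
```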
Citations: 9