
Latest publications from the International Journal of Computer Vision

Video Instance Segmentation in an Open-World
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-30. DOI: 10.1007/s11263-024-02195-4
Omkar Thawakar, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan

Existing video instance segmentation (VIS) approaches generally follow a closed-world assumption, where only seen category instances are identified and spatio-temporally segmented at inference. The open-world formulation relaxes the closed-world static-learning assumption as follows: (a) first, it distinguishes a set of known categories as well as labels an unknown object as ‘unknown’ and then (b) it incrementally learns the class of an unknown as and when the corresponding semantic labels become available. We propose the first open-world VIS approach, named OW-VISFormer, that introduces a novel feature enrichment mechanism and a spatio-temporal objectness (STO) module. The feature enrichment mechanism based on a light-weight auxiliary network aims at accurate pixel-level (unknown) object delineation from the background as well as distinguishing category-specific known semantic classes. The STO module strives to generate instance-level pseudo-labels by enhancing the foreground activations through a contrastive loss. Moreover, we also introduce an extensive experimental protocol to measure the characteristics of OW-VIS. Our OW-VISFormer performs favorably against a solid baseline in the OW-VIS setting. Further, we evaluate our contributions in the standard fully-supervised VIS setting by integrating them into the recent SeqFormer, achieving an absolute gain of 1.6% AP on the Youtube-VIS 2019 val. set. Lastly, we show the generalizability of our contributions for the open-world detection (OWOD) setting, outperforming the best existing OWOD method in the literature. Code, models along with OW-VIS splits are available at https://github.com/OmkarThawakar/OWVISFormer.
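
For intuition, a minimal sketch of a foreground/background contrastive objective of the kind the STO module description suggests is given below. The prototype-based formulation, function names, and hyperparameters are assumptions made for illustration, not the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def objectness_contrastive_loss(pixel_feats, fg_mask, temperature=0.1):
    """Toy contrastive loss that sharpens foreground activations (illustrative only).

    pixel_feats: (N, C) pixel embeddings, e.g. from a light-weight auxiliary network
    fg_mask:     (N,) boolean mask of pixels currently treated as foreground
    """
    feats = F.normalize(pixel_feats, dim=1)
    fg, bg = feats[fg_mask], feats[~fg_mask]
    if fg.numel() == 0 or bg.numel() == 0:
        return feats.sum() * 0.0                      # nothing to contrast in this batch

    fg_proto = F.normalize(fg.mean(dim=0), dim=0)     # foreground prototype
    pos = fg @ fg_proto / temperature                 # pull foreground pixels together
    neg = bg @ fg_proto / temperature                 # push background pixels away
    logits = torch.cat([pos, neg])
    labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
    return F.binary_cross_entropy_with_logits(logits, labels)
```

Instance-level pseudo-labels for unknown objects could then be read off from connected regions of high objectness, which is consistent with, though not necessarily identical to, the procedure summarized in the abstract.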

Citations: 0
Equiangular Basis Vectors: A Novel Paradigm for Classification Tasks
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-30. DOI: 10.1007/s11263-024-02189-2
Yang Shen, Xuhao Sun, Xiu-Shen Wei, Anqi Xu, Lingyan Gao

In this paper, we propose Equiangular Basis Vectors (EBVs) as a novel training paradigm of deep learning for image classification tasks. Differing from prominent training paradigms, e.g., k-way classification layers (mapping the learned representations to the label space) and deep metric learning (quantifying sample similarity), our method generates normalized vector embeddings as "predefined classifiers", which act as the fixed learning targets corresponding to different categories. By minimizing the spherical distance between the embedding of an input and its categorical EBV during training, the predictions can be obtained by identifying the categorical EBV with the smallest distance during inference. More importantly, by directly adding EBVs corresponding to newly added categories of equal status on the basis of existing EBVs, our method exhibits strong scalability to deal with the large increase of training categories in open-environment machine learning. In experiments, we evaluate EBVs on diverse computer vision tasks with large-scale real-world datasets, including classification on ImageNet-1K, object detection on COCO, semantic segmentation on ADE20K, etc. We further collected a dataset consisting of 100,000 categories to validate the superior performance of EBVs when handling a large number of categories. Comprehensive experiments validate both the effectiveness and scalability of our EBVs. Our method won first place in the 2022 DIGIX Global AI Challenge; the code along with all associated logs is open-source and available at https://github.com/aassxun/Equiangular-Basis-Vectors.
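
A minimal sketch of the paradigm follows: fix one unit vector per class with (approximately) equal pairwise angles, then predict the class whose vector is closest in spherical distance. The optimization recipe, threshold, and function names below are illustrative assumptions rather than the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def make_ebvs(num_classes, dim, steps=2000, lr=0.1, max_cos=0.0):
    """Optimize fixed unit vectors whose pairwise |cosine| stays below a threshold."""
    v = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.Adam([v], lr=lr)
    for _ in range(steps):
        u = F.normalize(v, dim=1)
        cos = u @ u.t() - torch.eye(num_classes)      # remove self-similarity on the diagonal
        loss = F.relu(cos.abs() - max_cos).max()      # penalize the worst-separated pair
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(v.detach(), dim=1)             # frozen "predefined classifiers"

def predict(embeddings, ebvs):
    """Assign each embedding to the class whose EBV has the smallest angular distance."""
    sims = F.normalize(embeddings, dim=1) @ ebvs.t()
    return torch.acos(sims.clamp(-1 + 1e-6, 1 - 1e-6)).argmin(dim=1)
```

During training, the network would be driven to minimize the same spherical distance between each sample's embedding and the EBV of its ground-truth class; adding a new class then only requires appending one more vector under the same angular constraint.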

Citations: 0
Face3DAdv: Exploiting Robust Adversarial 3D Patches on Physical Face Recognition
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-28. DOI: 10.1007/s11263-024-02177-6
Xiao Yang, Longlong Xu, Tianyu Pang, Yinpeng Dong, Yikai Wang, Hang Su, Jun Zhu

Recent research has elucidated the susceptibility of face recognition models to physical adversarial patches, thus provoking security concerns about the deployed face recognition systems. Most existing 2D and 3D physical attacks on face recognition, however, produce adversarial examples using a single-state face image of an attacker. This point-wise attack paradigm tends to yield inferior results when countering numerous complicated states in physical environments, such as diverse pose variations. In this paper, by reassessing the intrinsic relationship between an attacker’s face and its variations, we propose a practical pipeline that simulates complex facial transformations in the physical world through 3D face modeling. This adaptive simulation serves as a digital counterpart of physical faces and empowers us to regulate various facial variations and physical conditions. With this digital simulator, we present the Face3DAdv method to craft 3D adversarial patches, which account for 3D facial transformations and realistic physical variations. Moreover, by optimizing the latent space on 3D modeling and involving importance sampling on various transformations, we demonstrate that Face3DAdv can significantly improve the effectiveness and naturalness of a wide range of physically feasible adversarial patches. Furthermore, the physically 3D-printed adversarial patches by Face3DAdv can achieve an effective evaluation of adversarial robustness on multiple popular commercial services, including four recognition APIs, three anti-spoofing APIs and one automated access control system.
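
The underlying optimization pattern, an expectation over simulated 3D facial transformations with importance sampling, can be sketched generically as below. The renderer, transformation sampler, and target scoring function are placeholders assumed for illustration; they are not the components released by the authors.

```python
import torch

def optimize_3d_patch(patch, face_model, render_fn, sample_transform, victim_score,
                      steps=500, lr=0.01, n_samples=8):
    """Expectation-over-transformations patch optimization (illustrative skeleton).

    render_fn(face_model, patch, t) -> face image with the patch under transformation t
    sample_transform()              -> (transformation parameters, importance weight)
    victim_score(img)               -> similarity to the victim identity (to be reduced)
    """
    patch = patch.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        loss = 0.0
        for _ in range(n_samples):
            t, w = sample_transform()                 # importance-sampled pose/expression/lighting
            img = render_fn(face_model, patch, t)     # digital counterpart of the physical face
            loss = loss + w * victim_score(img)       # lower similarity means a stronger attack
        (loss / n_samples).backward()
        opt.step()
        opt.zero_grad()
        patch.data.clamp_(0, 1)                       # keep pixel values printable
    return patch.detach()
```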

Citations: 0
Kill Two Birds with One Stone: Domain Generalization for Semantic Segmentation via Network Pruning
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-27. DOI: 10.1007/s11263-024-02194-5
Yawei Luo, Ping Liu, Yi Yang

Deep models are notoriously known to perform poorly when encountering new domains with different statistics. To alleviate this issue, we present a new domain generalization method based on network pruning, dubbed NPDG. Our core idea is to prune the filters or attention heads that are more sensitive to domain shift while preserving those domain-invariant ones. To this end, we propose a new pruning policy tailored to improve generalization ability, which identifies each filter's and head's sensitivity to domain shift by judging its activation variance among different domains (unary manner) and its correlation to other filters (binary manner). To better reveal those potentially sensitive filters and heads, we present a differentiable style perturbation scheme to imitate the domain variance dynamically. NPDG is trained on a single source domain and can be applied to both CNN- and Transformer-based backbones. To our knowledge, we are among the pioneers in tackling domain generalization in segmentation via network pruning. NPDG not only improves the generalization ability of a segmentation model but also decreases its computation cost. Extensive experiments demonstrate the state-of-the-art generalization performance of NPDG with a lighter-weight structure.
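
The unary criterion, scoring a filter by how much its activation varies across domains, admits a small sketch. The variance-based scoring and the fixed keep ratio below are an illustrative reading of the abstract rather than the exact NPDG policy, and the binary (filter-correlation) criterion is omitted.

```python
import torch

@torch.no_grad()
def domain_shift_sensitivity(layer, domain_batches):
    """Score each filter by the variance of its mean activation across domains.

    domain_batches: list of input tensors, one batch per real or style-perturbed domain
    Returns a (num_filters,) score; high values mark domain-sensitive filters.
    """
    per_domain = []
    for x in domain_batches:
        act = layer(x)                                # (B, C, H, W) feature map
        per_domain.append(act.mean(dim=(0, 2, 3)))    # (C,) mean response per filter
    return torch.stack(per_domain).var(dim=0)         # variance over domains

def keep_mask(scores, keep_ratio=0.7):
    """Keep the most domain-invariant filters (lowest cross-domain variance)."""
    k = int(len(scores) * keep_ratio)
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask[scores.argsort()[:k]] = True
    return mask
```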

Citations: 0
Learning Rate Curriculum
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-27. DOI: 10.1007/s11263-024-02186-5
Florinel-Alin Croitoru, Nicolae-Cătălin Ristea, Radu Tudor Ionescu, Nicu Sebe

Most curriculum learning methods require an approach to sort the data samples by difficulty, which is often cumbersome to perform. In this work, we propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC), which leverages the use of a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs. More specifically, LeRaC assigns higher learning rates to neural layers closer to the input, gradually decreasing the learning rates as the layers are placed farther away from the input. The learning rates increase at various paces during the first training iterations, until they all reach the same value. From this point on, the neural model is trained as usual. This creates a model-level curriculum learning strategy that does not require sorting the examples by difficulty and is compatible with any neural network, generating higher performance levels regardless of the architecture. We conduct comprehensive experiments on 12 data sets from the computer vision (CIFAR-10, CIFAR-100, Tiny ImageNet, ImageNet-1K, Food-101, UTKFace, PASCAL VOC), language (BoolQ, QNLI, RTE) and audio (ESC-50, CREMA-D) domains, considering various convolutional (ResNet-18, Wide-ResNet-50, DenseNet-121, YOLOv5), recurrent (LSTM) and transformer (CvT, BERT, SepTr) architectures. We compare our approach with the conventional training regime, as well as with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach. Unlike CBS, our performance improvements over the standard training regime are consistent across all data sets and models. Furthermore, we significantly surpass CBS in terms of training time (there is no additional cost over the standard training regime for LeRaC). Our code is freely available at: https://github.com/CroitoruAlin/LeRaC.
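
Because the schedule is purely a matter of per-layer learning rates, it can be sketched with ordinary PyTorch parameter groups. The pacing function and starting scales below are assumptions for illustration, not the tuned values from the paper.

```python
import torch

def build_lerac_optimizer(model, base_lr=1e-3, min_scale=1e-2):
    """One parameter group per top-level layer; layers nearer the input start higher."""
    layers = [m for m in model.children() if any(True for _ in m.parameters())]
    groups = []
    for i, layer in enumerate(layers):
        # depth 0 (closest to the input) starts at base_lr; deeper layers start lower
        scale = 1.0 - (1.0 - min_scale) * i / max(len(layers) - 1, 1)
        groups.append({"params": list(layer.parameters()),
                       "lr": base_lr * scale,
                       "start_scale": scale})          # extra keys are kept in the group dict
    return torch.optim.SGD(groups, momentum=0.9)

def lerac_step(optimizer, iteration, base_lr=1e-3, warmup_iters=1000):
    """Raise every group's lr toward base_lr; after warmup all layers train identically."""
    t = min(iteration / warmup_iters, 1.0)
    for g in optimizer.param_groups:
        g["lr"] = base_lr * (g["start_scale"] + (1.0 - g["start_scale"]) * t)
```

After warmup_iters updates every group reaches base_lr, at which point training proceeds exactly as in the standard regime, which is why the method adds no extra training cost.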

Citations: 0
Position-Guided Point Cloud Panoptic Segmentation Transformer
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-27. DOI: 10.1007/s11263-024-02162-z
Zeqi Xiao, Wenwei Zhang, Tai Wang, Chen Change Loy, Dahua Lin, Jiangmiao Pang

DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. By diving into the details, we observe that instances in the sparse point clouds are relatively small with respect to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, which is rare in the image domain. Considering that instances in 3D are characterized more by their positional information, we emphasize their roles during the modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). All these designs impel the queries to attend to specific regions and identify various instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 2.7% and 1.2% PQ on the SemanticKITTI and nuScenes datasets, respectively. The source code and models are available at https://github.com/OpenRobotLab/P3Former.

Citations: 0
Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-label Classification
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-26. DOI: 10.1007/s11263-024-02157-w
Zitai Wang, Qianqian Xu, Zhiyong Yang, Peisong Wen, Yuan He, Xiaochun Cao, Qingming Huang

Multi-label ranking, which returns multiple top-ranked labels for each instance, has a wide range of applications for visual tasks. Due to its complicated setting, prior arts have proposed various measures to evaluate model performances. However, both theoretical analysis and empirical observations show that a model might perform inconsistently on different measures. To bridge this gap, this paper proposes a novel measure named Top-K Pairwise Ranking (TKPR), and a series of analyses show that TKPR is compatible with existing ranking-based measures. In light of this, we further establish an empirical surrogate risk minimization framework for TKPR. On one hand, the proposed framework enjoys convex surrogate losses with the theoretical support of Fisher consistency. On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction. Finally, empirical results on benchmark datasets validate the effectiveness of the proposed framework.
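
The abstract does not spell out the definition of TKPR, but one plausible formalization consistent with its description (pairwise ranking restricted to the top-K predicted labels) is sketched below. Treat it strictly as an illustrative guess, not the measure actually proposed in the paper.

```python
import numpy as np

def topk_pairwise_ranking(scores, relevant, k):
    """Fraction of (relevant-in-top-k, irrelevant) label pairs ordered correctly.

    scores:   (L,) predicted score per label for one instance
    relevant: (L,) boolean ground-truth relevance
    """
    topk = np.argsort(-scores)[:k]                    # indices of the k top-ranked labels
    rel_in_topk = [i for i in topk if relevant[i]]
    irrelevant = np.where(~relevant)[0]
    if not len(rel_in_topk) or not len(irrelevant):
        return 0.0
    correct = sum(scores[r] > scores[j] for r in rel_in_topk for j in irrelevant)
    return correct / (len(rel_in_topk) * len(irrelevant))
```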

Citations: 0
Knowledge Distillation Meets Open-Set Semi-supervised Learning
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-26. DOI: 10.1007/s11263-024-02192-7
Jing Yang, Xiatian Zhu, Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

Existing knowledge distillation methods mostly focus on distillation of teacher’s prediction and intermediate activation. However, the structured representation, which arguably is one of the most critical ingredients of deep models, is largely overlooked. In this work, we propose a novel semantic representational distillation (SRD) method dedicated for distilling representational knowledge semantically from a pretrained teacher to a target student. The key idea is that we leverage the teacher’s classifier as a semantic critic for evaluating the representations of both teacher and student and distilling the semantic knowledge with high-order structured information over all feature dimensions. This is accomplished by introducing a notion of cross-network logit computed through passing student’s representation into teacher’s classifier. Further, considering the set of seen classes as a basis for the semantic space in a combinatorial perspective, we scale SRD to unseen classes for enabling effective exploitation of largely available, arbitrary unlabeled training data. At the problem level, this establishes an interesting connection between knowledge distillation with open-set semi-supervised learning (SSL). Extensive experiments show that our SRD outperforms significantly previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks, as well as less studied yet practically crucial binary network distillation. Under more realistic open-set SSL settings we introduce, we reveal that knowledge distillation is generally more effective than existing out-of-distribution sample detection, and our proposed SRD is superior over both previous distillation and SSL competitors. The source code is available at https://github.com/jingyang2017/SRD_ossl.
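
The cross-network logit idea reduces to passing the student's representation through the teacher's frozen classifier and matching the resulting distribution to the teacher's own prediction. A minimal sketch follows; the temperature, the KL form, and the assumption that the student feature has already been projected to the teacher's feature dimensionality are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def cross_network_logit_loss(student_feat, teacher_feat, teacher_head, T=4.0):
    """Semantic representational distillation via cross-network logits (sketch).

    student_feat: (B, D) student representation, projected to the teacher's feature size
    teacher_feat: (B, D) teacher representation
    teacher_head: the teacher's frozen final classifier, acting as a semantic critic
    """
    with torch.no_grad():
        t_logits = teacher_head(teacher_feat)          # teacher's own prediction
    cross_logits = teacher_head(student_feat)          # student representation judged by the teacher head
    return F.kl_div(F.log_softmax(cross_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
```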

Citations: 0
ESceme: Vision-and-Language Navigation with Episodic Scene Memory
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-26. DOI: 10.1007/s11263-024-02159-8
Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, Dacheng Tao

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes. Existing approaches have made enormous progress in navigation in new environments, such as beam search, pre-exploration, and dynamic or hierarchical history encoding. To balance generalization and efficiency, we resort to memorizing visited scenarios apart from the ongoing route while navigating. In this work, we introduce a mechanism of Episodic Scene memory (ESceme) for VLN that wakes an agent’s memories of past visits when it enters the current scene. The episodic scene memory allows the agent to envision a bigger picture of the next prediction. This way, the agent learns to utilize dynamically updated information instead of merely adapting to the current observations. We provide a simple yet effective implementation of ESceme by enhancing the accessible views at each location and progressively completing the memory while navigating. We verify the superiority of ESceme on short-horizon (R2R), long-horizon (R4R), and vision-and-dialog (CVDN) VLN tasks. Our ESceme also wins first place on the CVDN leaderboard. Code is available: https://github.com/qizhust/esceme.

Citations: 0
Deep Unrolled Weighted Graph Laplacian Regularization for Depth Completion
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-07-25. DOI: 10.1007/s11263-024-02188-3
Jin Zeng, Qingpeng Zhu, Tongxuan Tian, Wenxiu Sun, Lin Zhang, Shengjie Zhao

Depth completion aims to estimate dense depth images from sparse depth measurements with RGB image guidance. However, previous approaches have not fully considered sparse input fidelity, resulting in inconsistency with sparse input and poor robustness to input corruption. In this paper, we propose the deep unrolled Weighted Graph Laplacian Regularization (WGLR) for depth completion which enhances input fidelity and noise robustness by enforcing input constraints in the network design. Specifically, we assume graph Laplacian regularization as the prior for depth completion optimization and derive the WGLR solution by interpreting the depth map as the discrete counterpart of continuous manifold, enabling analysis in continuous domain and enforcing input consistency. Based on its anisotropic diffusion interpretation, we unroll the WGLR solution into iterative filtering for efficient implementation. Furthermore, we integrate the unrolled WGLR into deep learning framework to develop high-performance yet interpretable network, which diffuses the depth in a hierarchical manner to ensure global smoothness while preserving visually salient details. Experimental results demonstrate that the proposed scheme improves consistency with depth measurements and robustness to input corruption for depth completion, outperforming competing schemes on the NYUv2, KITTI-DC and TetrasRGBD datasets.
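
The unrolled view, graph Laplacian regularization realized as edge-aware iterative filtering anchored to the sparse measurements, can be illustrated with a simple 4-neighbor NumPy iteration. The Gaussian guidance weights, step size, and initialization are placeholders, not the learned quantities in the paper.

```python
import numpy as np

def unrolled_wglr(sparse_depth, mask, guide, iters=200, step=0.2, sigma=0.1):
    """Edge-aware diffusion of sparse depth that keeps measured pixels fixed (sketch).

    sparse_depth: (H, W) depth values, valid only where mask is True
    mask:         (H, W) boolean, True at measured pixels
    guide:        (H, W) grayscale guidance image in [0, 1] controlling edge weights
    """
    d = np.where(mask, sparse_depth, sparse_depth[mask].mean())      # crude initialization
    for _ in range(iters):
        update = np.zeros_like(d)
        norm = np.zeros_like(d)
        for axis, shift in [(0, 1), (0, -1), (1, 1), (1, -1)]:       # 4-neighbor image graph
            nb_d = np.roll(d, shift, axis=axis)
            nb_g = np.roll(guide, shift, axis=axis)
            w = np.exp(-((guide - nb_g) ** 2) / (2 * sigma ** 2))    # weaker weights across guidance edges
            update += w * (nb_d - d)
            norm += w
        d = d + step * update / (norm + 1e-8)                        # weighted graph Laplacian smoothing step
        d[mask] = sparse_depth[mask]                                 # enforce fidelity to the sparse input
    return d
```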

Citations: 0