
International Journal of Computer Vision — Latest Publications

Weighted Joint Distribution Optimal Transport Based Domain Adaptation for Cross-Scenario Face Anti-Spoofing
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-11 | DOI: 10.1007/s11263-024-02178-5
Shiyun Mao, Ruolin Chen, Huibin Li

Unsupervised domain adaptation-based face anti-spoofing methods have attracted increasing attention due to their promising generalization abilities. To mitigate domain bias, existing methods generally attempt to align the marginal distributions of samples from the source and target domains. However, the label and pseudo-label information of the samples from the source and target domains is ignored. To solve this problem, this paper proposes a Weighted Joint Distribution Optimal Transport unsupervised multi-source domain adaptation method for cross-scenario face anti-spoofing (WJDOT-FAS). WJDOT-FAS consists of three modules: joint distribution estimation, joint distribution optimal transport, and domain weight optimization. Specifically, the joint distributions of the features and pseudo labels of the multi-source and target domains are first estimated based on a pre-trained feature extractor and a randomly initialized classifier. Then, we compute the cost matrices and the optimal transport mappings between the joint distribution of each source domain and that of the target domain by solving Lp-L1 optimal transport problems. Finally, based on the loss functions of the different source domains and the target domain, together with the optimal transport losses from each source domain to the target domain, we estimate the weight of each source domain while also updating the parameters of the feature extractor and classifier. All the learnable parameters and the computations of the three modules are updated alternately. Extensive experimental results on four widely used 2D attack datasets and three recently published 3D attack datasets under both single- and multi-source domain adaptation settings (including both closed-set and open-set) show the advantages of our proposed method for cross-scenario face anti-spoofing.
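To make the optimal-transport step described above concrete, the following is a minimal sketch (not the authors' code) of a joint-distribution transport cost that mixes a feature-distance term with a label/pseudo-label disagreement term, solved here with a plain entropic (Sinkhorn) solver; the weights `alpha`, `beta`, the regularization `eps`, and the toy data are all illustrative assumptions, and the paper's Lp-L1 formulation and domain-weight update are not reproduced.

```python
import numpy as np

def joint_cost(feat_s, lab_s, feat_t, plab_t, alpha=1.0, beta=1.0):
    """C[i, j] = alpha * ||f_i^s - f_j^t||^2 + beta * ||y_i^s - yhat_j^t||^2."""
    d_feat = ((feat_s[:, None, :] - feat_t[None, :, :]) ** 2).sum(-1)
    d_lab = ((lab_s[:, None, :] - plab_t[None, :, :]) ** 2).sum(-1)
    return alpha * d_feat + beta * d_lab

def sinkhorn(C, eps=0.05, n_iter=200):
    """Entropy-regularized OT plan between uniform marginals (cost rescaled for stability)."""
    C = C / C.max()
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# toy usage: 8 source samples, 6 target samples, 16-d features, 2 classes
rng = np.random.default_rng(0)
fs, ft = rng.normal(size=(8, 16)), rng.normal(size=(6, 16))
ys = np.eye(2)[rng.integers(0, 2, size=8)]        # one-hot source labels
yt = rng.dirichlet(np.ones(2), size=6)            # soft pseudo-labels on the target
C = joint_cost(fs, ys, ft, yt)
plan = sinkhorn(C)
ot_loss = (plan * C).sum()                        # transport cost, usable to weight this source domain
print(plan.shape, round(float(ot_loss), 4))
```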

Citations: 0
SplitNet: Learnable Clean-Noisy Label Splitting for Learning with Noisy Labels
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-09 | DOI: 10.1007/s11263-024-02187-4
Daehwan Kim, Kwangrok Ryoo, Hansang Cho, Seungryong Kim

Annotating the dataset with high-quality labels is crucial for deep networks’ performance, but in real-world scenarios, the labels are often contaminated by noise. To address this, some methods were recently proposed to automatically split clean and noisy labels among training data, and learn a semi-supervised learner in a Learning with Noisy Labels (LNL) framework. However, they leverage a handcrafted module for clean-noisy label splitting, which induces a confirmation bias in the semi-supervised learning phase and limits the performance. In this paper, for the first time, we present a learnable module for clean-noisy label splitting, dubbed SplitNet, and a novel LNL framework which complementarily trains the SplitNet and main network for the LNL task. We also propose to use a dynamic threshold based on split confidence by SplitNet to optimize the semi-supervised learner better. To enhance SplitNet training, we further present a risk hedging method. Our proposed method performs at a state-of-the-art level, especially in high noise ratio settings on various LNL benchmarks.
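The following toy sketch illustrates the general idea of a learnable clean/noisy splitter with a confidence-driven dynamic threshold; it is not the authors' SplitNet. The per-sample statistics fed to the splitter and the specific threshold rule are assumptions made only for the demonstration.

```python
import torch
import torch.nn as nn

class TinySplitter(nn.Module):
    def __init__(self, in_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):                 # x: per-sample statistics, e.g. (loss, entropy)
        return torch.sigmoid(self.net(x)).squeeze(-1)   # P(label is clean)

splitter = TinySplitter()
stats = torch.randn(32, 2)                # placeholder per-sample statistics
p_clean = splitter(stats)

# dynamic threshold: the more confident the splitter is on average,
# the stricter the admission into the clean subset (one possible rule).
confidence = (p_clean - 0.5).abs().mean()
threshold = 0.5 + 0.4 * confidence.detach()
clean_mask = p_clean > threshold
print(f"{clean_mask.sum().item()} of {len(stats)} samples treated as clean")
```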

Citations: 0
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-09 | DOI: 10.1007/s11263-024-02196-3
Chang Liu, Yinpeng Dong, Wenzhao Xiang, Xiao Yang, Hang Su, Jun Zhu, Yuefeng Chen, Yuan He, Hui Xue, Shibao Zheng

The robustness of deep neural networks is frequently compromised when faced with adversarial examples, common corruptions, and distribution shifts, posing a significant research challenge in the advancement of deep learning. Although new deep learning methods and robustness improvement techniques have been constantly proposed, the robustness evaluations of existing methods are often inadequate due to their rapid development, diverse noise patterns, and simple evaluation metrics. Without thorough robustness evaluations, it is hard to understand the advances in the field and identify the effective methods. In this paper, we establish a comprehensive robustness benchmark called ARES-Bench on the image classification task. In our benchmark, we evaluate the robustness of 61 typical deep learning models on ImageNet with diverse architectures (e.g., CNNs, Transformers) and learning algorithms (e.g., normal supervised training, pre-training, adversarial training) under numerous adversarial attacks and out-of-distribution (OOD) datasets. Using robustness curves as the major evaluation criteria, we conduct large-scale experiments and draw several important findings, including: (1) there exists an intrinsic trade-off between the adversarial and natural robustness of specific noise types for the same model architecture; (2) adversarial training effectively improves adversarial robustness, especially when performed on Transformer architectures; (3) pre-training significantly enhances natural robustness by leveraging larger training datasets, incorporating multi-modal data, or employing self-supervised learning techniques. Based on ARES-Bench, we further analyze the training tricks in large-scale adversarial training on ImageNet. Through tailored training settings, we achieve a new state-of-the-art in adversarial robustness. We have made the benchmarking results and code platform publicly available.
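As a rough illustration of how a robustness curve (accuracy versus perturbation budget) can be traced for a single model, the sketch below evaluates a toy classifier under an FGSM attack at several budgets; the attack, model, and data are stand-ins and do not reflect the ARES-Bench implementation.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """One-step FGSM attack on inputs in [0, 1]."""
    x = x.clone().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def accuracy_under_attack(model, x, y, eps):
    x_adv = x if eps == 0 else fgsm(model, x, y, eps)
    return (model(x_adv).argmax(1) == y).float().mean().item()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier
x = torch.rand(64, 3, 32, 32)                                     # toy images in [0, 1]
y = torch.randint(0, 10, (64,))

# robustness curve: accuracy at increasing perturbation budgets
robustness_curve = [(eps, accuracy_under_attack(model, x, y, eps))
                    for eps in (0.0, 2 / 255, 4 / 255, 8 / 255)]
print(robustness_curve)
```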

Citations: 0
Novel Class Discovery Meets Foundation Models for 3D Semantic Segmentation
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-07 | DOI: 10.1007/s11263-024-02180-x
Luigi Riz, Cristiano Saltori, Yiming Wang, Elisa Ricci, Fabio Poiesi

The task of Novel Class Discovery (NCD) in semantic segmentation involves training a model to accurately segment unlabelled (novel) classes, using the supervision available from annotated (base) classes. The NCD task within the 3D point cloud domain is novel, and it is characterised by assumptions and challenges absent in its 2D counterpart. This paper advances the analysis of point cloud data in four directions. Firstly, it introduces the novel task of NCD for point cloud semantic segmentation. Secondly, it demonstrates that directly applying an existing NCD method for 2D image semantic segmentation to 3D data yields limited results. Thirdly, it presents a new NCD approach based on online clustering, uncertainty estimation, and semantic distillation. Lastly, it proposes a novel evaluation protocol to rigorously assess the performance of NCD in point cloud semantic segmentation. Through comprehensive evaluations on the SemanticKITTI, SemanticPOSS, and S3DIS datasets, our approach shows superior performance compared to the considered baselines.
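A minimal sketch of the flavour of online clustering with uncertainty estimation for unlabelled novel points is given below, assuming cosine similarity to a set of (in practice learnable) prototypes and an entropy-based confidence gate; this illustrates the general idea only and is not the paper's method.

```python
import torch
import torch.nn.functional as F

num_novel, feat_dim, n_points = 3, 32, 100
prototypes = F.normalize(torch.randn(num_novel, feat_dim), dim=1)   # learnable in practice
feats = F.normalize(torch.randn(n_points, feat_dim), dim=1)         # features of novel-class points

logits = feats @ prototypes.t() / 0.1          # temperature-scaled cosine similarities
probs = logits.softmax(dim=1)
pseudo_labels = probs.argmax(dim=1)            # online cluster assignment

# uncertainty estimation: normalized entropy of the soft assignment
entropy = -(probs * probs.clamp_min(1e-8).log()).sum(1) / torch.log(torch.tensor(float(num_novel)))
confident = entropy < 0.5                       # keep only low-uncertainty points for training
print(f"{confident.sum().item()} / {n_points} novel points kept, labels shape {pseudo_labels.shape}")
```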

Citations: 0
Progressive Visual Prompt Learning with Contrastive Feature Re-formation
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-06 | DOI: 10.1007/s11263-024-02172-x
Chen Xu, Yuhan Zhu, Haocheng Shen, Boheng Chen, Yixuan Liao, Xiaoxin Chen, Limin Wang

Prompt learning has recently emerged as a compelling alternative to the traditional fine-tuning paradigm for adapting pre-trained Vision-Language (V-L) models to downstream tasks. Drawing inspiration from the success of prompt learning in Natural Language Processing, pioneering research efforts have predominantly concentrated on text-based prompting strategies. By contrast, visual prompting within V-L models remains underexploited. The straightforward transposition of existing visual prompt methods, tailored for Vision Transformers (ViT), into V-L models often leads to suboptimal performance or training instability. To mitigate these challenges, in this paper, we propose a novel structure called Progressive Visual Prompt (ProVP). This design strengthens the interaction among prompts from adjacent layers, thereby enabling more effective propagation of image embeddings to deeper layers in an instance-specific manner. Additionally, to address the common issue of generalization deterioration during the training of learnable prompts, we further introduce a contrastive feature re-formation technique for visual prompt learning. This method prevents significant deviations of the prompted visual features from the fixed CLIP visual feature distribution, ensuring better generalization capability. Combining ProVP and the contrastive feature re-formation technique, our proposed method, ProVP-Ref, significantly stabilizes the training process and enhances both the adaptation and generalization capabilities of visual prompt learning in V-L models. To demonstrate the efficacy of our approach, we evaluate ProVP-Ref across 11 image datasets, achieving state-of-the-art results on 7 of these datasets in both few-shot learning and base-to-new generalization settings. To the best of our knowledge, this is the first study to showcase the exceptional performance of visual prompts in V-L models compared to previous text prompting methods in this area.
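The wiring of progressive visual prompts can be sketched as below: each layer owns its prompt tokens, and the prompt fed to layer l is mixed with the prompt output of layer l-1 so that information propagates progressively. The toy encoder layers and the fixed mixing coefficient are assumptions used only to show the structure, not the ProVP design.

```python
import torch
import torch.nn as nn

depth, n_prompt, dim = 4, 5, 64
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True) for _ in range(depth)]
)
prompts = nn.ParameterList(
    [nn.Parameter(torch.randn(n_prompt, dim) * 0.02) for _ in range(depth)]
)

x = torch.randn(2, 16, dim)                       # a batch of 16 patch tokens per image
prev_prompt = None
for layer, prompt in zip(layers, prompts):
    p = prompt.unsqueeze(0).expand(x.size(0), -1, -1)
    if prev_prompt is not None:
        p = p + 0.5 * prev_prompt                 # progressive link to the previous layer's prompt
    tokens = layer(torch.cat([p, x], dim=1))
    prev_prompt, x = tokens[:, :n_prompt], tokens[:, n_prompt:]
print(x.shape)                                    # patch tokens after the prompted encoder
```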

Citations: 0
From Easy to Hard: Learning Curricular Shape-Aware Features for Robust Panoptic Scene Graph Generation
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-05 | DOI: 10.1007/s11263-024-02190-9
Hanrong Shi, Lin Li, Jun Xiao, Yueting Zhuang, Long Chen

Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects. To bridge this gap, we propose a model-agnostic Curricular shApe-aware FEature (CAFE) learning strategy for PSG. Specifically, we incorporate shape-aware features (i.e., mask features and boundary features) into PSG, moving beyond reliance solely on bbox features. Furthermore, drawing inspiration from human cognition, we propose to integrate shape-aware features in an easy-to-hard manner. To achieve this, we categorize the predicates into three groups based on cognition learning difficulty and correspondingly divide the training process into three stages. Each stage utilizes a specialized relation classifier to distinguish specific groups of predicates. As the learning difficulty of predicates increases, these classifiers are equipped with features of ascending complexity. We also incorporate knowledge distillation to retain knowledge acquired in earlier stages. Due to its model-agnostic nature, CAFE can be seamlessly incorporated into any PSG model. Extensive experiments and ablations on two PSG tasks under both robust and zero-shot PSG have attested to the superiority and robustness of our proposed CAFE, which outperforms existing state-of-the-art methods by a large margin.
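The easy-to-hard scheduling idea can be illustrated with a tiny curriculum helper like the one below; the predicate groups and stage boundaries are made-up placeholders, not the paper's actual grouping.

```python
# Illustrative three-stage curriculum over predicate groups of increasing difficulty.
predicate_groups = {
    "easy":   ["on", "near", "holding"],
    "medium": ["riding", "looking at"],
    "hard":   ["throwing", "chasing"],
}
stage_schedule = [("easy",), ("easy", "medium"), ("easy", "medium", "hard")]

def active_predicates(epoch, epochs_per_stage=10):
    """Return the predicates whose relation classifiers are trained at this epoch."""
    stage = min(epoch // epochs_per_stage, len(stage_schedule) - 1)
    return [p for group in stage_schedule[stage] for p in predicate_groups[group]]

for epoch in (0, 10, 25):
    print(epoch, active_predicates(epoch))
```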

Citations: 0
Winning Prize Comes from Losing Tickets: Improve Invariant Learning by Exploring Variant Parameters for Out-of-Distribution Generalization
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-01 | DOI: 10.1007/s11263-024-02075-x
Zhuo Huang, Muyang Li, Li Shen, Jun Yu, Chen Gong, Bo Han, Tongliang Liu

Out-of-Distribution (OOD) Generalization aims to learn robust models that generalize well to various environments without fitting to distribution-specific features. Recent studies based on the Lottery Ticket Hypothesis (LTH) address this problem by minimizing the learning target to find some of the parameters that are critical to the task. However, in open-world visual recognition problems, such solutions are suboptimal as the learning task contains severe distribution noise, which can mislead the optimization process. Therefore, apart from finding the task-related parameters (i.e., invariant parameters), we propose Exploring Variant parameters for Invariant Learning (EVIL), which also leverages distribution knowledge to find the parameters that are sensitive to distribution shift (i.e., variant parameters). Once the variant parameters are left out of invariant learning, a robust subnetwork that is resistant to distribution shift can be found. Additionally, the parameters that are relatively stable across distributions can be considered invariant ones to improve invariant learning. By fully exploring both variant and invariant parameters, our EVIL can effectively identify a robust subnetwork to improve OOD generalization. In extensive experiments on the integrated testbed DomainBed, EVIL can effectively and efficiently enhance many popular methods, such as ERM, IRM, and SAM. Our code is available at https://github.com/tmllab/EVIL.
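One simple way to picture the variant/invariant split is to flag the parameters whose gradients disagree most across training environments, as in the sketch below; the two toy environments, the linear model, and the 20% masking ratio are assumptions for illustration and do not reproduce EVIL's actual criterion.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
envs = [(torch.randn(64, 10), torch.randint(0, 2, (64,))) for _ in range(2)]   # toy environments

def env_grads(x, y):
    """Flattened gradient of the loss on one environment."""
    model.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

grads = torch.stack([env_grads(x, y) for x, y in envs])      # (num_envs, num_params)
sensitivity = grads.var(dim=0)                               # disagreement across environments
k = int(0.2 * sensitivity.numel())                           # flag the top 20% as "variant"
variant_mask = torch.zeros_like(sensitivity, dtype=torch.bool)
variant_mask[sensitivity.topk(k).indices] = True             # these would be excluded from invariant learning
print(f"variant parameters: {variant_mask.sum().item()} / {sensitivity.numel()}")
```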

Citations: 0
Triplane-Smoothed Video Dehazing with CLIP-Enhanced Generalization
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-01 | DOI: 10.1007/s11263-024-02161-0
Jingjing Ren, Haoyu Chen, Tian Ye, Hongtao Wu, Lei Zhu

Video dehazing is a critical research area in computer vision that aims to enhance the quality of hazy frames, which benefits many downstream tasks, e.g. semantic segmentation. Recent works devise CNN-based structures or attention mechanisms to fuse temporal information, while others utilize offsets between frames to align frames explicitly. Another significant line of video dehazing research focuses on constructing paired datasets by synthesizing a foggy effect on clear videos or generating a real haze effect on indoor scenes. Despite the significant contributions of these dehazing networks and datasets to the advancement of video dehazing, current methods still suffer from spatial–temporal inconsistency and poor generalization ability. We address the aforementioned issues by proposing a triplane smoothing module that explicitly benefits from the spatial–temporal smoothness prior of the input video and generates temporally coherent dehazing results. We further devise a query-based decoder to extract haze-relevant information while also aggregating temporal clues implicitly. To increase the generalization ability of our dehazing model, we utilize CLIP guidance with a rich and high-level understanding of the hazy effect. We conduct extensive experiments to verify the effectiveness of our model in generating spatial–temporally consistent dehazing results and producing pleasing dehazing results on real-world data.
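The spatial–temporal smoothness prior can be pictured with a triplane-style factorization as sketched below: a video feature volume is represented by three planes (x–y, x–t, y–t), and a total-variation penalty on the temporal planes encourages temporally coherent features. Plane resolutions and the TV weight are assumptions; this is an illustration of the prior, not the paper's module.

```python
import torch

H, W, T, C = 32, 32, 8, 16
plane_xy = torch.randn(C, H, W, requires_grad=True)
plane_xt = torch.randn(C, H, T, requires_grad=True)
plane_yt = torch.randn(C, W, T, requires_grad=True)

def sample_feature(x, y, t):
    """Feature at pixel (x, y) and frame t as the sum of the three plane entries."""
    return plane_xy[:, y, x] + plane_xt[:, y, t] + plane_yt[:, x, t]

def temporal_tv():
    """Total variation along time on the two temporal planes (smoothness prior)."""
    return (plane_xt[..., 1:] - plane_xt[..., :-1]).abs().mean() + \
           (plane_yt[..., 1:] - plane_yt[..., :-1]).abs().mean()

feat = sample_feature(x=3, y=5, t=2)
loss = feat.pow(2).mean() + 0.1 * temporal_tv()   # placeholder data term + smoothness prior
loss.backward()
print(feat.shape, round(loss.item(), 4))
```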

Citations: 0
Bridging the Source-to-Target Gap for Cross-Domain Person Re-identification with Intermediate Domains
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-31 | DOI: 10.1007/s11263-024-02169-6
Yongxing Dai, Yifan Sun, Jun Liu, Zekun Tong, Ling-Yu Duan

Cross-domain person re-identification (re-ID), such as unsupervised domain adaptive re-ID (UDA re-ID), aims to transfer identity-discriminative knowledge from the source to the target domain. Existing methods commonly consider the source and target domains to be isolated from each other, i.e., no intermediate status is modeled between them. Directly transferring knowledge between two isolated domains can be very difficult, especially when the domain gap is large. This paper, from a novel perspective, assumes these two domains are not completely isolated, but can be connected through a series of intermediate domains. Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains so as to facilitate a smooth knowledge transfer. To discover and utilize these intermediate domains, this paper proposes an Intermediate Domain Module (IDM) and a Mirrors Generation Module (MGM). IDM has two functions: (1) it generates multiple intermediate domains by mixing the hidden-layer features from the source and target domains, and (2) it dynamically reduces the domain gap between the source/target domain features and the intermediate domain features. While IDM achieves a good domain alignment effect, it introduces a side effect: the mix-up operation may blend the identities into a new identity and lose the original identities. Accordingly, MGM is introduced to compensate for the loss of the original identity by mapping the features into the IDM-generated intermediate domains without changing their original identity. This allows the model to focus on minimizing domain variations to further promote the alignment between the source/target domain and the intermediate domains, which reinforces IDM into IDM++. We extensively evaluate our method under both the UDA and domain generalization (DG) scenarios and observe that IDM++ yields consistent (and usually significant) performance improvement for cross-domain re-ID, achieving a new state of the art. For example, on the challenging MSMT17 benchmark, IDM++ surpasses the prior state of the art by a large margin (e.g., up to 9.9% and 7.8% rank-1 accuracy) for the UDA and DG scenarios, respectively. Code is available at https://github.com/SikaStar/IDM.
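The feature-mixing idea behind the intermediate domains can be sketched in a few lines, as below; the random per-sample mixing coefficients and the simple bridge-style penalty are illustrative assumptions (in IDM the coefficients are predicted by a learnable module).

```python
import torch

feat_source = torch.randn(16, 256)            # hidden-layer features, source batch
feat_target = torch.randn(16, 256)            # hidden-layer features, target batch

lam = torch.rand(16, 1)                       # per-sample mixing coefficient in [0, 1]
feat_intermediate = lam * feat_source + (1 - lam) * feat_target

# a bridge-style objective would then pull source/target features towards the
# synthesized intermediate domain, e.g. by penalizing their distance to it
bridge_loss = ((feat_source - feat_intermediate).norm(dim=1) +
               (feat_target - feat_intermediate).norm(dim=1)).mean()
print(feat_intermediate.shape, round(bridge_loss.item(), 4))
```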

Citations: 0
Compressed Event Sensing (CES) Volumes for Event Cameras
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-31 | DOI: 10.1007/s11263-024-02197-2
Songnan Lin, Ye Ma, Jing Chen, Bihan Wen

Deep learning has made significant progress in event-driven applications. But to match standard vision networks, most approaches rely on aggregating events into grid-like representations, which obscure crucial temporal information and limit overall performance. To address this issue, we propose a novel event representation called compressed event sensing (CES) volumes. CES volumes preserve the high temporal resolution of event streams by leveraging the sparsity property of events and the principles of compressed sensing theory. They effectively capture the frequency characteristics of events in low-dimensional representations, which can be accurately decoded to the raw high-dimensional event signals. In addition, our theoretical analysis shows that, when integrated with a neural network, CES volumes demonstrate greater expressive power under the neural tangent kernel approximation. Through synthetic phantom validation on dense frame regression and two downstream applications involving intensity-image reconstruction and object recognition tasks, we demonstrate the superior performance of CES volumes compared to state-of-the-art event representations.
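The compressed-sensing intuition behind CES volumes can be illustrated as below: the per-pixel event stream over a time window is a sparse vector, so a random sensing matrix can project it to far fewer measurements while, by standard compressed-sensing arguments, keeping it recoverable. The sizes and the Gaussian sensing matrix are illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 256, 32                                      # time bins per window, measurements kept

events = np.zeros(T)                                # sparse per-pixel event signal
spikes = rng.choice(T, size=8, replace=False)       # a handful of events in the window
events[spikes] = rng.choice([-1.0, 1.0], size=8)    # polarity of each event

Phi = rng.normal(size=(M, T)) / np.sqrt(M)          # random Gaussian sensing matrix
measurements = Phi @ events                         # low-dimensional CES-style representation

print(int((events != 0).sum()), "events compressed into", measurements.shape[0], "values")
```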

Citations: 0