Tab2Visual: Deep learning for limited tabular data via visual representations and augmentation
Ahmed Mamdouh, Moumen El-Melegy, Samia Ali, Ron Kikinis
Pub Date: 2026-08-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113173 | Pattern Recognition, vol. 176, Article 113173
This research addresses the challenge of limited data in tabular data classification, a problem particularly prevalent in data-constrained domains such as healthcare. We propose Tab2Visual, a novel approach that transforms heterogeneous tabular data into visual representations, enabling the application of powerful deep learning models. Tab2Visual effectively addresses data scarcity by incorporating novel image augmentation techniques and facilitating transfer learning. We extensively evaluate the proposed approach on diverse tabular datasets, comparing its performance against a wide range of machine learning algorithms, including classical methods, tree-based ensembles, and state-of-the-art deep learning models specifically designed for tabular data. We also perform an in-depth analysis of factors influencing Tab2Visual’s performance. Our experimental results demonstrate that Tab2Visual outperforms other methods in classification problems with limited tabular data.
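The abstract does not specify the rendering scheme, so the following is a minimal sketch of the general tabular-to-image idea only: each row becomes a bar-style grayscale image a CNN can consume, with a toy noise augmentation. The bar rendering and the augmentation are illustrative assumptions, not the authors' design.

```python
import numpy as np

def row_to_image(row, img_size=64):
    """Render a 1-D feature vector as vertical bars in a grayscale image."""
    row = np.asarray(row, dtype=np.float32)
    # Min-max normalize features into [0, 1] so bar heights are comparable.
    lo, hi = row.min(), row.max()
    norm = (row - lo) / (hi - lo + 1e-8)
    img = np.zeros((img_size, img_size), dtype=np.float32)
    bar_w = img_size // len(row)
    for i, v in enumerate(norm):
        h = int(v * img_size)
        img[img_size - h:, i * bar_w:(i + 1) * bar_w] = 1.0
    return img

def augment(img, rng):
    """Toy image augmentation: additive Gaussian noise, clipped to [0, 1]."""
    return np.clip(img + rng.normal(0, 0.05, img.shape), 0, 1)

rng = np.random.default_rng(0)
x = rng.random(8)                       # one synthetic tabular row, 8 features
img = augment(row_to_image(x), rng)
print(img.shape)                        # (64, 64) image ready for a pretrained CNN
```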
{"title":"Tab2Visual: Deep learning for limited tabular data via visual representations and augmentation","authors":"Ahmed Mamdouh , Moumen El-Melegy , Samia Ali , Ron Kikinis","doi":"10.1016/j.patcog.2026.113173","DOIUrl":"10.1016/j.patcog.2026.113173","url":null,"abstract":"<div><div>This research addresses the challenge of limited data in tabular data classification, particularly prevalent in domains with constraints like healthcare. We propose Tab2Visual, a novel approach that transforms heterogeneous tabular data into visual representations, enabling the application of powerful deep learning models. Tab2Visual effectively addresses data scarcity by incorporating novel image augmentation techniques and facilitating transfer learning. We extensively evaluate the proposed approach on diverse tabular datasets, comparing its performance against a wide range of machine learning algorithms, including classical methods, tree-based ensembles, and state-of-the-art deep learning models specifically designed for tabular data. We also perform an in-depth analysis of factors influencing Tab2Visual’s performance. Our experimental results demonstrate that Tab2Visual outperforms other methods in classification problems with limited tabular data.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113173"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Amplitude-guided deep reinforcement learning for semi-supervised layer segmentation
Enting Gao, Zian Zha, Yonggang Li, Junhui Zhu, Yong Wang, Xinjian Chen, Naihui Zhou, Dehui Xiang
Pub Date: 2026-08-01 | DOI: 10.1016/j.patcog.2026.113204 | Pattern Recognition, vol. 176, Article 113204
Accurate segmentation of scalp tissue layers is essential for mechanistic studies and staging of androgenetic alopecia (AGA), a common form of hair loss that impacts quality of life and mental health. High-resolution magnetic resonance imaging (HR-MR) offers a promising assessment tool. However, accurate segmentation remains challenging due to the lack of large-scale annotated datasets, structural deformation, and low image quality. To address these issues, an Amplitude-guided Deep Reinforcement Learning (ADRL) framework is designed to decouple the data distributions of labeled images and adaptively fuse them into the distribution of unlabeled images. This enables effective feature learning of lamellar and asymmetrically thickened structures from both labeled and unlabeled data. Then, phase component alignment (PHA) is imposed to mitigate the adverse impacts of noise or artifacts. To further enhance the discriminative capability of the network, a Cross-Power Spectrum Correlation (CPSC) module is proposed to mitigate inaccurate segmentation of layer structures. Comprehensive experiments on a scalp HR-MR image dataset and a publicly available retinal OCT image dataset demonstrate that our method significantly outperforms state-of-the-art methods in semi-supervised layer segmentation.
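The amplitude guidance suggests Fourier-domain decoupling of amplitude (appearance) from phase (structure). Below is a minimal sketch of that general idea, assuming FDA-style amplitude mixing; the function name, the mixing weight lam, and the linear fusion rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

def amplitude_fuse(labeled, unlabeled, lam=0.5):
    """Mix the FFT amplitude of a labeled image toward an unlabeled one,
    keeping the labeled image's phase (layer structure) intact."""
    fl, fu = np.fft.fft2(labeled), np.fft.fft2(unlabeled)
    amp = (1 - lam) * np.abs(fl) + lam * np.abs(fu)    # fused amplitude
    phase = np.angle(fl)                               # preserved phase
    fused = np.fft.ifft2(amp * np.exp(1j * phase))
    return np.real(fused)

rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))      # stand-in image pair
print(amplitude_fuse(a, b).shape)                      # (64, 64)
```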
{"title":"Amplitude-guided deep reinforcement learning for semi-supervised layer segmentation","authors":"Enting Gao , Zian Zha , Yonggang Li , Junhui Zhu , Yong Wang , Xinjian Chen , Naihui Zhou , Dehui Xiang","doi":"10.1016/j.patcog.2026.113204","DOIUrl":"10.1016/j.patcog.2026.113204","url":null,"abstract":"<div><div>Accurate segmentation of scalp tissue layers is essential for mechanistic studies and staging of androgenetic alopecia (AGA), a common form of hair loss that impacts quality of life and mental health. High-resolution magnetic resonance imaging (HR-MR) offers a promising assessment tool. However, accurate segmentation remains challenging due to the lack of large-scale annotated datasets, structural deformation, and low image quality. To address these issues, an Amplitude-guided Deep Reinforcement Learning (ADRL) framework is designed to decouple the data distribution of images and adaptively fuse into the distribution of unlabeled images. This enables effective feature learning of lamellar and asymmetrically thickened structures from both labeled and unlabeled data. Then, phase component alignment (PHA) is imposed to mitigate the adverse impacts of noise or artifacts. To further enhance the discriminative capability of this network, a Cross-Power Spectrum Correlation (CPSC) module is proposed to mitigate inaccurate segmentation of layer structures. Comprehensive experiments on a scalp HR-MR image dataset and a publicly available retinal OCT image dataset demonstrate that our method significantly outperforms state-of-the-art methods in semi-supervised layer segmentation.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113204"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-shot incremental food recognition via cross-domain guided pseudo-targets
Minkang Chai, Lu Wei, Zheng Qian, Ran Zhang, Ye Zhu
Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.patcog.2026.113280 | Pattern Recognition, vol. 176, Article 113280
The explosive growth of global food culture has expanded the application scope of visual recognition; however, it has also introduced complex challenges arising from high intra-class variability and inter-class similarity. Existing systems struggle to address fine-grained confusion and the trade-off between retaining old knowledge and adapting to new information. Traditional methods are constrained by a heavy reliance on large-scale datasets, whereas emerging zero-shot techniques are prone to semantic hallucination when encountering unseen dishes, posing a severe challenge to precise recognition. To address these challenges, we propose the Cross-domain Guided Food Pseudo-Target Estimation (CFPE) framework, establishing a novel paradigm that is vision-led and semantically enhanced. First, to tackle the scarcity of incremental data, we utilize cross-domain adversarial training and an adaptive mask generator to synthesize high-quality pseudo-targets, thus establishing stable geometric anchors within the feature space. Second, by integrating Bessel Estimation Loss of Hypersphere (BELH) and Perturbation Margin Enhanced Prototype Regularization (PMEPR), we geometrically reconstruct the hyperspherical manifold distribution of features, effectively correcting estimation biases induced by few-shot samples. Crucially, we introduce a Food Factor-based Visual Semantic Consistency (FVSC) constraint, which explicitly decouples fine-grained visual confusion by injecting structured semantics. This is complemented by a depth-aware feature decoupling strategy to dynamically balance the plasticity and stability of the model. Experimental results demonstrate that CFPE achieves state-of-the-art performance across multiple benchmark datasets. It not only significantly improves incremental learning accuracy but also exhibits exceptional robustness in recognizing high-entropy food images.
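PMEPR's exact formulation is not given in the abstract; the sketch below shows a perturbation-enhanced margin loss over hyperspherical prototypes in the same general spirit. The function name, noise scale sigma, margin, and temperature are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def margin_prototype_loss(feats, prototypes, labels, margin=0.2, sigma=0.05):
    """Cosine prototype loss with a perturbation-enhanced margin: features are
    pulled toward their class prototype on the unit hypersphere, and Gaussian
    perturbation of the prototypes enforces a safety margin."""
    feats = F.normalize(feats, dim=1)
    protos = F.normalize(prototypes + sigma * torch.randn_like(prototypes), dim=1)
    logits = feats @ protos.t()                       # cosine similarities
    # Subtract the margin from the true-class logit before cross-entropy.
    logits = logits - margin * F.one_hot(labels, protos.size(0)).float()
    return F.cross_entropy(logits / 0.1, labels)      # 0.1 = temperature

feats = torch.randn(16, 128)
protos = torch.randn(10, 128)
labels = torch.randint(0, 10, (16,))
print(margin_prototype_loss(feats, protos, labels).item())
```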
{"title":"Few-shot incremental food recognition via cross-domain guided pseudo-targets","authors":"Minkang Chai , Lu Wei , Zheng Qian , Ran Zhang , Ye Zhu","doi":"10.1016/j.patcog.2026.113280","DOIUrl":"10.1016/j.patcog.2026.113280","url":null,"abstract":"<div><div>The explosive growth of global food culture has expanded the application scope of visual recognition; however, it has introduced complex challenges arising from high intra-class variability and inter-class similarity. However, existing systems struggle to address fine-grained confusion and the trade-off between retaining old knowledge and adapting to new information. Traditional methods are constrained by a heavy reliance on large-scale datasets, whereas emerging zero-shot techniques are prone to semantic hallucination when encountering unseen dishes, thereby posing a severe challenge to precise recognition. To address these challenges, we propose the Cross-domain Guided Food Pseudo-Target Estimation (CFPE) framework, establishing a novel paradigm that is vision-led and semantically enhanced. First, to tackle the scarcity of incremental data, we utilize cross-domain adversarial training and an adaptive mask generator to synthesize high-quality pseudo-targets, thus establishing stable geometric anchors within the feature space. Second, by integrating Bessel Estimation Loss of Hypersphere (BELH) and Perturbation Margin Enhanced Prototype Regularization (PMEPR), we geometrically reconstruct the hyperspherical manifold distribution of features, effectively correcting estimation biases induced by few-shot samples. Crucially, we introduce a Food Factor-based Visual Semantic Consistency (FVSC) constraint, which explicitly decouples fine-grained visual confusion by injecting structured semantics. This is complemented by a depth-aware feature decoupling strategy to dynamically balance the plasticity and stability of the model. Experimental results demonstrate that CFPE achieves state-of-the-art performance across multiple benchmark datasets. It not only significantly improves incremental learning accuracy but also exhibits exceptional robustness in recognizing high-entropy food images.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113280"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adversarial supervised contrastive feature learning for cross-modal retrieval
Xin Shu, Yikang Guo, Shou Gang Ren
Pub Date: 2026-08-01 | Epub Date: 2026-02-10 | DOI: 10.1016/j.patcog.2026.113256 | Pattern Recognition, vol. 176, Article 113256
Cross-modal hashing methods have attracted substantial interest in information retrieval because of their efficiency and low memory costs. Recent advancements in contrastive learning have greatly improved the retrieval performance of these hashing techniques. However, these approaches still encounter two significant drawbacks: (1) most current methods transform multimodal data into a common Hamming space to reduce the semantic gap, which may fail to capture the strong feature correlations across modalities; and (2) semantic similarity is represented as a binary value, neglecting the semantic relationships among multiple labels. To address these issues, we propose a novel adversarial supervised contrastive feature learning approach for cross-modal hashing. Specifically, we utilize a pre-trained CLIP model to extract multimodal features and apply contrastive learning to integrate these features effectively. Additionally, we introduce an adversarial feature learning mechanism to enhance the correlation between features from different modalities. Furthermore, we employ a graph convolutional network to model label correlations. Experimental results on benchmark datasets demonstrate the effectiveness and efficiency of our proposed method.
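A minimal sketch of the general pipeline the abstract describes: pre-extracted (e.g., CLIP) features, a tanh hash projection, and a supervised contrastive loss aligning image and text codes that share a label. The random tensors stand in for CLIP features; the bit width, temperature, and exact loss form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashHead(nn.Module):
    """Project (e.g., CLIP) features to K soft hash bits in (-1, 1)."""
    def __init__(self, dim=512, bits=64):
        super().__init__()
        self.proj = nn.Linear(dim, bits)

    def forward(self, x):
        return torch.tanh(self.proj(x))   # binarize with sign() at retrieval time

def supervised_contrastive(img_codes, txt_codes, labels, tau=0.1):
    """Pull together image/text codes that share a label, push apart the rest."""
    sim = F.normalize(img_codes, dim=1) @ F.normalize(txt_codes, dim=1).t() / tau
    pos = (labels[:, None] == labels[None, :]).float()
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

img_f, txt_f = torch.randn(8, 512), torch.randn(8, 512)  # stand-ins for CLIP features
labels = torch.randint(0, 3, (8,))
head = HashHead()
print(supervised_contrastive(head(img_f), head(txt_f), labels).item())
```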
{"title":"Adversarial supervised contrastive feature learning for cross-modal retrieval","authors":"Xin Shu, Yikang Guo, Shou Gang Ren","doi":"10.1016/j.patcog.2026.113256","DOIUrl":"10.1016/j.patcog.2026.113256","url":null,"abstract":"<div><div>Cross-modal hashing methods have attracted substantial interest in information retrieval because of their efficiency and low memory costs. Recent advancements in contrastive learning have greatly improved the retrieval performance of these hashing techniques. However, these approaches still encounter two significant drawbacks: (1) most current methods transform multimodal data into a common Hamming space to reduce the semantic gap, which may fail to capture the strong feature correlations across modalities; and (2) semantic similarity is represented as a binary value, neglecting the semantic relationships among multiple labels. To address these issues, we propose a novel adversarial supervised contrastive feature learning approach for cross-modal hashing. Specifically, we utilize a pre-trained CLIP model to extract multimodal features and apply contrastive learning to integrate these features effectively. Additionally, we introduce an adversarial feature learning mechanism to enhance the correlation between features from different modalities. Furthermore, we employ a graph convolutional network to model label correlations. Experimental results on benchmark datasets demonstrate the effectiveness and efficiency of our proposed method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113256"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DuoNet: Joint optimization of representation learning and prototype classifier for unbiased scene graph generation
Zhaodi Wang, Biao Leng, Shuo Zhang
Pub Date: 2026-08-01 | Epub Date: 2026-01-27 | DOI: 10.1016/j.patcog.2026.113152 | Pattern Recognition, vol. 176, Article 113152
Unbiased Scene Graph Generation (SGG) aims to parse visual scenes into highly informative graphs under the long-tail challenge. While prototype-based methods have shown promise in unbiased SGG, they highlight the importance of learning discriminative features that are intra-class compact and inter-class separable. In this paper, we revisit prototype-based methods, analyze the critical roles of representation learning and the prototype classifier in driving unbiased SGG, and accordingly propose a novel framework, DuoNet. To enhance intra-class compactness, we introduce a Bi-Directional Representation Refinement (BiDR2) module that captures relation-sensitive visual variability and within-relation visual consistency of entities. This module adopts relation-to-entity-to-relation refinement by integrating dual-level relation pattern modeling with a relation-specific entity constraint. Furthermore, a Knowledge-Guided Prototype Learning (KGPL) module is devised to strengthen inter-class separability by constructing an equidistributed prototype classifier with maximum inter-class margins. The equidistributed prototype classifier is frozen during SGG training to mitigate long-tail bias; a knowledge-driven triplet loss is therefore developed to strengthen the learning of BiDR2, enhancing relation-prototype matching. Extensive experiments demonstrate the effectiveness of our method, which sets new state-of-the-art performance on the Visual Genome, GQA, and Open Images datasets.
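An "equidistributed prototype classifier with maximum inter-class margins" can be approximated by spreading fixed prototypes on the unit hypersphere before training and freezing them. A minimal sketch of that idea, assuming a simple max-cosine-similarity objective; the optimizer, step count, and objective are illustrative, not the paper's construction.

```python
import torch
import torch.nn.functional as F

def equidistributed_prototypes(num_classes=10, dim=64, steps=500, lr=0.1):
    """Spread class prototypes on the unit hypersphere by minimizing the
    largest pairwise cosine similarity, then freeze them as the classifier."""
    protos = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.SGD([protos], lr=lr)
    for _ in range(steps):
        p = F.normalize(protos, dim=1)
        sim = p @ p.t() - 2 * torch.eye(num_classes)   # mask self-similarity
        loss = sim.max(dim=1).values.mean()            # push nearest pairs apart
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(protos.detach(), dim=1)         # frozen during training

protos = equidistributed_prototypes()
print((protos @ protos.t()).fill_diagonal_(0).max())   # max inter-class cosine
```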
{"title":"DuoNet: Joint optimization of representation learning and prototype classifier for unbiased scene graph generation","authors":"Zhaodi Wang , Biao Leng , Shuo Zhang","doi":"10.1016/j.patcog.2026.113152","DOIUrl":"10.1016/j.patcog.2026.113152","url":null,"abstract":"<div><div>Unbiased Scene Graph Generation (SGG) aims to parse visual scenes into highly informative graphs under the long-tail challenge. While prototype-based methods have shown promise in unbiased SGG, they highlight the importance of learning discriminative features that are intra-class compact and inter-class separable. In this paper, we revisit prototype-based methods and analyze critical roles of representation learning and prototype classifier in driving unbiased SGG, and accordingly propose a novel framework DuoNet. To enhance intra-class compactness, we introduce a Bi-Directional Representation Refinement (BiDR<sup>2</sup>) module that captures relation-sensitive visual variability and within-relation visual consistency of entities. This module adopts relation-to-entity-to-relation refinement by integrating dual-level relation pattern modeling with a relation-specific entity constraint. Furthermore, a Knowledge-Guided Prototype Learning (KGPL) module is devised to strengthen inter-class separability by constructing an equidistributed prototypical classifier with maximum inter-class margins. The equidistributed prototype classifier is frozen during SGG training to mitigate long-tail bias, thus a knowledge-driven triplet loss is developed to strengthen the learning of BiDR<sup>2</sup>, enhancing relation-prototype matching. Extensive experiments demonstrate the effectiveness of our method, which sets new state-of-the-art performance on Visual Genome, GQA and Open Images datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113152"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SCALAR: Spatial-concept alignment for robust vision in harsh open world
Xiaoyu Yang, Lijian Xu, Xingyu Zeng, Xiaosong Wang, Hongsheng Li, Shaoting Zhang
Pub Date: 2026-08-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.patcog.2026.113203 | Pattern Recognition, vol. 176, Article 113203
Foundation models have recently transformed visual-linguistic representation learning, yet their robustness under adverse imaging conditions of open worlds remains insufficiently understood. In this work, we introduce SCALAR, a scene-aware framework that endows multi-modal large language models with enhanced capability for robust spatial-concept alignment in degraded visual environments of open worlds. SCALAR proceeds in two complementary stages. The supervised alignment stage reconstructs hierarchical concept chains from visual-linguistic corpora, thereby enabling efficient spatial relationship decoding. The subsequent reinforced fine-tuning stage dispenses with annotations and leverages a consistency-driven reward to facilitate open-world self-evolution, yielding improved adaptability across diverse degraded domains. Crucially, SCALAR jointly optimizes multi-dimensional spatial representations and heterogeneous knowledge structures, thereby fostering resilience and generalization beyond canonical benchmarks. Extensive evaluations across five tasks and eight large-scale datasets demonstrate the efficacy of SCALAR in advancing state-of-the-art performance on visual grounding and complex scene understanding, even under challenging open-world environments with harsh visual conditions. Comprehensive ablation studies further elucidate the contributions of reinforced fine-tuning and multi-task joint optimization. Finally, to encourage future research, we provide a new multi-task visual grounding dataset emphasizing fine-grained scene-object relations under degradation, along with code: https://github.com/AnonymGiant/SCALAR.
{"title":"SCALAR: Spatial-concept alignment for robust vision in harsh open world","authors":"Xiaoyu Yang , Lijian Xu , Xingyu Zeng , Xiaosong Wang , Hongsheng Li , Shaoting Zhang","doi":"10.1016/j.patcog.2026.113203","DOIUrl":"10.1016/j.patcog.2026.113203","url":null,"abstract":"<div><div>Foundation models have recently transformed visual-linguistic representation learning, yet their robustness under adverse imaging conditions of open worlds remains insufficiently understood. In this work, we introduce SCALAR, a scene-aware framework that endows multi-modal large language models with enhanced capability for robust spatial-concept alignment in degraded visual environments of open worlds. SCALAR proceeds in two complementary stages. The supervised alignment stage reconstructs hierarchical concept chains from visual-linguistic corpora, thereby enabling efficient spatial relationship decoding. The subsequent reinforced fine-tuning stage dispenses with annotations and leverages a consistency-driven reward to facilitate open-world self-evolution, yielding improved adaptability across diverse degraded domains. Crucially, SCALAR jointly optimizes multi-dimensional spatial representations and heterogeneous knowledge structures, thereby fostering resilience and generalization beyond canonical benchmarks. Extensive evaluations across five tasks and eight large-scale datasets demonstrate the efficacy of SCALAR in advancing state-of-the-art performance on visual grounding and complex scene understanding, even under challenging open-world environments with harsh visual conditions. Comprehensive ablation studies further elucidate the contributions of reinforced fine-tuning and multi-task joint optimization. Finally, to encourage future research, we provide a new multi-task visual grounding dataset emphasizing fine-grained scene-object relations under degradation, along with code: <span><span>https://github.com/AnonymGiant/SCALAR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113203"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CSAFNet: Cross-modal spatial alignment and fusion network for RGB-T crowd counting
Yongjie Zhao, Liuru Pu, Huaibo Song, Bo Jiang
Pub Date: 2026-08-01 | Epub Date: 2026-02-07 | DOI: 10.1016/j.patcog.2026.113250 | Pattern Recognition, vol. 176, Article 113250
Crowd counting is critical for public safety and urban management in smart cities, yet faces challenges in complex scenarios. While RGB-Thermal (RGB-T) fusion helps address information loss in low-light conditions, current methods still suffer from two key limitations. (a) Existing RGB-T crowd counting methods fail to address the spatial misalignment between RGB and thermal features caused by different capturing devices, which diminishes fusion performance and impedes improvements in crowd counting accuracy. (b) Current methods fail to adequately distinguish between specific and common features of RGB and thermal modalities, leading to redundant feature fusion that compromises feature representation and results in suboptimal counting performance. To address the aforementioned challenges, the Cross-modal Spatial Alignment and Fusion Network (CSAFNet) is proposed. CSAFNet integrates three novel modules: the Cross-modal Feature Space Alignment (CFSA), Multiscale Spatial Displacement Compensation (MSDC), and Cross-modal Feature Decoupling Fusion (CFDF) modules. The CFSA module performs precise spatial alignment via feature windows and achieves wide spatial consistency through the MSDC module. The CFDF module employs Kullback-Leibler divergence and Jensen-Shannon divergence to perform decoupled fusion of cross-modal features, preserving modality-specific details, enhancing cross-modal commonalities, reducing redundant features, and strengthening discriminative feature representation. Extensive experiments demonstrate that the proposed CSAFNet achieves competitive performance on the RGBT-CC dataset, reducing GAME(0) to 10.75 and RMSE to 17.91. These results validate the effectiveness and promising potential of CSAFNet for cross-modal crowd counting tasks. Code is released at https://github.com/Zyjer888/CSAFNet.
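The CFDF module's use of KL and JS divergences suggests measuring where the two modalities agree before fusing. A minimal sketch of one such decoupled fusion, assuming channel-softmax distributions and an illustrative residual weight; this is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def kl_js_decoupled_fusion(rgb, thermal):
    """Treat each modality's channel responses as a distribution: low JS
    divergence marks shared content to fuse; the difference keeps
    modality-specific residuals."""
    p = F.softmax(rgb, dim=1)
    q = F.softmax(thermal, dim=1)
    m = 0.5 * (p + q)
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), per spatial location.
    js = 0.5 * (F.kl_div(m.log(), p, reduction="none")
                + F.kl_div(m.log(), q, reduction="none")).sum(1, keepdim=True)
    common = torch.exp(-js) * 0.5 * (rgb + thermal)    # low JS => shared content
    specific = rgb - thermal                           # modality-specific residual
    return common + 0.1 * specific                     # 0.1: illustrative weight

rgb = torch.randn(2, 16, 8, 8)        # (batch, channels, H, W) feature maps
thermal = torch.randn(2, 16, 8, 8)
print(kl_js_decoupled_fusion(rgb, thermal).shape)
```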
{"title":"CSAFNet: Cross-modal spatial alignment and fusion network for RGB-T crowd counting","authors":"Yongjie Zhao , Liuru Pu , Huaibo Song , Bo Jiang","doi":"10.1016/j.patcog.2026.113250","DOIUrl":"10.1016/j.patcog.2026.113250","url":null,"abstract":"<div><div>Crowd counting is critical for public safety and urban management in smart cities, yet faces challenges in complex scenarios. While RGB-Thermal (RGB-T) fusion helps address information loss in low-light conditions, current methods still suffer from two key limitations. (a) Existing RGB-T crowd counting methods fail to address the spatial misalignment between RGB and thermal features caused by different capturing devices, which diminishes fusion performance and impedes improvements in crowd counting accuracy. (b) Current methods fail to adequately distinguish between specific and common features of RGB and thermal modalities, leading to redundant feature fusion that compromises feature representation and results in suboptimal counting performance. To address the aforementioned challenges, the Cross-modal Spatial Alignment and Fusion Network (CSAFNet) is proposed. CSAFNet integrates three novel modules: the Cross-modal Feature Space Alignment (CFSA), Multiscale Spatia l Displacement Compensation (MSDC) and the Cross-modal Feature Decoupling Fusion (CFDF) modules. The CFSA module performs precise spatial alignment via feature windows and achieves wide spatial consistency through the MSDC module. The CFDF module employs Kullback-Leibler divergence and Jensen-Shannon divergence to perform decoupled fusion of cross-modal features, preserving modality-specific details, enhancing cross-modal commonalities, reducing redundant features, and strengthening discriminative feature representation. Extensive experiments demonstrate that the proposed CSAFNet achieves competitive performance on the RGBT-CC dataset, reducing GAME(0) to 10.75 and RMSE to 17.91. These results validate the effectiveness and promising potential of CSAFNet for cross-modal crowd counting tasks. <em><strong>Code is released at</strong></em> <span><span>https://github.com/Zyjer888/CSAFNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113250"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MG-TVMF: Multi-grained text-video matching and fusing for weakly supervised video anomaly detection
Ping He, Xiaonan Gao, Huibin Li
Pub Date: 2026-08-01 | Epub Date: 2026-02-03 | DOI: 10.1016/j.patcog.2026.113201 | Pattern Recognition, vol. 176, Article 113201
Weakly supervised video anomaly detection (WS-VAD) often suffers from false alarms and incomplete localization due to the lack of precise temporal annotations. To address these limitations, we propose a novel method, multi-grained text-video matching and fusing (MG-TVMF), which leverages semantic cues from anomaly category text labels to enhance both the accuracy and completeness of anomaly localization. MG-TVMF integrates two complementary branches: the MG-TVM branch improves localization accuracy through a hierarchical structure comprising a coarse-grained classification module and two fine-grained matching modules, namely a video-text matching (VTM) module for global semantic alignment and a segment-text matching (STM) module for local (segment-level) video-text alignment via an optimal transport algorithm. Meanwhile, the MG-TVF branch enhances localization completeness by prepending a global video-level text prompt to each segment-level caption for multi-grained textual fusion, and by reconstructing the masked anomaly-related caption of the top-scoring segment using video segment features and anomaly scores. Extensive experiments on the UCF-Crime and XD-Violence datasets demonstrate the effectiveness of the proposed VTM and STM modules as well as the MG-TVF branch, and the proposed MG-TVMF method achieves state-of-the-art performance on the UCF-Crime, XD-Violence, and ShanghaiTech datasets.
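Optimal-transport alignment between segments and text, as in the STM module, is commonly solved with entropic regularization (Sinkhorn iterations). A minimal sketch assuming uniform marginals and a cosine cost; the regularization strength eps and the iteration count are illustrative choices.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, iters=50, eps=0.1):
    """Entropic optimal transport: soft assignment between video segments
    (rows) and text tokens (columns) under uniform marginals."""
    K = torch.exp(-cost / eps)
    u = torch.ones(cost.size(0)) / cost.size(0)   # uniform row marginal
    v = torch.ones(cost.size(1)) / cost.size(1)   # uniform column marginal
    r, c = u.clone(), v.clone()
    for _ in range(iters):
        r = u / (K @ c)
        c = v / (K.t() @ r)
    return r[:, None] * K * c[None, :]            # transport plan

segs = F.normalize(torch.randn(12, 256), dim=1)  # 12 segment features
txt = F.normalize(torch.randn(5, 256), dim=1)    # 5 text embeddings
plan = sinkhorn(1 - segs @ txt.t())              # cost = 1 - cosine similarity
print(plan.sum(), plan.shape)                    # total mass ~ 1, shape (12, 5)
```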
{"title":"MG-TVMF: Multi-grained text-video matching and fusing for weakly supervised video anomaly detection","authors":"Ping He, Xiaonan Gao, Huibin Li","doi":"10.1016/j.patcog.2026.113201","DOIUrl":"10.1016/j.patcog.2026.113201","url":null,"abstract":"<div><div>Weakly supervised video anomaly detection (WS-VAD) often suffers from false alarms and incomplete localization due to the lack of precise temporal annotations. To address these limitations, we propose a novel method, multi-grained text-video matching and fusing (MG-TVMF), which leverages semantic cues from anomaly category text labels to enhance both the accuracy and completeness of anomaly localization. MG-TVMF integrates two complementary branches: the MG-TVM branch improves localization accuracy through a hierarchical structure comprising a coarse-grained classification module and two fine-grained matching modules, including a video-text matching (VTM) module for global semantic alignment and a segment-text matching (STM) module for local video (i.e. segment) text alignment via optimal transport algorithm. Meanwhile, the MG-TVF branch enhances localization completeness by prepending a global video-level text prompt to each segment-level caption for multi-grained textual fusion, and reconstructing the masked anomaly-related caption of the top-scoring segment using video segment features and anomaly scores. Extensive experiments on the UCF-Crime and XD-Violence datasets demonstrate the effectiveness of the proposed VTM and STM modules as well as the MG-TVF branch, and the proposed MG-TVMF method achieves state-of-the-art performance on UCF-Crime, XD-Violence, and ShanghaiTech datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113201"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Domain generalization via domain uncertainty shrinkage
Jun-Zheng Chu, Bin Pan, Tian-Yang Shi, Zhen-Wei Shi
Pub Date: 2026-08-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.patcog.2026.113118 | Pattern Recognition, vol. 176, Article 113118
Ensuring model robustness against distributional shifts still presents a significant challenge in many machine learning applications. To address this issue, a wide range of domain generalization (DG) methods have been developed. However, these approaches mainly focus on invariant representations obtained by leveraging multiple source domain data, ignoring the uncertainty presented by different domains. In this paper, we establish a novel DG framework in the form of evidential deep learning (EDL-DG). To reach the DG objective under finitely many given domains, we propose a new Domain Uncertainty Shrinkage (DUS) regularization scheme on the output Dirichlet distribution parameters, which achieves better generalization across unseen domains without introducing additional structures. Theoretically, we analyze the convergence of EDL-DG and provide a generalization bound in the framework of PAC-Bayesian learning. We show that our proposed method reduces the PAC-Bayesian bound under certain conditions and thus achieves better generalization across unseen domains. In our experiments, we validate the effectiveness of our proposed method on the DomainBed benchmark across multiple real-world datasets.
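In evidential deep learning, logits map to Dirichlet parameters and uncertainty is read off their sum. A minimal sketch of that mapping plus one plausible shrinkage term; the softplus evidence function and the simple mean-uncertainty penalty are illustrative assumptions, and the paper's exact DUS form is not reproduced here.

```python
import torch
import torch.nn.functional as F

def edl_uncertainty(logits, num_classes):
    """Evidential head: non-negative evidence -> Dirichlet parameters alpha;
    predictive uncertainty u = K / sum(alpha)."""
    alpha = F.softplus(logits) + 1.0
    return num_classes / alpha.sum(dim=1)

def dus_regularizer(logits_per_domain, num_classes):
    """Illustrative shrinkage term: penalize the mean Dirichlet uncertainty so
    predictions stay confident across all source domains."""
    us = [edl_uncertainty(l, num_classes).mean() for l in logits_per_domain]
    return torch.stack(us).mean()

domains = [torch.randn(32, 10) for _ in range(3)]  # logits from 3 source domains
print(dus_regularizer(domains, num_classes=10).item())
```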
{"title":"Domain generalization via domain uncertainty shrinkage","authors":"Jun-Zheng Chu , Bin Pan , Tian-Yang Shi , Zhen-Wei Shi","doi":"10.1016/j.patcog.2026.113118","DOIUrl":"10.1016/j.patcog.2026.113118","url":null,"abstract":"<div><div>Ensuring model robustness against distributional shifts still presents a significant challenge in many machine learning applications. To address this issue, a wide range of domain generalization (DG) methods have been developed. However, these approaches mainly focus on invariant representations by leveraging multiple source domain data, which ignore the uncertainty presented from different domains. In this paper, we establish a novel DG framework in form of evidential deep learning (EDL-DG). To reach DG objective under finite given domains, we propose a new <em>Domain Uncertainty Shrinkage</em> (<strong>DUS</strong>) regularization scheme on the output Dirichlet distribution parameters, which achieves better generalization across unseen domains without introducing additional structures. Theoretically, we analyze the convergence of EDL-DG, and provide a generalization bound in the framework of PAC-Bayesian learning. We show that our proposed method reduce the PAC-Bayesian bound under certain conditions, and thus achieve better generalization across unseen domains. In our experiments, we validate the effectiveness our proposed method on DomainBed benchmark in multiple real-world datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113118"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
YOLO-PICO: Lightweight object recognition in remote sensing images using expansion attention modules
Mohamad Ebrahim Aghili, Hassan Ghassemian, Maryam Imani
Pub Date: 2026-08-01 | Epub Date: 2026-01-17 | DOI: 10.1016/j.patcog.2026.113114 | Pattern Recognition, vol. 176, Article 113114
Recognizing small objects in remote sensing imagery remains a significant challenge. This paper introduces YOLO-PICO, a novel and highly efficient object detector designed for small object recognition. At its core is the Expansion Attention (EA) Module, a new operator for spatial-channel feature fusion that enhances fine-grained details with minimal computational cost. This allows YOLO-PICO to achieve competitive performance with significantly fewer parameters than existing models, as demonstrated by our new parameter efficiency metric, Size-Normalized Average Precision (SNAP). Furthermore, we show that YOLO-PICO's efficiency makes it an ideal foundation for an Ensemble of Specialists (EoS) framework, a decision-level fusion strategy that substantially boosts detection accuracy with a modest increase in inference time. Our results demonstrate that this combination of an efficient core model and an advanced fusion strategy offers a compelling solution for high-performance recognition on resource-constrained platforms. The code will be made available at: https://github.com/MohamadEbrahimAghili/YOLO-PICO.
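The abstract describes the EA module only as a spatial-channel feature-fusion operator, so the sketch below is one plausible reading, not the paper's design: expand channels with a cheap 1x1 convolution, derive joint spatial and channel gates, and modulate the input. Every layer choice here is an assumption.

```python
import torch
import torch.nn as nn

class ExpansionAttention(nn.Module):
    """Illustrative spatial-channel attention: channel expansion followed by
    a spatial gate and a channel gate that jointly modulate the input."""
    def __init__(self, channels, expand=2):
        super().__init__()
        mid = channels * expand
        self.expand = nn.Conv2d(channels, mid, kernel_size=1)
        self.spatial = nn.Conv2d(mid, 1, kernel_size=3, padding=1)
        self.channel = nn.Linear(mid, channels)

    def forward(self, x):
        e = torch.relu(self.expand(x))                       # channel expansion
        s = torch.sigmoid(self.spatial(e))                   # (B,1,H,W) spatial gate
        c = torch.sigmoid(self.channel(e.mean(dim=(2, 3))))  # (B,C) channel gate
        return x * s * c[:, :, None, None]                   # fused modulation

x = torch.randn(1, 32, 40, 40)
print(ExpansionAttention(32)(x).shape)   # torch.Size([1, 32, 40, 40])
```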
{"title":"YOLO-PICO: Lightweight object recognition in remote sensing images using expansion attention modules","authors":"Mohamad Ebrahim Aghili, Hassan Ghassemian, Maryam Imani","doi":"10.1016/j.patcog.2026.113114","DOIUrl":"10.1016/j.patcog.2026.113114","url":null,"abstract":"<div><div>Recognizing small objects in remote sensing imagery remains a significant challenge. This paper introduces YOLO-PICO, a novel and highly efficient object detector designed for small object recognition. At its core is the Expansion Attention (EA) Module, a new operator for spatial-channel feature fusion that enhances fine-grained details with minimal computational cost. This allows YOLO-PICO to achieve competitive performance with significantly fewer parameters than existing models, as demonstrated by our new parameter efficiency metric, Size-Normalized Average Precision (SNAP). Furthermore, we show that YOLO-PICO's efficiency makes it an ideal foundation for an Ensemble of Specialists (EoS) framework, a decision-level fusion strategy that substantially boosts detection accuracy with a modest increase in inference time. Our results demonstrate that this combination of an efficient core model and an advanced fusion strategy offers a compelling solution for high-performance recognition on resource-constrained platforms. The code will be made available at: <span><span>https://github.com/MohamadEbrahimAghili/YOLO-PICO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113114"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}