SAR target augmentation and recognition via cross-domain reconstruction
Pub Date: 2024-11-04 | DOI: 10.1016/j.patcog.2024.111117
Ganggang Dong, Yafei Song
Deep learning-based target recognition methods have achieved strong performance in prior work. They typically require large amounts of labeled training data to train a deep architecture from which inference can be performed. For radar sensors, raw data are easy to collect, but label information is difficult to obtain. To address this problem, this paper proposes a cross-domain re-imaging target augmentation method. The original image is first transformed into the frequency domain, where the spectrum is filtered by a randomly generated mask whose size and shape are chosen at random. The filtered spectrum is then used for re-imaging, reconstructing the original target and producing new samples at will, which increases both the size and the diversity of the dataset. The proposed augmentation can be applied online or offline, making it adaptable to various downstream tasks. Multiple comparative studies show the superiority of the proposed method over standard and recent techniques in generating images that aid downstream tasks.
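The re-imaging pipeline described in the abstract (transform, random spectral masking, inverse transform) can be sketched in a few lines. The snippet below is an illustrative, minimal re-implementation of that idea using NumPy FFTs, not the authors' code; the rectangular mask shape and the quarter-size bound are assumptions made for the example.

```python
import numpy as np

def reimage_augment(image, rng=None):
    """Generate one augmented sample by randomly masking the image spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    spec = np.fft.fftshift(np.fft.fft2(image))                     # recast into the frequency domain
    h, w = spec.shape
    mh, mw = rng.integers(1, h // 4), rng.integers(1, w // 4)      # random mask size (assumed bound)
    top, left = rng.integers(0, h - mh), rng.integers(0, w - mw)   # random mask position
    mask = np.ones((h, w))
    mask[top:top + mh, left:left + mw] = 0.0                       # random spectral filtering
    recon = np.fft.ifft2(np.fft.ifftshift(spec * mask))            # re-imaging
    return np.abs(recon)
```

Applied on the fly this yields a different sample every epoch (online use); applied once over the training set it can pre-populate an enlarged dataset (offline use).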
{"title":"SAR target augmentation and recognition via cross-domain reconstruction","authors":"Ganggang Dong, Yafei Song","doi":"10.1016/j.patcog.2024.111117","DOIUrl":"10.1016/j.patcog.2024.111117","url":null,"abstract":"<div><div>The deep learning-based target recognition methods have achieved great performance in the preceding works. Large amounts of training data with label were collected to train a deep architecture, by which the inference can be obtained. For radar sensors, the data could be collected easily, yet the prior knowledge on label was difficult to be accessed. To solve the problem, a cross-domain re-imaging target augmentation method was proposed in this paper. The original image was first recast into the frequency domain. The frequency were then randomly filtered by a randomly generated mask. The size and the shape of mask was randomly determined. The filtering results were finally used for re-imaging. The original target can be then reconstructed accordingly. A series of new samples can be generated freely. The amounts and the diversity of dataset can be therefore improved. The proposed augmentation method can be implemented on-line or off-line, making it adaptable to various downstream tasks. Multiple comparative studies throw the light on the superiority of proposed method over the standard and recent techniques. It served to generate the images that would aid the downstream tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111117"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint Intra-view and Inter-view Enhanced Tensor Low-rank Induced Affinity Graph Learning
Pub Date: 2024-11-04 | DOI: 10.1016/j.patcog.2024.111140
Weijun Sun, Chaoye Li, Qiaoyun Li, Xiaozhao Fang, Jiakai He, Lei Liu
Graph-based and tensor-based multi-view clustering have gained popularity in recent years due to their ability to explore the relationships between samples. However, current multi-view graph clustering algorithms still have several shortcomings. (1) Most previous methods focus only on inter-view correlation while ignoring intra-view correlation. (2) They usually use the Tensor Nuclear Norm (TNN) to approximate the rank of a tensor; because TNN applies the same penalty to all singular values, it cannot approximate the true tensor rank well. To solve these problems in a unified way, we propose a new tensor-based multi-view graph clustering method. Specifically, we introduce intra-view and inter-view Enhanced Tensor Rank (ETR) minimization into the process of learning the affinity graph of each view. Experimental results against 10 state-of-the-art methods on 8 real datasets demonstrate the superiority of our method.
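The TNN criticism above, that every singular value receives the same penalty, is easy to see from the nuclear-norm proximal step. The sketch below contrasts it with a reweighted shrinkage on an ordinary matrix; it only illustrates the motivation and is not the paper's ETR definition, and the weighting scheme and `gamma` constant are assumptions.

```python
import numpy as np

def tnn_style_shrinkage(M, tau):
    # Nuclear-norm proximal operator: every singular value is shrunk by the same tau.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def reweighted_shrinkage(M, tau, gamma=1e-2):
    # Reweighted alternative: dominant singular values (main structure) are
    # penalized less than small ones (noise), giving a closer surrogate of rank.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    weights = 1.0 / (s + gamma)
    return U @ np.diag(np.maximum(s - tau * weights, 0.0)) @ Vt
```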
{"title":"Joint Intra-view and Inter-view Enhanced Tensor Low-rank Induced Affinity Graph Learning","authors":"Weijun Sun, Chaoye Li, Qiaoyun Li, Xiaozhao Fang, Jiakai He, Lei Liu","doi":"10.1016/j.patcog.2024.111140","DOIUrl":"10.1016/j.patcog.2024.111140","url":null,"abstract":"<div><div>Graph-based and tensor-based multi-view clustering have gained popularity in recent years due to their ability to explore the relationship between samples. However, there are still several shortcomings in the current multi-view graph clustering algorithms. (1) Most previous methods only focus on the inter-view correlation, while ignoring the intra-view correlation. (2) They usually use the Tensor Nuclear Norm (TNN) to approximate the rank of tensors. However, while it has the same penalty for different singular values, the model cannot approximate the true rank of tensors well. To solve these problems in a unified way, we propose a new tensor-based multi-view graph clustering method. Specifically, we introduce the Enhanced Tensor Rank (ETR) minimization of intra-view and inter-view in the process of learning the affinity graph of each view. Compared with 10 state-of-the-art methods on 8 real datasets, the experimental results demonstrate the superiority of our method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111140"},"PeriodicalIF":7.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PIM-Net: Progressive Inconsistency Mining Network for image manipulation localization
Pub Date: 2024-11-03 | DOI: 10.1016/j.patcog.2024.111136
Ningning Bai, Xiaofeng Wang, Ruidong Han, Jianpeng Hou, Yihang Wang, Shanmin Pang
Concerns about the authenticity and reliability of digital image content have promoted research on image manipulation localization (IML). Most current deep learning-based methods focus on extracting global or local tampering features to identify forged regions. These features usually contain semantic information and lead to inaccurate detection for non-object or semantically incomplete tampered regions. In this study, we propose a novel Progressive Inconsistency Mining Network (PIM-Net) for effective IML. Specifically, PIM-Net consists of two core modules, the Inconsistency Mining Module (ICMM) and the Progressive Fusion Refinement module (PFR). ICMM models the inconsistency between authentic and forged regions at two levels, i.e., pixel correlation inconsistency and region attribute incongruity, while avoiding interference from semantic information. PFR then progressively aggregates and refines the extracted inconsistency features, which in turn yields finer and purer localization responses. Extensive qualitative and quantitative experiments on five benchmarks demonstrate PIM-Net's superiority over current state-of-the-art IML methods. Code: https://github.com/ningnbai/PIM-Net.
{"title":"PIM-Net: Progressive Inconsistency Mining Network for image manipulation localization","authors":"Ningning Bai , Xiaofeng Wang , Ruidong Han , Jianpeng Hou , Yihang Wang , Shanmin Pang","doi":"10.1016/j.patcog.2024.111136","DOIUrl":"10.1016/j.patcog.2024.111136","url":null,"abstract":"<div><div>The content authenticity and reliability of digital images have promoted the research on image manipulation localization (IML). Most current deep learning-based methods focus on extracting global or local tampering features for identifying forged regions. These features usually contain semantic information and lead to inaccurate detection results for non-object or incomplete semantic tampered regions. In this study, we propose a novel Progressive Inconsistency Mining Network (PIM-Net) for effective IML. Specifically, PIM-Net consists of two core modules, the Inconsistency Mining Module (ICMM) and the Progressive Fusion Refinement module (PFR). ICMM models the inconsistency between authentic and forged regions at two levels, i.e., pixel correlation inconsistency and region attribute incongruity, while avoiding the interference of semantic information. Then PFR progressively aggregates and refines extracted inconsistent features, which in turn yields finer and pure localization responses. Extensive qualitative and quantitative experiments on five benchmarks demonstrate PIM-Net’s superiority to current state-of-the-art IML methods. Code: <span><span>https://github.com/ningnbai/PIM-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111136"},"PeriodicalIF":7.5,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-modal adapter for vision–language retrieval
Pub Date: 2024-11-03 | DOI: 10.1016/j.patcog.2024.111144
Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Shiji Song, Gao Huang
Vision–language retrieval is an important multi-modal learning topic whose goal is to retrieve the most relevant visual candidate for a given text query. Recently, pre-trained models, e.g., CLIP, have shown great potential on retrieval tasks. However, as pre-trained models scale up, fully fine-tuning them on downstream retrieval datasets carries a high risk of overfitting. Moreover, in practice, it would be costly to train and store a large model for each task. To overcome these issues, we present a novel Cross-Modal Adapter for parameter-efficient transfer learning. Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers, with two notable differences. First, our method is designed for the multi-modal domain. Second, it allows encoder-level implicit cross-modal interactions between the vision and language encoders. Although surprisingly simple, our approach has three notable benefits: (1) it reduces the vast majority of fine-tuned parameters, (2) it saves training time, and (3) it allows all pre-trained parameters to be fixed, enabling the pre-trained model to be shared across datasets. Extensive experiments demonstrate that, without bells and whistles, our approach outperforms adapter-based methods on image–text retrieval datasets (MSCOCO, Flickr30K) and video–text retrieval datasets (MSR-VTT, DiDeMo, and ActivityNet).
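As background for the adapter idea, a minimal bottleneck adapter in PyTorch might look like the sketch below; sharing one instance between the vision and language towers is one plausible way to realize encoder-level cross-modal interaction. The layer sizes, the zero-initialized up-projection, and the assumption that both encoders use the same hidden width are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    """Bottleneck adapter: x + up(GELU(down(x))); only these weights are trained."""
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Parameter-efficient fine-tuning: freeze the pre-trained backbone, train adapters only.
# for p in clip_model.parameters():
#     p.requires_grad = False
# shared_adapter = CrossModalAdapter(dim=512)   # same module inserted in both towers
```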
{"title":"Cross-modal adapter for vision–language retrieval","authors":"Haojun Jiang , Jianke Zhang , Rui Huang , Chunjiang Ge , Zanlin Ni , Shiji Song , Gao Huang","doi":"10.1016/j.patcog.2024.111144","DOIUrl":"10.1016/j.patcog.2024.111144","url":null,"abstract":"<div><div>Vision–language retrieval is an important multi-modal learning topic, where the goal is to retrieve the most relevant visual candidate for a given text query. Recently, pre-trained models, <em>e.g.</em>, CLIP, show great potential on retrieval tasks. However, as pre-trained models are scaling up, fully fine-tuning them on donwstream retrieval datasets has a high risk of overfitting. Moreover, in practice, it would be costly to train and store a large model for each task. To overcome the above issues, we present a novel <strong>Cross-Modal Adapter</strong> for parameter-efficient transfer learning. Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers. However, there are two notable differences. First, our method is designed for the multi-modal domain. Secondly, it allows encoder-level implicit cross-modal interactions between vision and language encoders. Although surprisingly simple, our approach has three notable benefits: (1) reduces the vast majority of fine-tuned parameters, (2) saves training time, and (3) allows all the pre-trained parameters to be fixed, enabling the pre-trained model to be shared across datasets. Extensive experiments demonstrate that, without bells and whistles, our approach outperforms adapter-based methods on image–text retrieval datasets (MSCOCO, Flickr30K) and video–text retrieval datasets (MSR-VTT, DiDeMo, and ActivityNet).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111144"},"PeriodicalIF":7.5,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image shadow removal via multi-scale deep Retinex decomposition
Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111126
Yan Huang, Xinchang Lu, Yuhui Quan, Yong Xu, Hui Ji
In recent years, deep learning has emerged as an important tool for image shadow removal. However, existing methods often prioritize shadow detection and, in doing so, oversimplify the lighting conditions of shadow regions. Furthermore, these methods neglect cues from the overall image lighting when re-lighting shadow areas, thereby failing to ensure global lighting consistency. To address these challenges in images captured under complex lighting conditions, this paper introduces a multi-scale network built on a Retinex decomposition model. The proposed approach effectively senses shadows with uneven lighting and re-lights them, achieving greater consistency along shadow boundaries. For the network design, we further introduce several techniques for boosting shadow removal performance, including a shadow-aware channel attention module, local discriminative and Retinex decomposition loss functions, and a multi-scale mechanism that guides the Retinex decomposition by concurrently capturing fine-grained details and large-scale contextual information. Experimental results demonstrate the superiority of the proposed method over existing solutions, particularly for images taken under complex lighting conditions.
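For readers unfamiliar with the Retinex model the network builds on, the classical (non-learned) multi-scale Retinex relation, image = reflectance times illumination, can be written as below. The paper learns this decomposition with a network, so the Gaussian illumination estimate and the scale values here are only a conventional baseline, not the proposed method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_retinex(image, sigmas=(15, 80, 250), eps=1e-6):
    """Classical multi-scale Retinex: reflectance = log(I) - log(L),
    with illumination L estimated by Gaussian smoothing at several scales."""
    img = image.astype(np.float64)
    log_i = np.log(img + eps)
    out = np.zeros_like(log_i)
    for sigma in sigmas:
        illumination = gaussian_filter(img, sigma=sigma) + eps   # smooth estimate of lighting
        out += log_i - np.log(illumination)
    return out / len(sigmas)
```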
{"title":"Image shadow removal via multi-scale deep Retinex decomposition","authors":"Yan Huang , Xinchang Lu , Yuhui Quan , Yong Xu , Hui Ji","doi":"10.1016/j.patcog.2024.111126","DOIUrl":"10.1016/j.patcog.2024.111126","url":null,"abstract":"<div><div>In recent years, deep learning has emerged as an important tool for image shadow removal. However, existing methods often prioritize shadow detection and, in doing so, they oversimplify the lighting conditions of shadow regions. Furthermore, these methods neglect cues from the overall image lighting when re-lighting shadow areas, thereby failing to ensure global lighting consistency. To address these challenges in images captured under complex lighting conditions, this paper introduces a multi-scale network built on a Retinex decomposition model. The proposed approach effectively senses shadows with uneven lighting and re-light them, achieving greater consistency along shadow boundaries. Furthermore, for the design of network, we introduce several techniques for boosting shadow removal performance, including a shadow-aware channel attention module, local discriminative and Retinex decomposition loss functions, and a multi-scale mechanism for guiding Retinex decomposition by concurrently capturing both fine-grained details and large-scale contextual information. Experimental results demonstrate the superiority of our proposed method over existing solutions, particularly for images taken under complex lighting conditions.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111126"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S2Match: Self-paced sampling for data-limited semi-supervised learning
Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111121
Dayan Guan, Yun Xing, Jiaxing Huang, Aoran Xiao, Abdulmotaleb El Saddik, Shijian Lu
Data-limited semi-supervised learning tends to be severely degraded by miscalibration (i.e., misalignment between the confidence and the correctness of predicted pseudo labels) and to get stuck at poor local minima while repeatedly learning from the same set of over-confident yet incorrect pseudo labels. We design a simple and effective self-paced sampling technique that greatly alleviates the impact of miscalibration and learns more accurate semi-supervised models from limited training data. Instead of employing static or dynamic confidence thresholds, which are sensitive to miscalibration, the proposed self-paced sampling follows a simple linear policy to select pseudo labels, which eases repeated learning from the same set of falsely predicted pseudo labels at the early training stage and effectively lowers the chance of getting stuck at local minima. Despite its simplicity, extensive evaluations over multiple data-limited semi-supervised tasks show that the proposed self-paced sampling consistently outperforms the state of the art by large margins.
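A minimal sketch of such a linear selection policy is given below: the kept fraction of pseudo labels grows linearly with training progress instead of being gated by a confidence threshold. The 0.95 cap and the per-epoch schedule are illustrative assumptions, not the exact S2Match schedule.

```python
import numpy as np

def self_paced_selection(confidences, epoch, total_epochs, max_keep=0.95):
    """Return a boolean mask over pseudo-labelled samples: keep the most
    confident fraction, where the fraction increases linearly over training."""
    keep_ratio = max_keep * min(1.0, (epoch + 1) / total_epochs)   # linear policy
    n_keep = max(1, int(keep_ratio * len(confidences)))
    order = np.argsort(confidences)[::-1]                          # most confident first
    mask = np.zeros(len(confidences), dtype=bool)
    mask[order[:n_keep]] = True
    return mask
```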
{"title":"S2Match: Self-paced sampling for data-limited semi-supervised learning","authors":"Dayan Guan , Yun Xing , Jiaxing Huang , Aoran Xiao , Abdulmotaleb El Saddik , Shijian Lu","doi":"10.1016/j.patcog.2024.111121","DOIUrl":"10.1016/j.patcog.2024.111121","url":null,"abstract":"<div><div>Data-limited semi-supervised learning tends to be severely degraded by miscalibration (i.e., misalignment between confidence and correctness of predicted pseudo labels) and stuck at poor local minima while learning from the same set of over-confident yet incorrect pseudo labels repeatedly. We design a simple and effective self-paced sampling technique that can greatly alleviate the impact of miscalibration and learn more accurate semi-supervised models from limited training data. Instead of employing static or dynamic confidence thresholds which is sensitive to miscalibration, the proposed self-paced sampling follows a simple linear policy to select pseudo labels which eases repeated learning from the same set of falsely predicted pseudo labels at the early training stage and lowers the chance of being stuck at local minima effectively. Despite its simplicity, extensive evaluations over multiple data-limited semi-supervised tasks show the proposed self-paced sampling outperforms the state-of-the-art consistently by large margins.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111121"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consistency-driven feature scoring and regularization network for visible–infrared person re-identification
Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111131
Xueting Chen, Yan Yan, Jing-Hao Xue, Nannan Wang, Hanzi Wang
Recently, visible–infrared person re-identification (VI-ReID) has received considerable attention due to its practical importance. A number of methods extract multiple local features to enrich the diversity of feature representations. However, some local features often involve modality-relevant information, leading to deteriorated performance. Moreover, existing methods optimize the models by considering only the samples in each batch while ignoring the features learned at previous iterations. As a result, the features of images of the same person change drastically across training epochs, hindering training stability. To alleviate these issues, we propose a novel consistency-driven feature scoring and regularization network (CFSR-Net) for VI-ReID, which consists of a backbone network, a local feature learning block, a feature scoring block, and a global–local feature fusion block. On the one hand, we design a cross-modality consistency loss to highlight modality-irrelevant local features and suppress modality-relevant local features for each modality, facilitating the generation of a reliable, compact local feature. On the other hand, we develop a feature consistency regularization strategy (including a momentum class contrastive loss and a momentum distillation loss) that imposes consistency regularization on the learning of different levels of features by taking into account the features learned at historical epochs. This enables smooth feature changes and thus improves training stability. Extensive experiments on public VI-ReID datasets clearly show the effectiveness of our method against several state-of-the-art VI-ReID methods. Code will be released at https://github.com/cxtjl/CFSR-Net.
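One common way to realize such history-aware consistency regularization is to keep momentum-updated class prototypes and contrast current features against them, as sketched below. This is a generic illustration under assumed hyper-parameters (momentum 0.9, temperature 0.1), not the exact CFSR-Net losses.

```python
import torch
import torch.nn.functional as F

class MomentumPrototypes:
    """EMA feature prototype per identity; contrasting against these slowly
    moving prototypes discourages features of the same person from drifting
    sharply between training iterations."""
    def __init__(self, num_classes, dim, momentum=0.9, temperature=0.1):
        self.protos = F.normalize(torch.randn(num_classes, dim), dim=1)
        self.m, self.t = momentum, temperature

    @torch.no_grad()
    def update(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        for f, y in zip(feats, labels):
            # exponential moving average keeps prototypes slowly varying across epochs
            self.protos[y] = F.normalize(self.m * self.protos[y] + (1 - self.m) * f, dim=0)

    def loss(self, feats, labels):
        logits = F.normalize(feats, dim=1) @ self.protos.t() / self.t
        return F.cross_entropy(logits, labels)
```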
{"title":"Consistency-driven feature scoring and regularization network for visible–infrared person re-identification","authors":"Xueting Chen , Yan Yan , Jing-Hao Xue , Nannan Wang , Hanzi Wang","doi":"10.1016/j.patcog.2024.111131","DOIUrl":"10.1016/j.patcog.2024.111131","url":null,"abstract":"<div><div>Recently, visible–infrared person re-identification (VI-ReID) has received considerable attention due to its practical importance. A number of methods extract multiple local features to enrich the diversity of feature representations. However, some local features often involve modality-relevant information, leading to deteriorated performance. Moreover, existing methods optimize the models by only considering the samples at each batch while ignoring the learned features at previous iterations. As a result, the features of the same person images drastically change at different training epochs, hindering the training stability. To alleviate the above issues, we propose a novel consistency-driven feature scoring and regularization network (CFSR-Net), which consists of a backbone network, a local feature learning block, a feature scoring block, and a global–local feature fusion block, for VI-ReID. On the one hand, we design a cross-modality consistency loss to highlight modality-irrelevant local features and suppress modality-relevant local features for each modality, facilitating the generation of a reliable compact local feature. On the other hand, we develop a feature consistency regularization strategy (including a momentum class contrastive loss and a momentum distillation loss) to impose consistency regularization on the learning of different levels of features by considering the learned features at historical epochs. This effectively enables smooth feature changes and thus improves the training stability. Extensive experiments on public VI-ReID datasets clearly show the effectiveness of our method against several state-of-the-art VI-ReID methods. Code will be released at <span><span>https://github.com/cxtjl/CFSR-Net</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111131"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ANNE: Adaptive Nearest Neighbours and Eigenvector-based sample selection for robust learning with noisy labels
Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111132
Filipe R. Cordeiro, Gustavo Carneiro
An important stage of most state-of-the-art (SOTA) noisy-label learning methods is a sample selection procedure that classifies samples from the noisy-label training set into noisy-label or clean-label subsets. Sample selection typically follows one of two approaches: loss-based sampling, where high-loss samples are considered to have noisy labels, or feature-based sampling, where samples from the same class tend to cluster together in the feature space and noisy-label samples are identified as anomalies within those clusters. Empirically, loss-based sampling is robust to a wide range of noise rates, while feature-based sampling tends to work effectively in particular scenarios; e.g., filtering of noisy instances via their eigenvectors (FINE) exhibits greater robustness in scenarios with low noise rates, whereas K-nearest-neighbour (KNN) sampling better mitigates high noise-rate problems. This paper introduces the Adaptive Nearest Neighbours and Eigenvector-based (ANNE) sample selection methodology, a novel approach that integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise-rate scenarios. ANNE first partitions the training set into high-loss and low-loss sub-groups using loss-based sampling. Within the low-loss subset, sample selection is then performed using FINE, while the high-loss subset employs Adaptive KNN. We integrate ANNE into the SOTA noisy-label learning method SSR+ and test it on CIFAR-10/-100 (with symmetric, asymmetric and instance-dependent noise), WebVision and ANIMAL-10, where our method shows better accuracy than the SOTA in most experiments, with a competitive training time. The code is available at https://github.com/filipe-research/anne.
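The two-stage selection described above can be sketched roughly as follows: a two-component GMM on the per-sample loss splits the data, an eigenvector-alignment (FINE-style) score flags clean samples in the low-loss group, and a KNN label-agreement test handles the high-loss group. Everything here (the GMM split, the mean-score cutoff, a fixed k) is a simplified stand-in for the adaptive procedures in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

def anne_style_selection(losses, features, labels, k=10):
    """Return a boolean 'clean' mask combining loss-, eigenvector- and KNN-based cues."""
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-12)
    gmm = GaussianMixture(n_components=2).fit(losses.reshape(-1, 1))
    low_loss = gmm.predict_proba(losses.reshape(-1, 1))[:, gmm.means_.argmin()] > 0.5

    clean = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):                       # FINE-style pass on the low-loss group
        idx = np.where(low_loss & (labels == c))[0]
        if len(idx) < 2:
            continue
        f = features[idx] / np.linalg.norm(features[idx], axis=1, keepdims=True)
        _, vecs = np.linalg.eigh(f.T @ f)             # principal direction of the class
        scores = np.abs(f @ vecs[:, -1])
        clean[idx[scores > scores.mean()]] = True

    high = np.where(~low_loss)[0]                     # KNN pass on the high-loss group
    if len(high) > 0 and clean.sum() >= k:
        knn = KNeighborsClassifier(n_neighbors=k).fit(features[clean], labels[clean])
        agree = knn.predict(features[high]) == labels[high]
        clean[high[agree]] = True
    return clean
```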
Self-supervised random mask attention GAN in tackling pose-invariant face recognition
Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111112
Jiashu Liao, Tanaya Guha, Victor Sanchez
Pose Invariant Face Recognition (PIFR) has significantly advanced with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making it challenging and costly to acquire the necessary training data. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called Mask Rotate. Our framework introduces a novel training approach that requires no paired ground truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies the face image frontalization task by transforming it into a face image completion task. During the inference or testing stage, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised Random Mask Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by considering them as randomly masked regions. Furthermore, our proposed Mask Rotate framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the Mask Rotate framework eliminates the requirement for complex computations during training and achieves strong results, both qualitative and quantitative, compared to the state-of-the-art.
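The self-supervised completion view of frontalization can be illustrated with a simple random mask generator: ordinary face images are masked at random and the GAN is trained to restore them, so no paired frontal ground truth is needed. The rectangular holes and size bounds below are assumptions for illustration; the paper's attention-guided random masking is more elaborate.

```python
import numpy as np

def random_region_mask(h, w, max_regions=4, max_frac=0.3, rng=None):
    """Binary mask (1 = keep, 0 = hole) with a few random rectangular holes."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.ones((h, w), dtype=np.float32)
    for _ in range(rng.integers(1, max_regions + 1)):
        rh = rng.integers(1, max(2, int(h * max_frac)))
        rw = rng.integers(1, max(2, int(w * max_frac)))
        top, left = rng.integers(0, h - rh), rng.integers(0, w - rw)
        mask[top:top + rh, left:left + rw] = 0.0
    return mask

# Training pair for the completion GAN: (face * mask, face); at test time the
# mask would instead mark texture missing from the rendered frontal view.
```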
{"title":"Self-supervised random mask attention GAN in tackling pose-invariant face recognition","authors":"Jiashu Liao , Tanaya Guha , Victor Sanchez","doi":"10.1016/j.patcog.2024.111112","DOIUrl":"10.1016/j.patcog.2024.111112","url":null,"abstract":"<div><div>Pose Invariant Face Recognition (PIFR) has significantly advanced with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making it challenging and costly to acquire the necessary training data. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called <em>Mask Rotate</em>. Our framework introduces a novel training approach that requires no paired ground truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies the face image frontalization task by transforming it into a face image completion task. During the inference or testing stage, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised <em>Random Mask</em> Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by considering them as randomly masked regions. Furthermore, our proposed <em>Mask Rotate</em> framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the <em>Mask Rotate</em> framework eliminates the requirement for complex computations during training and achieves strong results, both qualitative and quantitative, compared to the state-of-the-art.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111112"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SPColor: Semantic prior guided exemplar-based image colorization
Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111109
Siqi Chen, Xianlin Zhang, Mingdao Wang, Xueming Li, Yu Zhang, Yue Zhang
Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between the two images. Previous methods search for correspondence directly over the entire reference image, and this type of global matching is prone to mismatches. Intuitively, a reasonable correspondence should be established between objects that are semantically similar. Motivated by this, we introduce the idea of a semantic prior and propose SPColor, a semantic prior guided exemplar-based image colorization framework. Several novel components are systematically designed in SPColor, including a semantic prior guided correspondence network (SPC), a category reduction algorithm (CRA), and a similarity masked perceptual loss (SMP loss). Different from previous methods, SPColor establishes correspondence locally between pixels of the same semantic class. In this way, improper correspondence between different semantic classes is explicitly excluded, and mismatches are clearly alleviated. In addition, SPColor supports region-level class assignments before the SPC in the pipeline. With this feature, a category manipulation process (CMP) is proposed as an interactive interface to control colorization, which can also produce more varied colorization results and improve the flexibility of reference selection. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on public datasets. Our code is available at https://github.com/viector/spcolor.
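The core of the semantic prior, restricting matching to pixels of the same semantic class, can be sketched as a masked cosine-similarity search, as below. The feature shapes, segmentation inputs, and the argmax fallback are illustrative assumptions rather than the actual SPC module.

```python
import numpy as np

def class_restricted_correspondence(tgt_feat, ref_feat, tgt_seg, ref_seg):
    """For each target pixel (rows of tgt_feat), return the index of its best
    reference pixel, searching only among reference pixels of the same class."""
    t = tgt_feat / np.linalg.norm(tgt_feat, axis=1, keepdims=True)   # (Nt, C)
    r = ref_feat / np.linalg.norm(ref_feat, axis=1, keepdims=True)   # (Nr, C)
    sim = t @ r.T                                                    # cosine similarity
    sim[tgt_seg[:, None] != ref_seg[None, :]] = -np.inf              # mask out other classes
    return sim.argmax(axis=1)   # pixels whose class is absent in the reference fall back to index 0
```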
{"title":"Spcolor: Semantic prior guided exemplar-based image colorization","authors":"Siqi Chen , Xianlin Zhang , Mingdao Wang , Xueming Li , Yu Zhang , Yue Zhang","doi":"10.1016/j.patcog.2024.111109","DOIUrl":"10.1016/j.patcog.2024.111109","url":null,"abstract":"<div><div>Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between these two images. Previous methods directly search for correspondence over the entire reference image, and this type of global matching is prone to mismatch. Intuitively, a reasonable correspondence should be established between objects which are semantically similar. Motivated by this, we introduce the idea of semantic prior and propose SPColor, a semantic prior guided exemplar-based image colorization framework. Several novel components are systematically designed in SPColor, including a semantic prior guided correspondence network (SPC), a category reduction algorithm (CRA), and a similarity masked perceptual loss (SMP loss). Different from previous methods, SPColor establishes the correspondence between the pixels in the same semantic class locally. In this way, improper correspondence between different semantic classes is explicitly excluded, and the mismatch is obviously alleviated. In addition, SPColor supports region-level class assignments before SPC in the pipeline. With this feature, a category manipulation process (CMP) is proposed as an interactive interface to control colorization, which can also produce more varied colorization results and improve the flexibility of reference selection. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on public dataset. Our code is available at <span><span>https://github.com/viector/spcolor</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111109"},"PeriodicalIF":7.5,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}