
Pattern Recognition — Latest Articles

Instant pose extraction based on mask transformer for occluded person re-identification
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-22 DOI: 10.1016/j.patcog.2024.111082
Re-Identification (Re-ID) of obscured pedestrians is a daunting task, primarily due to the frequent occlusion caused by various obstacles like buildings, vehicles, and even other pedestrians. To address this challenge, we propose a novel approach named Instant Pose Extraction based on Mask Transformer (MTIPE), tailored specifically for occluded person Re-ID. MTIPE consists of several new modules: a Mask Aware Module (MAM) for alignment between the overall prototype and the occluded image; a Multi-headed Attention Constraint Module (MACM) to enrich the feature representation; a Pose Aggregation Module (PAM) to separate useful human information from the occlusion noise; a Feature Matching Module (FMM) for matching non-occluded parts; learnable local prototypes introduced into a local prototype-based transformer decoder; a Pooling Attention Module (PAM) that replaces the traditional self-attention module to better extract and propagate local contextual information; and a Pose Key-points Loss to better match non-occluded body parts. Through comprehensive experimental evaluations and comparisons, MTIPE demonstrates encouraging performance improvements in both occluded and holistic person Re-ID tasks. Its results surpass or at least match those of current state-of-the-art methods in various aspects, highlighting its potential advantages and promising application prospects.
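To make the part-matching idea above concrete, the following sketch restricts the distance between two person images to body parts that are visible in both. It is a minimal illustration of matching non-occluded parts, not the paper's FMM, and the per-part visibility scores are hypothetical inputs (e.g., from a pose estimator or mask decoder).

```python
import torch
import torch.nn.functional as F

def visible_part_distance(q_parts, g_parts, q_vis, g_vis, eps=1e-6):
    """q_parts, g_parts: (P, D) local part features of a query/gallery image.
    q_vis, g_vis: (P,) visibility scores in [0, 1] for each body part.
    The distance is averaged only over parts visible in both images, so
    occluded regions contribute almost nothing to the match."""
    joint_vis = q_vis * g_vis                                      # shared visibility per part
    part_dist = 1.0 - F.cosine_similarity(q_parts, g_parts, dim=1) # per-part cosine distance
    return (joint_vis * part_dist).sum() / (joint_vis.sum() + eps)

# toy usage: 6 body parts, 128-dimensional part features
dist = visible_part_distance(torch.randn(6, 128), torch.randn(6, 128),
                             torch.rand(6), torch.rand(6))
```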
Citations: 0
Fine-grained Automatic Augmentation for handwritten character recognition
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-22 DOI: 10.1016/j.patcog.2024.111079
With the advancement of deep learning-based character recognition models, the training data size has become a crucial factor in improving the performance of handwritten text recognition. For languages with low-resource handwriting samples, data augmentation methods can effectively scale up the data size and improve the performance of handwriting recognition models. However, existing data augmentation methods for handwritten text face two limitations: (1) Methods based on global spatial transformations typically augment the training data by transforming each word sample as a whole but ignore the potential to generate fine-grained transformation from local word areas, limiting the diversity of the generated samples; (2) It is challenging to adaptively choose a reasonable augmentation parameter when applying these methods to different language datasets. To address these issues, this paper proposes Fine-grained Automatic Augmentation (FgAA) for handwritten character recognition. Specifically, FgAA views each word sample as composed of multiple strokes and achieves data augmentation by performing fine-grained transformations on the strokes. Each word is automatically segmented into various strokes, and each stroke is fitted with a Bézier curve. On such a basis, we define the augmentation policy related to the fine-grained transformation and use Bayesian optimization to select the optimal augmentation policy automatically, thereby achieving the automatic augmentation of handwriting samples. Experiments on seven handwriting datasets of different languages demonstrate that FgAA achieves the best augmentation effect for handwritten character recognition. Our code is available at https://github.com/IMU-MachineLearningSXD/Fine-grained-Automatic-Augmentation
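As a rough illustration of the stroke-level transformation described above, the sketch below fits a cubic Bézier curve to a single stroke and jitters its control points to produce a new stroke. The augmentation-policy search via Bayesian optimization is omitted, and the function names and noise scale are illustrative rather than taken from the released FgAA code.

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares cubic Bezier fit to an (N, 2) stroke; returns 4 control points."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)])
    t = t / t[-1] if t[-1] > 0 else np.linspace(0, 1, len(pts))    # chord-length parameter
    B = np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t), t ** 3], axis=1)           # Bernstein basis
    ctrl, *_ = np.linalg.lstsq(B, pts, rcond=None)                 # (4, 2) control points
    return ctrl

def augment_stroke(points, scale=0.05, rng=None):
    """Perturbs the fitted control points and resamples the stroke."""
    rng = rng or np.random.default_rng()
    pts = np.asarray(points, dtype=float)
    ctrl = fit_cubic_bezier(pts)
    span = np.ptp(pts, axis=0).max() + 1e-6                        # stroke extent
    ctrl = ctrl + rng.normal(0.0, scale * span, ctrl.shape)        # jitter control points
    t = np.linspace(0, 1, len(pts))
    B = np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t), t ** 3], axis=1)
    return B @ ctrl                                                # augmented stroke, (N, 2)
```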
Citations: 0
Corrigendum to “Fast adaptively balanced min-cut clustering” [Pattern Recognition 158 (2025) 111027]
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-22 DOI: 10.1016/j.patcog.2024.111084
{"title":"Corrigendum to “Fast adaptively balanced min-cut clustering” [Pattern Recognition 158 (2025) 111027]","authors":"","doi":"10.1016/j.patcog.2024.111084","DOIUrl":"10.1016/j.patcog.2024.111084","url":null,"abstract":"","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Piecewise convolutional neural network relation extraction with self-attention mechanism
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-18 DOI: 10.1016/j.patcog.2024.111083
The task of relation extraction in natural language processing is to identify the relation between two specified entities in a sentence. However, existing methods do not fully utilize word feature information and pay little attention to how strongly each word influences the relation extraction result. To address these issues, we propose a relation extraction method based on a self-attention mechanism (SPCNN-VAE). First, we use a multi-head self-attention mechanism to process word vectors and generate sentence feature representations, which capture semantic dependencies between words. Then, we introduce word position information, combining the sentence feature representation with the position feature representation of each word to form the input representation of a piecewise convolutional neural network (PCNN). Furthermore, to identify the word feature information that is most useful for relation extraction, an attention-based pooling operation is employed to capture key convolutional features and classify the feature vectors. Finally, regularization is performed with a variational autoencoder (VAE) to enhance the model's ability to encode word information features. Performance analysis on SemEval-2010 Task 8 shows that the proposed relation extraction model is effective and outperforms several competitive baselines.
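For readers unfamiliar with the "piecewise" part of PCNN, the toy function below shows the standard piecewise max-pooling step the abstract builds on: per-token convolution features are split into three segments at the two entity positions and each segment is pooled separately. The attention-based pooling and VAE regularization of SPCNN-VAE are not shown; this is only the baseline PCNN operation.

```python
import torch

def piecewise_max_pool(feats, e1_pos, e2_pos):
    """feats: (T, C) per-token convolution features for one sentence.
    e1_pos, e2_pos: token indices of the two entity mentions.
    Returns a (3 * C,) vector: one max-pooled summary per segment."""
    cut1, cut2 = sorted((e1_pos, e2_pos))
    segments = [feats[:cut1 + 1], feats[cut1 + 1:cut2 + 1], feats[cut2 + 1:]]
    pooled = [seg.max(dim=0).values if seg.numel() else feats.new_zeros(feats.size(1))
              for seg in segments]
    return torch.cat(pooled)

# toy usage: 12 tokens, 64 conv channels, entities at positions 2 and 7
vec = piecewise_max_pool(torch.randn(12, 64), 2, 7)   # shape: (192,)
```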
Citations: 0
Understanding adversarial robustness against on-manifold adversarial examples
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-18 DOI: 10.1016/j.patcog.2024.111071
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples: a well-trained model can be fooled by adding small perturbations to the original data. One hypothesis for the existence of adversarial examples is the off-manifold assumption: adversarial examples lie off the data manifold. However, recent research has shown that on-manifold adversarial examples also exist. In this paper, we revisit the off-manifold assumption and study the question: to what extent is the poor adversarial robustness of neural networks due to on-manifold adversarial examples? Since the true data manifold is unknown in practice, we consider two approximations of on-manifold adversarial examples, on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial examples achieve higher attack success rates than off-manifold adversarial examples on both standard-trained and adversarially trained models. On synthetic datasets, we prove theoretically that on-manifold adversarial examples are powerful, yet adversarial training focuses on off-manifold directions and ignores them. Furthermore, our analysis shows that the theoretically derived properties can also be observed in practice. These findings suggest that on-manifold adversarial examples are important and deserve more attention when training robust models.
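One common way to approximate an on-manifold adversarial example, sketched below, is to perturb the latent code of a pretrained generator rather than the pixels, so the decoded image stays close to the generator's manifold. This illustrates the general idea under the assumption that a differentiable decoder and classifier are available; it is not the authors' exact construction.

```python
import torch
import torch.nn.functional as F

def on_manifold_attack(decoder, classifier, z, label, eps=0.1, steps=10, step_size=0.02):
    """Searches for a bounded latent perturbation that raises the classification loss.
    decoder, classifier: pretrained, differentiable modules (assumed available).
    z: (B, latent_dim) latent codes; label: (B,) ground-truth class indices."""
    delta = torch.zeros_like(z, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(classifier(decoder(z + delta)), label)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()   # signed gradient ascent in latent space
            delta.clamp_(-eps, eps)            # keep the latent perturbation bounded
    return decoder(z + delta).detach()         # decoded, approximately on-manifold attack
```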
Citations: 0
Self-supervised learning from images: No negative pairs, no cluster-balancing
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-18 DOI: 10.1016/j.patcog.2024.111081
Learning with self-derived targets provides a non-contrastive method for unsupervised image representation learning, where the variety in targets is crucial. Recent work has achieved good performance by learning with targets obtained via cluster-balancing. However, the equal-cluster-size constraint becomes too restrictive for handling data with imbalanced categories or coming in small batches. In this paper, we propose a new clustering-based approach for non-contrastive image representation learning with no need for a particular architecture design or extra memory bank and no explicit constraints on cluster size. A key formulation is to learn embedding consistency and variable decorrelation in the cluster space by tweaking the batch-wise cross-correlation matrix towards an identity one. With this identitization loss incorporated, predicted cluster assignments of two randomly augmented views of the same image serve as targets for each other. We carried out comprehensive experimental studies of linear classification with learned representations of benchmark image datasets. Our results show that the proposed approach significantly outperforms state-of-the-art approaches and is more robust to class imbalance than those with cluster balancing.
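The "tweak the batch-wise cross-correlation matrix towards an identity one" step is straightforward to sketch. Below is a minimal, Barlow-Twins-style version of such a loss over the assignments (or embeddings) of two augmented views; the paper's exact loss and its self-derived cluster targets may differ.

```python
import torch

def identitization_loss(p1, p2, lambd=5e-3, eps=1e-6):
    """p1, p2: (N, K) assignment/embedding matrices for two augmented views.
    Standardizes each column, forms the K x K batch-wise cross-correlation
    matrix, and pulls it towards the identity: diagonal -> 1, off-diagonal -> 0."""
    p1 = (p1 - p1.mean(0)) / (p1.std(0) + eps)
    p2 = (p2 - p2.mean(0)) / (p2.std(0) + eps)
    c = p1.T @ p2 / p1.shape[0]                                   # cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()              # consistency between views
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # decorrelation of variables
    return on_diag + lambd * off_diag

# toy usage: batch of 256 samples, 128 cluster dimensions
loss = identitization_loss(torch.randn(256, 128), torch.randn(256, 128))
```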
Citations: 0
Feature-matching method based on keypoint response constraint using binary encoding of phase congruency
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-16 DOI: 10.1016/j.patcog.2024.111078
At present, the cross-view geo-localization (CGL) task is still far from practical. This is mainly because of the intensity differences between the two images from different sensors. In this study, we propose a learning feature-matching framework with binary encoding of phase congruency to solve the problem of intensity differences between the two images. First, the autoencoder-weighted fusion method is used to obtain an intensity alignment image that would make the two images from different sensors comparable. Second, the keypoint responses of the two images are calculated using the binary encoding of the phase congruency theory, which is employed to construct the feature-matching method. This method considers the invariance of the phase information in weak-texture images and uses the phase information to compute the keypoint response with higher distinguishability and matchability. Finally, using the two intensity-aligned images, a method for computing the binary encoding of the phase congruency keypoint response loss function is employed to optimize the keypoint detector and feature descriptor and obtain the corresponding keypoint set of the two images. The experimental results show that the improved feature matching is superior to existing methods and solves the problem of view differences in object matching. The code can be found at https://github.com/lqq-dot/FMPCKR.
Citations: 0
UPT-Flow: Multi-scale transformer-guided normalizing flow for low-light image enhancement
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-11 DOI: 10.1016/j.patcog.2024.111076
Low-light images often suffer from information loss and RGB value degradation due to extremely low or nonuniform lighting conditions. Many existing methods primarily focus on optimizing the appearance distance between the enhanced image and the normal-light image, while neglecting the explicit modeling of information loss regions or incorrect information points in low-light images. To address this, this paper proposes an Unbalanced Points-guided multi-scale Transformer-based conditional normalizing Flow (UPT-Flow) for low-light image enhancement. We design an unbalanced point map prior based on the differences in the proportion of RGB values for each pixel in the image, which is used to modify traditional self-attention and mitigate the negative effects of areas with information distortion in the attention calculation. The Multi-Scale Transformer (MSFormer) is composed of several global-local transformer blocks, which encode rich global contextual information and local fine-grained details for conditional normalizing flow. In the invertible network of flow, we design cross-coupling conditional affine layers based on channel and spatial attention, enhancing the expressive power of a single flow step. Without bells and whistles, extensive experiments on low-light image enhancement, night traffic monitoring enhancement, low-light object detection, and nighttime image segmentation have demonstrated that our proposed method achieves state-of-the-art performance across a variety of real-world scenes. The code and pre-trained models will be available at https://github.com/NJUPT-IPR-XuLintao/UPT-Flow.
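A minimal sketch of what a per-pixel "unbalanced point" prior based on RGB-proportion differences could look like is given below. This is one plausible reading of the prior (deviation of each pixel's channel proportions from a balanced split), not the authors' exact formula.

```python
import torch

def unbalanced_point_map(img, eps=1e-6):
    """img: (B, 3, H, W) RGB image in [0, 1].
    Measures how far each pixel's channel proportions deviate from a balanced
    1/3 : 1/3 : 1/3 split and rescales the map to [0, 1] per image."""
    prop = img / (img.sum(dim=1, keepdim=True) + eps)        # per-pixel RGB proportions
    dev = (prop - 1.0 / 3.0).abs().sum(dim=1, keepdim=True)  # deviation from balance
    peak = dev.amax(dim=(-2, -1), keepdim=True).clamp_min(eps)
    return dev / peak                                        # (B, 1, H, W) prior map

# toy usage
prior = unbalanced_point_map(torch.rand(2, 3, 64, 64))
```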
Citations: 0
CEDNet: A cascade encoder–decoder network for dense prediction
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-10 DOI: 10.1016/j.patcog.2024.111072
The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder–decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder–decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder–decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method.
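To make "all stages share the same encoder–decoder structure" concrete, here is a toy cascade in which every stage downsamples, upsamples back to the input resolution, and fuses before passing its output to the next stage. It is a simplified sketch of the cascade idea, not the CEDNet blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoderDecoderStage(nn.Module):
    """One toy stage: encode (downsample), decode (upsample), fuse with the input."""
    def __init__(self, channels):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x):
        low = F.relu(self.down(x))                              # coarse features
        up = F.interpolate(low, size=x.shape[-2:], mode="nearest")
        return F.relu(self.fuse(torch.cat([x, up], dim=1)))     # multi-scale fusion

class TinyCascade(nn.Module):
    """Stacks identical encoder-decoder stages so fusion happens early and repeatedly."""
    def __init__(self, channels=32, num_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(TinyEncoderDecoderStage(channels)
                                    for _ in range(num_stages))

    def forward(self, x):
        for stage in self.stages:
            x = stage(x)   # each stage refines the fused features of the previous one
        return x

# toy usage
out = TinyCascade()(torch.randn(1, 32, 64, 64))   # (1, 32, 64, 64)
```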
Citations: 0
EENet: An effective and efficient network for single image dehazing
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-10 DOI: 10.1016/j.patcog.2024.111074
While numerous solutions leveraging convolutional neural networks and Transformers have been proposed for image dehazing, there remains significant potential to improve the balance between efficiency and reconstruction performance. In this paper, we introduce an efficient and effective network named EENet, designed for image dehazing through enhanced spatial–spectral learning. EENet comprises three primary modules: the frequency processing module, the spatial processing module, and the dual-domain interaction module. Specifically, the frequency processing module handles Fourier components individually based on their distinct properties for image dehazing while also modeling global dependencies according to the convolution theorem. Additionally, the spatial processing module is designed to enable multi-scale learning. Finally, the dual-domain interaction module promotes information exchange between the frequency and spatial domains. Extensive experiments demonstrate that EENet achieves state-of-the-art performance on seven synthetic and real-world datasets for image dehazing. Moreover, the network’s generalization ability is validated by extending it to image desnowing, image defocus deblurring, and low-light image enhancement.
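The frequency-processing idea, handling Fourier components separately, can be sketched as a small block that transforms a feature map with a 2-D FFT, processes amplitude and phase in separate branches, and transforms back. This is a hypothetical block written to illustrate the convolution-theorem intuition, not the actual EENet module.

```python
import torch
import torch.nn as nn

class FrequencyProcessingBlock(nn.Module):
    """Hypothetical block: process Fourier amplitude and phase in separate branches."""
    def __init__(self, channels):
        super().__init__()
        self.amp_branch = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(),
                                        nn.Conv2d(channels, channels, 1))
        self.pha_branch = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.ReLU(),
                                        nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        freq = torch.fft.rfft2(x, norm="backward")        # per-channel 2-D FFT
        amp, pha = torch.abs(freq), torch.angle(freq)     # split into amplitude / phase
        amp, pha = self.amp_branch(amp), self.pha_branch(pha)
        freq = torch.polar(amp, pha)                      # recombine as a complex spectrum
        return torch.fft.irfft2(freq, s=x.shape[-2:], norm="backward")

# toy usage
y = FrequencyProcessingBlock(16)(torch.randn(1, 16, 32, 32))   # same shape as the input
```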
Citations: 0