
Latest Publications in Pattern Recognition

Enhancing robust VQA via contrastive and self-supervised learning
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111129
Runlin Cao, Zhixin Li, Zhenjun Tang, Canlong Zhang, Huifang Ma
Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an intelligent agent using visual and textual information. However, recent research indicates that many VQA models rely primarily on learning the correlation between questions and answers in the training dataset rather than demonstrating actual reasoning ability. To address this limitation, we propose a novel training approach called Enhancing Robust VQA via Contrastive and Self-supervised Learning (CSL-VQA) to construct a more robust VQA model. Our approach involves generating two types of negative samples to balance the biased data, using self-supervised auxiliary tasks to help the base VQA model overcome language priors, and filtering out biased training samples. In addition, we construct positive samples by removing spurious correlations in biased samples and perform auxiliary training through contrastive learning. Our approach does not require additional annotations and is compatible with different VQA backbones. Experimental results demonstrate that CSL-VQA significantly outperforms current state-of-the-art approaches, achieving an accuracy of 62.30% on the VQA-CP v2 dataset, while maintaining robust performance on the in-distribution VQA v2 dataset. Moreover, our method shows superior generalization capabilities on challenging datasets such as GQA-OOD and VQA-CE, proving its effectiveness in reducing language bias and enhancing the overall robustness of VQA models.
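As a rough illustration of the contrastive auxiliary objective described in the abstract, the sketch below pulls a debiased positive sample toward the original question–image representation and pushes biased negatives away with an InfoNCE-style loss; the encoder outputs, sample construction, and temperature value are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_auxiliary_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style auxiliary loss: anchor/positive are (B, D) fused
    question-image embeddings, negatives is (B, K, D) biased samples."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature       # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives) / temperature  # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)                        # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)                    # positive sits at index 0
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for the fused VQA features.
B, K, D = 4, 8, 256
loss = contrastive_auxiliary_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, K, D))
print(loss.item())
```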
Citations: 0
TransMatch: Transformer-based correspondence pruning via local and global consensus
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111120
Yizhang Liu, Yanping Li, Shengjie Zhao
Correspondence pruning aims to filter out false correspondences (a.k.a. outliers) from the initial feature correspondence set, which is pivotal to matching-based vision tasks, such as image registration. To solve this problem, most existing learning-based methods typically use a multilayer perceptron framework and several well-designed modules to capture local and global contexts. However, few studies have explored how local and global consensuses interact to form cohesive feature representations. This paper proposes a novel framework called TransMatch, which leverages the full power of Transformer structure to extract richer features and facilitate progressive local and global consensus learning. In addition to enhancing feature learning, Transformer is used as a powerful tool to connect the above two consensuses. Benefiting from Transformer, our TransMatch is surprisingly effective for differentiating correspondences. Experimental results on correspondence pruning and camera pose estimation demonstrate that the proposed TransMatch outperforms other state-of-the-art methods by a large margin. The code will be available at https://github.com/lyz8023lyp/TransMatch/.
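For readers unfamiliar with correspondence pruning, the following minimal sketch (not the released TransMatch code) shows the usual setup: each putative match is embedded as a token, a transformer encoder mixes global context across matches, and a per-token head predicts an inlier logit. The layer sizes and the plain coordinate embedding are assumptions.

```python
import torch
import torch.nn as nn

class TinyCorrespondencePruner(nn.Module):
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(4, d_model)          # one (x1, y1, x2, y2) putative match per token
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)  # global context across all matches
        self.head = nn.Linear(d_model, 1)           # per-correspondence inlier logit

    def forward(self, corr):                         # corr: (B, N, 4) normalized coordinates
        tokens = self.encoder(self.embed(corr))
        return self.head(tokens).squeeze(-1)         # (B, N) logits; sigmoid gives inlier probability

model = TinyCorrespondencePruner()
logits = model(torch.randn(2, 500, 4))
print(logits.shape)  # torch.Size([2, 500])
```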
Citations: 0
L2T-DFM: Learning to Teach with Dynamic Fused Metric
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111124
Zhaoyang Hai, Liyuan Pan, Xiabi Liu, Mengqiao Han
The loss function plays a crucial role in the construction of machine learning algorithms. Employing a teacher model to set loss functions dynamically for student models has attracted attention. In existing works, (1) the characterization of the dynamic loss suffers from some inherent limitations, i.e., the computational cost of loss networks and the restricted similarity measurement of handcrafted loss functions; and (2) the states of the student model are provided to the teacher model directly without integration, causing the teacher model to underperform when trained on insufficient amounts of data. To alleviate the above-mentioned issues, in this paper, we select and weigh a set of similarity metrics by a confidence-based selection algorithm and a temporal teacher model to enhance the dynamic loss functions. Subsequently, to integrate the states of the student model, we employ statistics to quantify the information loss of the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.
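A minimal sketch of the general idea of fusing several similarity metrics into one dynamic loss is given below; the particular metrics (MSE, L1, cosine) and the softmax weighting over confidence scores are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fused_dynamic_loss(pred, target, metric_logits):
    """Weight a set of handcrafted similarity metrics by confidence scores
    (here a softmax over `metric_logits`, e.g. produced by a teacher model)."""
    metrics = torch.stack([
        F.mse_loss(pred, target),
        F.l1_loss(pred, target),
        1.0 - F.cosine_similarity(pred, target, dim=-1).mean(),
    ])
    weights = torch.softmax(metric_logits, dim=0)    # confidence-based weighting
    return (weights * metrics).sum()

pred, target = torch.randn(8, 16), torch.randn(8, 16)
print(fused_dynamic_loss(pred, target, torch.tensor([0.2, 0.5, 0.3])).item())
```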
Citations: 0
Self-distillation with beta label smoothing-based cross-subject transfer learning for P300 classification
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-02 | DOI: 10.1016/j.patcog.2024.111114
Shurui Li, Liming Zhao, Chang Liu, Jing Jin, Cuntai Guan

Background:

The P300 speller is one of the most well-known brain-computer interface (BCI) systems, offering users a novel way to communicate with their environment by decoding brain activity.

Problem:

However, most P300-based BCI systems require a lengthy calibration phase to develop a subject-specific model, which can be inconvenient and time-consuming. Additionally, it is challenging to implement cross-subject P300 classification due to significant inter-individual variations.

Method:

To address these issues, this study proposes a calibration-free approach for P300 signal detection. Specifically, we incorporate self-distillation along with a beta label smoothing method to enhance model generalization and overall system performance, which not only enables the distillation of informative knowledge from the electroencephalogram (EEG) data of other subjects but also effectively reduces individual variability.

Experimental results:

Experiments conducted on the publicly available OpenBMI dataset demonstrate that the proposed method achieves statistically significantly higher performance compared to state-of-the-art approaches. Notably, the average character recognition accuracy of our method reaches up to 97.37% without the need for calibration. The information transfer rate and visualization results further confirm its effectiveness.

Significance:

This method holds great promise for future developments in BCI applications.
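The sketch below illustrates one plausible reading of the two ingredients named above, Beta-distributed label smoothing plus a self-distillation term, for a binary P300/non-P300 classifier; the Beta parameters, temperature, and mixing weight are assumptions, and the paper's exact smoothing rule is not reproduced here.

```python
import torch
import torch.nn.functional as F

def beta_smoothed_targets(labels, num_classes=2, alpha=2.0, beta=5.0):
    """Replace hard one-hot labels with targets smoothed by a per-sample amount drawn from Beta(alpha, beta)."""
    eps = torch.distributions.Beta(alpha, beta).sample((labels.size(0), 1))   # per-sample smoothing strength
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1 - eps) + eps / num_classes

def self_distillation_loss(student_logits, teacher_logits, labels, T=2.0, lam=0.5):
    # Soft term: match the (temperature-scaled) teacher distribution.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1), reduction="batchmean") * T * T
    # Hard term: cross-entropy against Beta-smoothed targets (soft targets need a recent PyTorch).
    hard = F.cross_entropy(student_logits, beta_smoothed_targets(labels))
    return lam * soft + (1 - lam) * hard

logits_s, logits_t = torch.randn(16, 2), torch.randn(16, 2)
labels = torch.randint(0, 2, (16,))
print(self_distillation_loss(logits_s, logits_t, labels).item())
```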
Citations: 0
Text–video retrieval re-ranking via multi-grained cross attention and frozen image encoders
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111099
Zuozhuo Dai, Kaihui Cheng, Fangtao Shao, Zilong Dong, Siyu Zhu
State-of-the-art methods for text–video retrieval generally leverage CLIP embeddings and cosine similarity for efficient retrieval. Meanwhile, recent advancements in cross-attention techniques introduce transformer decoders to facilitate attention computation between text queries and visual tokens extracted from video frames, enabling a more comprehensive interaction between textual and visual information. In this study, we combine the advantages of both approaches and propose a fine-grained re-ranking approach incorporating a multi-grained text–video cross attention module. Specifically, the re-ranker enhances the top K similar candidates identified by the cosine similarity network. To explore video and text interactions efficiently, we introduce frame and video token selectors to obtain salient visual tokens at both frame and video levels. Then, a multi-grained cross-attention mechanism is applied between text and visual tokens at these levels to capture multimodal information. To reduce the training overhead associated with the multi-grained cross-attention module, we freeze the vision backbone and only train the multi-grained cross attention module. This frozen strategy allows for scalability to larger pre-trained vision models such as ViT-G, leading to enhanced retrieval performance. Experimental evaluations on text–video retrieval datasets showcase the effectiveness and scalability of our proposed re-ranker combined with existing state-of-the-art methodologies.
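Schematically, the retrieve-then-re-rank pipeline can be sketched as below (an illustration, not the paper's implementation); `cross_attention_score` is a hypothetical stand-in for the multi-grained cross-attention re-ranker, and the embedding shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def rerank_topk(text_emb, video_embs, video_tokens, cross_attention_score, k=10):
    """Stage 1: cosine similarity over CLIP-style embeddings selects k candidates.
    Stage 2: a heavier cross-attention scorer re-orders only those k candidates."""
    sims = F.cosine_similarity(text_emb.unsqueeze(0), video_embs, dim=-1)    # (N,)
    topk = sims.topk(k).indices
    rerank_scores = torch.stack([cross_attention_score(text_emb, video_tokens[i]) for i in topk])
    return topk[rerank_scores.argsort(descending=True)]                      # re-ranked candidate ids

# Toy usage with a dummy scorer standing in for the cross-attention module.
text_emb = torch.randn(512)
video_embs = torch.randn(1000, 512)                  # one global embedding per video
video_tokens = torch.randn(1000, 32, 512)            # 32 visual tokens per video
dummy_scorer = lambda t, v: t @ v.mean(0)            # hypothetical re-ranking scorer
print(rerank_topk(text_emb, video_embs, video_tokens, dummy_scorer, k=5))
```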
Citations: 0
Accelerating the convergence of concept drift based on knowledge transfer
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111145
Husheng Guo, Zhijie Wu, Qiaoyan Ren, Wenjian Wang
Concept drift detection and processing is an important issue in streaming data mining. When concept drift occurs, an online learning model often cannot quickly adapt to the new data distribution because data from the new distribution are initially scarce, which may lead to poor model performance. Currently, most online learning methods adapt to the new data distribution after concept drift through autonomous adjustment of the model, but they often fail to update the model to a stable state quickly. To solve these problems, this paper proposes a method for accelerating the convergence after concept drift based on knowledge transfer (ACC_KT). It extracts the most valuable information from the source domain (pre-drift data) and transfers it to the target domain (post-drift data), realizing the update of the ensemble model through knowledge transfer. Besides, different knowledge transfer patterns are adopted to accelerate the convergence of model performance when different types of concept drift occur. Experimental results show that the proposed method has an obvious acceleration effect on the online learning model after concept drift.
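As a rough analogue of transferring knowledge from pre-drift to post-drift data (not the ACC_KT algorithm itself), the sketch below warm-starts an online learner on the scarce post-drift samples instead of retraining from scratch; the synthetic streams and drift point are assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
pre_X = rng.normal(0.0, 1.0, (2000, 10))
pre_y = (pre_X[:, 0] > 0.0).astype(int)            # pre-drift concept
post_X = rng.normal(1.5, 1.0, (200, 10))           # drifted input distribution
post_y = (post_X[:, 0] > 1.5).astype(int)          # shifted decision threshold

# Source-domain model trained on the pre-drift stream.
model = SGDClassifier(random_state=0)
model.partial_fit(pre_X, pre_y, classes=np.array([0, 1]))

# After drift is detected, keep the learned weights and keep updating on the
# (scarce) post-drift mini-batches rather than restarting from scratch.
for i in range(0, len(post_X), 32):
    model.partial_fit(post_X[i:i + 32], post_y[i:i + 32])
print(model.score(post_X, post_y))
```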
Citations: 0
A unified framework for unsupervised action learning via global-to-local motion transformer
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111118
Boeun Kim, Jungho Kim, Hyung Jin Chang, Tae-Hyun Oh
Human action recognition remains challenging due to the inherent complexity arising from the combination of diverse granularity of semantics, ranging from the local motion of body joints to high-level relationships across multiple people. To learn this multi-level characteristic of human action in an unsupervised manner, we propose a novel pretraining strategy along with a transformer-based model architecture named GL-Transformer++. Prior methods in unsupervised action recognition or unsupervised group activity recognition (GAR) have shown limitations, often focusing solely on capturing a partial scope of the action, such as the local movements of each individual or the broader context of the overall motion. To tackle this problem, we introduce a novel pretraining strategy named multi-interval pose displacement prediction (MPDP) that enables the model to learn the diverse extents of the action. In the architectural aspect, we incorporate the global and local attention (GLA) mechanism within the transformer blocks to learn local dynamics between joints, global context of each individual, as well as high-level interpersonal relationships in both spatial and temporal manner. In fact, the proposed method is a unified approach that demonstrates efficacy in both action recognition and GAR. Particularly, our method presents a new and strong baseline, surpassing the current SOTA GAR method by significant margins: 29.6% in Volleyball and 60.3% and 59.9% on the xsub and xset settings of the Mutual NTU dataset, respectively.
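A small sketch of the kind of multi-interval displacement targets the MPDP pretraining objective predicts is shown below; the intervals and pose tensor layout are illustrative assumptions, not the paper's settings.

```python
import torch

def multi_interval_displacements(poses, intervals=(1, 5, 10)):
    """poses: (T, J, 3) joint positions over T frames. Returns one displacement
    tensor per interval k, each of shape (T - k, J, 3)."""
    return {k: poses[k:] - poses[:-k] for k in intervals}

poses = torch.randn(60, 25, 3)          # 60 frames, 25 joints (toy data)
targets = multi_interval_displacements(poses)
print({k: tuple(v.shape) for k, v in targets.items()})
```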
Citations: 0
ACFNet: An adaptive cross-fusion network for infrared and visible image fusion
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111098
Xiaoxuan Chen, Shuwen Xu, Shaohai Hu, Xiaole Ma
Considering the prospects for image fusion, it is necessary to guide the fusion to adapt to downstream vision tasks. In this paper, we propose an Adaptive Cross-Fusion Network (ACFNet) that utilizes an adaptive approach to fuse infrared and visible images, addressing cross-modal differences to enhance object detection performance. In ACFNet, a hierarchical cross-fusion module is designed to enrich the features at each level of the reconstructed images. In addition, a special adaptive gating selection module is proposed to realize feature fusion in an adaptive manner, so that fused images are obtained without relying on handcrafted fusion rules. Extensive qualitative and quantitative experiments have demonstrated that ACFNet is superior to current state-of-the-art fusion methods and achieves excellent results in preserving target information and texture details. The fusion framework, when combined with the object detection framework, has the potential to significantly improve the precision of object detection in low-light conditions.
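A minimal sketch of an adaptive gating fusion block, illustrating learned rather than hand-designed fusion weights for infrared/visible feature maps, is given below; the layer sizes are assumptions and this is not the ACFNet architecture.

```python
import torch
import torch.nn as nn

class AdaptiveGateFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),                            # per-pixel weight in [0, 1]
        )

    def forward(self, feat_ir, feat_vis):
        w = self.gate(torch.cat([feat_ir, feat_vis], dim=1))
        return w * feat_ir + (1 - w) * feat_vis      # adaptively mix the two modalities

fuse = AdaptiveGateFusion()
out = fuse(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
print(out.shape)  # torch.Size([1, 64, 128, 128])
```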
Citations: 0
MvWECM: Multi-view Weighted Evidential C-Means clustering
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111108
Kuang Zhou, Yuchen Zhu, Mei Guo, Ming Jiang
Traditional multi-view clustering algorithms, designed to produce hard or fuzzy partitions, often neglect the inherent ambiguity and uncertainty in the cluster assignment of objects. This oversight may lead to performance degradation. To address these issues, this paper introduces a novel multi-view clustering method, termed MvWECM, capable of generating credal partitions within the framework of belief functions. The objective function of MvWECM is introduced by taking into account the uncertainty in the cluster structure of the multi-view dataset. We take inter-view conflict into account to effectively leverage coherent information across different views. Moreover, the effectiveness is heightened through the incorporation of adaptive view weights, which are customized to modulate their smoothness in accordance with their entropy. The optimization method to obtain the optimal credal membership and class prototypes is derived. The view weights can also be provided as a by-product. Experimental results on several real-world datasets demonstrate the effectiveness and superiority of MvWECM in comparison with some state-of-the-art methods.
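One plausible way to realize entropy-adaptive view weighting is sketched below: views whose membership matrices are less ambiguous receive larger weights. The exponential-of-negative-entropy rule and the temperature are illustrative assumptions; MvWECM's exact weighting scheme is not reproduced here.

```python
import numpy as np

def entropy_adaptive_view_weights(memberships, temperature=1.0):
    """memberships: list of (n_samples, n_clusters) row-stochastic matrices, one per view.
    Returns normalized view weights that decrease with the mean per-object entropy."""
    entropies = []
    for U in memberships:
        U = np.clip(U, 1e-12, 1.0)
        entropies.append(-(U * np.log(U)).sum(axis=1).mean())
    e = np.array(entropies) / temperature
    w = np.exp(-e)
    return w / w.sum()

rng = np.random.default_rng(0)
# A sharp (low-entropy) view and a diffuse (high-entropy) view of the same 100 objects.
views = [rng.dirichlet(alpha, size=100) for alpha in ([5, 1, 1], [1, 1, 1])]
print(entropy_adaptive_view_weights(views))
```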
Citations: 0
A robust fingerprint identification approach using a fuzzy system and novel rotation method
IF 7.5 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-01 | DOI: 10.1016/j.patcog.2024.111134
Ahmad A. Momani, László T. Kóczy
Forensic science has developed significantly in the last few decades. Its key role is to provide crime investigators with processed data obtained from the crime scene to achieve more accurate results presented in court. Biometrics has proved its robustness against various critical crimes encountered by forensics experts. Fingerprints are the most important biometric used to date due to their uniqueness and low production cost. The automated fingerprint identification system (AFIS) came into existence in the early 1960s through the cooperation of the USA, UK, France, and Japan. It has developed gradually ever since, driven by challenges found at crime scenes such as fingerprint distortions and partial cuts, which can severely affect the final calculations made by experts. The vagueness of the results was the main motivation to build a robust fingerprint identification system that introduces new and enhanced methods at each stage to help experts make more accurate decisions. The proposed fingerprint identification system uses Fourier-domain analysis for image enhancement; it then applies the rotation and core-point detection methods and crops the image around the core point. After that, it calculates similarity based on the distance between fingerprint histograms extracted using the Local Binary Pattern (LBP). The system's last step translates the results into an interpretable form, using fuzziness to provide more possibilities for the answer. The proposed identification system showed high efficiency on the FVC 2002 and FVC 2000 databases. For instance, applying our system to FVC 2002 yielded a set of three ordered matching candidates such that 97.5% of the results ranked the correct candidate first and the remaining 2.5% ranked it second.
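A brief sketch of the LBP-histogram comparison step mentioned in the abstract is given below (an illustration only, not the authors' full pipeline); the LBP parameters, bin count, and Euclidean distance are assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, P=8, R=1.0):
    """Uniform LBP codes summarized as a normalized histogram (P + 2 bins)."""
    codes = local_binary_pattern(gray_image, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def fingerprint_distance(img_a, img_b):
    """Smaller distance means more similar prints (Euclidean distance between LBP histograms)."""
    return np.linalg.norm(lbp_histogram(img_a) - lbp_histogram(img_b))

rng = np.random.default_rng(0)
a = rng.integers(0, 256, (128, 128), dtype=np.uint8)   # stand-in for an enhanced, cropped print
b = rng.integers(0, 256, (128, 128), dtype=np.uint8)
print(fingerprint_distance(a, b))
```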
Citations: 0