
Latest publications in Pattern Recognition

A masking, linkage and guidance framework for online class incremental learning
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111185
Guoqiang Liang, Zhaojie Chen, Shibin Su, Shizhou Zhang, Yanning Zhang
Due to its powerful ability to acquire new knowledge from a dynamic data stream while preserving previously learned concepts, continual learning has recently garnered substantial interest. Since training data can only be used once, online class incremental learning (OCIL) is more practical and more difficult. Although replay-based OCIL methods have made great progress, a severe class imbalance problem remains. Specifically, because of the small memory size, the number of samples for new classes is much larger than that for old classes, which ultimately leads to task recency bias and abrupt feature drift. To alleviate this problem, we propose a masking, linkage, and guidance framework (MLG) for OCIL, which consists of three effective modules: a batch-level logit mask (BLM, masking), batch-level feature cross fusion (BFCF, linkage), and accumulative mean feature distillation (AMFD, guidance). The former two focus on the class imbalance problem, while the last aims to alleviate abrupt feature drift. In BLM, we activate only the logits of classes occurring in a batch, which makes the model learn knowledge within each batch. The BFCF module employs a transformer encoder layer to fuse the sample features within a batch, which rebalances the gradients of the classifier's weights and implicitly learns the sample relationships. Instead of the strict regularization of traditional feature distillation, the proposed AMFD purposefully guides previously learned features to shift, which reduces abrupt feature drift and produces a clearer boundary in feature space. Extensive experiments on four popular OCIL datasets demonstrate the effectiveness of the proposed MLG framework.
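As an illustration of the masking idea, here is a minimal PyTorch sketch of a batch-level logit mask: cross-entropy is computed only over the classes present in the current batch. The function name, the large-negative masking constant, and the toy data are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy restricted to the classes that occur in the current batch."""
    batch_classes = labels.unique()              # classes present in this batch
    mask = torch.full_like(logits, -1e9)         # large negative ~ zero probability
    mask[:, batch_classes] = 0.0                 # keep logits of present classes
    return F.cross_entropy(logits + mask, labels)

# Toy usage: 10 classes overall, only classes {2, 5, 9} occur in the batch.
logits = torch.randn(8, 10, requires_grad=True)
labels = torch.tensor([2, 5, 5, 9, 2, 9, 5, 2])
masked_cross_entropy(logits, labels).backward()
```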
Citations: 0
CDHN: Cross-domain hallucination network for 3D keypoints estimation
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111188
Mohammad Zohaib, Milind Gajanan Padalkar, Pietro Morerio, Matteo Taiana, Alessio Del Bue
This paper presents a novel method to estimate sparse 3D keypoints from single-view RGB images. Our network is trained in two steps using a knowledge distillation framework. In the first step, the teacher is trained to extract 3D features from point cloud data, which are used in combination with 2D features to estimate the 3D keypoints. In the second step, the teacher teaches the student module to hallucinate, from RGB images, 3D features similar to those extracted from the point clouds. This procedure lets the network extract 2D and 3D features directly from images during inference, without requiring point clouds as input. Moreover, the network also predicts a confidence score for every keypoint, which is used to select the valid ones from a set of N predicted keypoints. This allows a different number of keypoints to be predicted depending on the object's geometry. We use the estimated keypoints to compute the relative pose between two views of an object. The results are compared with those of KP-Net and StarMap, the state of the art for estimating 3D keypoints from a single-view RGB image. The average angular distance error of our approach (5.94°) is 8.46° and 55.26° lower than that of KP-Net (14.40°) and StarMap (61.20°), respectively.
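A hedged sketch of the second (hallucination) step, assuming a simple MLP student and an L2 distillation objective; the module shapes and loss choice are illustrative, since the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class Hallucinator(nn.Module):
    """Student branch: regresses pseudo point-cloud features from RGB features."""
    def __init__(self, in_dim=256, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, rgb_feat):
        return self.net(rgb_feat)

student = Hallucinator()
distill = nn.MSELoss()

rgb_feat = torch.randn(4, 256)          # 2D features from the image branch
with torch.no_grad():
    teacher_3d = torch.randn(4, 128)    # stand-in for frozen teacher features
loss = distill(student(rgb_feat), teacher_3d)   # hallucination distillation
loss.backward()
```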
Citations: 0
Lightweight remote sensing super-resolution with multi-scale graph attention network
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111178
Yu Wang, Zhenfeng Shao, Tao Lu, Xiao Huang, Jiaming Wang, Zhizheng Zhang, Xiaolong Zuo
Remote Sensing Super-Resolution (RS-SR) constitutes a pivotal component in the domain of remote sensing image analysis, aimed at enhancing the spatial resolution of low-resolution imagery. Recent advancements have seen deep learning techniques achieving substantial progress in the RS-SR field. Notably, Graph Neural Networks (GNNs) have emerged as a potent mechanism for processing remote sensing images, adept at elucidating the intricate inter-pixel relationships within images. Nevertheless, a prevalent limitation among existing GNN-based methodologies is their disregard for the high computational demands, which circumscribes their applicability in environments with limited computational resources. This paper introduces a streamlined RS-SR framework, leveraging a Multi-Scale Graph Attention Network (MSGAN), designed to effectively balance computational efficiency with high performance. The core of MSGAN is a novel multi-scale graph attention module, integrating graph attention block and multi-scale lattice block structures, engineered to comprehensively assimilate both localized and extensive spatial information in remote sensing images. This enhances the framework’s overall efficacy and resilience in RS-SR tasks. Comparative experimental analyses demonstrate that MSGAN delivers competitive results against state-of-the-art methods while reducing parameter count and computational overhead, presenting a promising avenue for deployment in scenarios with limited computational resources.
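To make the basic building block concrete, below is a compact PyTorch sketch of a single graph-attention step over node (pixel/patch) features; it is a generic attention-on-a-graph layer under assumed dimensions, not MSGAN's multi-scale lattice module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGraphAttention(nn.Module):
    """One attention step over graph nodes; neighbors are given by `adj`."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (N, dim) node features; adj: (N, N) adjacency with 1 = connected
        scores = self.q(x) @ self.k(x).t() / x.shape[1] ** 0.5
        scores = scores.masked_fill(adj == 0, float('-inf'))  # neighbors only
        return F.softmax(scores, dim=-1) @ self.v(x)

x, adj = torch.randn(6, 32), torch.ones(6, 6)   # fully connected toy graph
out = SimpleGraphAttention(32)(x, adj)
```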
Citations: 0
Adaptive learning rate algorithms based on the improved Barzilai–Borwein method
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111179
Zhi-Jun Wang, Hong Li, Zhou-Xiang Xu, Shuai-Ye Zhao, Peng-Jun Wang, He-Bei Gao

Objective:

The Barzilai–Borwein (BB) method is essential in solving unconstrained optimization problems. The momentum method accelerates optimization algorithms with an exponentially weighted moving average. To design reliable deep learning optimization algorithms, this paper proposes applying four variants of the BB method to the optimization algorithms of deep learning.

Findings:

The momentum method generates the BB step size under different step range limits. We also apply the momentum method and its variants to stochastic gradient descent with the BB step size.

Novelty:

The algorithms' robustness is demonstrated through experiments on initial learning rates and random seeds. Their sensitivity is tested by choosing different momentum factors until a suitable momentum factor is found. Moreover, we compare our algorithms with popular algorithms in various neural networks. The results show that the new algorithms improve the efficiency of the BB step size in deep learning and provide a variety of optimization algorithm choices.
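For reference, a minimal NumPy sketch of gradient descent with the classical BB1 step size and a clipped step range (the "step range limits" mentioned above); the four proposed variants and the momentum coupling are not reproduced here.

```python
import numpy as np

def bb_gradient_descent(grad, x0, lr0=1e-2, lr_min=1e-4, lr_max=1.0, iters=100):
    x_prev, g_prev = x0, grad(x0)
    x = x_prev - lr0 * g_prev                       # first step uses a fixed rate
    for _ in range(iters):
        g = grad(x)
        s, y = x - x_prev, g - g_prev               # iterate and gradient differences
        denom = float(s @ y)
        lr = (s @ s) / denom if denom > 0 else lr0  # BB1 step: s^T s / s^T y
        lr = float(np.clip(lr, lr_min, lr_max))     # enforce the step range limit
        x_prev, g_prev = x, g
        x = x - lr * g
    return x

# Example: minimize the quadratic f(x) = 0.5 * x^T A x, whose gradient is A x.
A = np.diag([1.0, 10.0])
x_min = bb_gradient_descent(lambda x: A @ x, np.array([5.0, 3.0]))
```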
Citations: 0
Uncertainty estimation in color constancy
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111175
Marco Buzzelli, Simone Bianco
Computational color constancy is an under-determined problem. As such, a key objective is to assign a level of uncertainty to the output illuminant estimations, which can significantly impact the reliability of the corrected images for downstream computer vision tasks. In this paper we present a formalization of uncertainty estimation in color constancy, and we define three forms of uncertainty that require at most one inference run to be estimated. The defined uncertainty estimators are applied to five different categories of color constancy algorithms. The experimental results on two standard datasets show a strong correlation between the estimated uncertainty and the illuminant estimation error. Furthermore, we show how color constancy algorithms can be cascaded leveraging the estimated uncertainty to provide more accurate illuminant estimates.
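The cascading idea can be sketched as follows: a cheap estimator answers when its estimated uncertainty is low, and a stronger estimator is invoked otherwise. The estimator interface, the threshold value, and the toy gray-world confidence proxy are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def cascade_illuminant(image, fast_est, strong_est, tau=0.15):
    illum, uncertainty = fast_est(image)      # one inference run yields both
    if uncertainty > tau:                     # low confidence -> escalate
        illum, uncertainty = strong_est(image)
    return illum / np.linalg.norm(illum), uncertainty

def gray_world(img):
    """Toy stage: gray-world estimate with a crude spread-based confidence."""
    pixels = img.reshape(-1, 3)
    illum = pixels.mean(axis=0)
    uncertainty = float(pixels.std(axis=0).mean() / (illum.mean() + 1e-8))
    return illum, uncertainty

img = np.random.rand(64, 64, 3)
est, unc = cascade_illuminant(img, gray_world, gray_world)  # same toy stage twice
```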
Citations: 0
FocTrack: Focus attention for visual tracking
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111128
Jian Tao, Sixian Chan, Zhenchao Shi, Cong Bai, Shengyong Chen
Transformer trackers have achieved widespread success thanks to their attention mechanism. The vanilla attention mechanism focuses on modeling the long-range dependencies between tokens to gain a global perspective. In human tracking behavior, however, the line of sight first skims apparent regions and then focuses on the differences between similar regions. To explore this issue, we build a powerful online tracker with focus attention, named FocTrack. First, we design a focus attention module, which applies an iterative binary clustering function (IBCF) before self-attention to simulate human behavior. Specifically, for a given cluster, other clusters are treated as apparent tokens that are skimmed during the clustering process, while the subsequent self-attention performs focused discriminative learning on the target cluster. Moreover, we propose a local template update strategy (LTUS) to exploit effective temporal information for visual object tracking. At test time, LTUS replaces only outdated local templates, ensuring overall reliability at a low computational cost. Finally, extensive experiments show that our proposed FocTrack achieves state-of-the-art performance on several benchmarks. In particular, FocTrack achieves 71.5% AUC on LaSOT, 84.7% AUC on TrackingNet, and a running speed of around 36 FPS, outperforming popular approaches.
Citations: 0
Dual Contrastive Label Enhancement
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-15 | DOI: 10.1016/j.patcog.2024.111183
Ren Guan, Yifei Wang, Xinyuan Liu, Bin Chen, Jihua Zhu
Label Enhancement (LE) strives to convert the logical labels of instances into label distributions, providing data preparation for label distribution learning (LDL). Existing LE methods ordinarily neglect to treat original features and logical labels as two complementary descriptive views of instances from which implicit related information can be extracted across views, resulting in insufficient utilization of the instances' feature and logical-label information. To address this issue, we propose a novel method named Dual Contrastive Label Enhancement (DCLE). This method regards original features and logical labels as two view-specific descriptions and encodes them into a unified projection space. We employ a dual contrastive learning strategy at both the instance level and the class level to excavate cross-view consensus information and distinguish instance representations by exploring inherent correlations among features, thereby generating high-level representations of the instances. Subsequently, to recover label distributions from the obtained high-level representations, we design a distance-minimized and margin-penalized training strategy that preserves the consistency of label attributes. Extensive experiments conducted on 13 benchmark LDL datasets validate the efficacy and competitiveness of DCLE.
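A minimal sketch of what an instance-level cross-view contrastive term could look like, assuming an InfoNCE-style loss where the i-th feature embedding and the i-th label embedding form the positive pair; the loss form and temperature are assumptions, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def instance_contrastive(z_feat, z_label, tau=0.1):
    """InfoNCE across views: the i-th feature and i-th label embedding match."""
    z_feat = F.normalize(z_feat, dim=1)
    z_label = F.normalize(z_label, dim=1)
    logits = z_feat @ z_label.t() / tau        # cross-view similarity matrix
    targets = torch.arange(z_feat.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = instance_contrastive(torch.randn(16, 32), torch.randn(16, 32))
```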
Citations: 0
Learning data association for multi-object tracking using only coordinates
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-14 | DOI: 10.1016/j.patcog.2024.111169
Mehdi Miah, Guillaume-Alexandre Bilodeau, Nicolas Saunier
We propose a novel Transformer-based module to address the data association problem for multi-object tracking. From detections obtained by a pretrained detector, this module uses only the coordinates of bounding boxes to estimate an affinity score between pairs of tracks extracted from two distinct temporal windows. This module, named TWiX, is trained on sets of tracks with the objective of discriminating pairs of tracks coming from the same object from those that are not. Our module uses neither the intersection-over-union measure nor any motion priors or camera motion compensation technique. By inserting TWiX within an online cascade matching pipeline, our tracker C-TWiX achieves state-of-the-art performance on the DanceTrack and KITTIMOT datasets, and obtains competitive results on the MOT17 dataset. The code will be made available upon publication at https://mehdimiah.com/twix.
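A rough sketch of a coordinate-only affinity head in this spirit: box coordinates from two temporal windows are embedded, jointly encoded by a Transformer, and scored pairwise. The dimensions and the dot-product scoring are assumptions; the authors' code is at https://mehdimiah.com/twix.

```python
import torch
import torch.nn as nn

class CoordAffinity(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.embed = nn.Linear(4, d_model)   # (x, y, w, h) per detection
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, boxes_a, boxes_b):
        # boxes_*: (1, N, 4) and (1, M, 4) normalized coordinates per window
        tokens = self.encoder(self.embed(torch.cat([boxes_a, boxes_b], dim=1)))
        n = boxes_a.shape[1]
        return tokens[:, :n] @ tokens[:, n:].transpose(1, 2)  # (1, N, M) affinities

model = CoordAffinity()
affinity = model(torch.rand(1, 3, 4), torch.rand(1, 5, 4))    # 3 vs. 5 detections
```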
Citations: 0
Pseudo-labeling with keyword refining for few-supervised video captioning
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-14 | DOI: 10.1016/j.patcog.2024.111176
Ping Li, Tao Wang, Xinkui Zhao, Xianghua Xu, Mingli Song
Video captioning generates a sentence that describes the video content. Existing methods always require a number of captions (e.g., 10 or 20) per video to train the model, which is quite costly. In this work, we explore the possibility of using only one or very few ground-truth sentences, and introduce a new task named few-supervised video captioning. Specifically, we propose a few-supervised video captioning framework that consists of a lexically constrained pseudo-labeling module and a keyword-refined captioning module. Unlike random sampling in natural language processing, which may cause invalid modifications (i.e., edited words), the former module guides the model to edit words using actions (e.g., copy, replace, insert, and delete) predicted by a pretrained token-level classifier, and then fine-tunes candidate sentences with a pretrained language model. Meanwhile, it employs repetition-penalized sampling to encourage the model to yield concise pseudo-labeled sentences with less repetition, and selects the most relevant sentences using a pretrained video-text model. Moreover, to keep semantic consistency between pseudo-labeled sentences and video content, we develop a transformer-based keyword refiner with a video-keyword gated fusion strategy to place more emphasis on relevant words. Extensive experiments on several benchmarks demonstrate the advantages of the proposed approach in both few-supervised and fully-supervised scenarios.
Citations: 0
Progressive label enhancement
IF 7.5 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-11-14 | DOI: 10.1016/j.patcog.2024.111172
Zhiqiang Kou, Jing Wang, Yuheng Jia, Xin Geng
Label Distribution Learning (LDL) leverages label distributions (LDs) to represent instances, which helps resolve label ambiguity. However, obtaining LDs can be extremely challenging in many real-world scenarios. Label Enhancement (LE) has emerged as a solution that enhances logical labels into LDs, since logical labels are readily available. In this paper, we explore the application of dimension reduction techniques to enhance LE. We present a learning framework known as Progressive Label Enhancement (PLE). PLE progressively conducts dependency-maximization-oriented dimension reduction and LE. First, PLE generates LDs by leveraging the manifold structure within the feature space induced by dependency-maximization-driven dimension reduction. Second, PLE optimizes the projection matrix for dependency maximization based on the obtained LDs. Finally, extensive experiments conducted on 15 real-world datasets consistently demonstrate that PLE outperforms the other six comparative approaches.
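One standard way to instantiate "dependency maximization" is the empirical HSIC statistic between projected features and recovered label distributions; the abstract does not name the measure, so HSIC (with linear kernels, for brevity) is an assumption here.

```python
import numpy as np

def hsic(X, Y):
    """Empirical HSIC with linear kernels: tr(K H L H) / (n - 1)^2."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K, L = X @ X.T, Y @ Y.T               # linear kernel matrices
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# A projection P could then be chosen to maximize hsic(X @ P, Y).
X = np.random.randn(50, 8)                # features
Y = np.random.rand(50, 4)                 # recovered label distributions
print(hsic(X, Y))
```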
Citations: 0