Peiqiang Liu, Qifeng Liang, Zhiyong An, Jingyi Fu, Yanyan Mao
Most Siamese-based trackers use classification and regression to determine the target bounding box, which can be formulated as a linear matching process between the template and the search region. However, this only takes into account feature similarity while ignoring semantic object information, so in some cases the regression box with the highest classification score is not accurate. To address this lack of semantic information, an object tracking approach based on an ensemble semantic-aware network and redetection (ESART) is proposed. Furthermore, a DarkNet53 network with transfer learning is used as our semantic-aware model, adapting the detection task to extract semantic information. In addition, a semantic tag redetection method is proposed to re-evaluate the bounding box and overcome inaccurate scaling. Extensive experiments on OTB2015, UAV123, UAV20L, and GOT-10k show that our tracker is superior to other state-of-the-art trackers. Notably, our semantic-aware ensemble method can be embedded into any tracker that performs classification and regression.
{"title":"Robust object tracking via ensembling semantic-aware network and redetection","authors":"Peiqiang Liu, Qifeng Liang, Zhiyong An, Jingyi Fu, Yanyan Mao","doi":"10.1049/cvi2.12219","DOIUrl":"10.1049/cvi2.12219","url":null,"abstract":"<p>Most Siamese-based trackers use classification and regression to determine the target bounding box, which can be formulated as a linear matching process of the template and search region. However, this only takes into account the similarity of features while ignoring the semantic object information, resulting in some cases in which the regression box with the highest classification score is not accurate. To address the lack of semantic information, an object tracking approach based on an ensemble semantic-aware network and redetection (ESART) is proposed. Furthermore, a DarkNet53 network with transfer learning is used as our semantic-aware model to adapt the detection task for extracting semantic information. In addition, a semantic tag redetection method to re-evaluate the bounding box and overcome inaccurate scaling issues is proposed. Extensive experiments based on OTB2015, UAV123, UAV20L, and GOT-10k show that our tracker is superior to other state-of-the-art trackers. It is noteworthy that our semantic-aware ensemble method can be embedded into any tracker for classification and regression task.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 1","pages":"46-59"},"PeriodicalIF":1.7,"publicationDate":"2023-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12219","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42081075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re-identification (Re-ID). Existing approaches typically rely on external tasks, for example semantic segmentation or pose estimation, to locate identifiable parts of given images. However, they heuristically utilise the predictions from off-the-shelf models, which may be sub-optimal in terms of both local partition and computational efficiency. They also ignore the mutual information with other inputs, which weakens the representation capabilities of local features. In this study, the authors put forward a novel Attribute-guided Transformer (AiT), which explicitly exploits pedestrian attributes as semantic priors for discriminative representation learning. Specifically, the authors first introduce an attribute learning process, which generates a set of attention maps highlighting the informative parts of pedestrian images. Then, the authors design a Feature Diffusion Module (FDM) to iteratively inject attribute information into global feature maps, aiming at suppressing unnecessary noise and inferring attribute-aware representations. Finally, the authors propose a Feature Aggregation Module (FAM) that exploits mutual information to aggregate attribute characteristics from different images, enhancing the representation capabilities of the feature embedding. Extensive experiments demonstrate the superiority of AiT in learning robust and discriminative representations. As a result, the authors achieve performance competitive with state-of-the-art methods on several challenging benchmarks without any bells and whistles.
{"title":"Attribute-guided transformer for robust person re-identification","authors":"Zhe Wang, Jun Wang, Junliang Xing","doi":"10.1049/cvi2.12215","DOIUrl":"10.1049/cvi2.12215","url":null,"abstract":"<p>Recent studies reveal the crucial role of local features in learning robust and discriminative representations for person re-identification (Re-ID). Existing approaches typically rely on external tasks, for example, semantic segmentation, or pose estimation, to locate identifiable parts of given images. However, they heuristically utilise the predictions from off-the-shelf models, which may be sub-optimal in terms of both local partition and computational efficiency. They also ignore the mutual information with other inputs, which weakens the representation capabilities of local features. In this study, the authors put forward a novel Attribute-guided Transformer (AiT), which explicitly exploits pedestrian attributes as semantic priors for discriminative representation learning. Specifically, the authors first introduce an attribute learning process, which generates a set of attention maps highlighting the informative parts of pedestrian images. Then, the authors design a Feature Diffusion Module (FDM) to iteratively inject attribute information into global feature maps, aiming at suppressing unnecessary noise and inferring attribute-aware representations. Last, the authors propose a Feature Aggregation Module (FAM) to exploit mutual information for aggregating attribute characteristics from different images, enhancing the representation capabilities of feature embedding. Extensive experiments demonstrate the superiority of our AiT in learning robust and discriminative representations. As a result, the authors achieve competitive performance with state-of-the-art methods on several challenging benchmarks without any bells and whistles.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 8","pages":"977-992"},"PeriodicalIF":1.7,"publicationDate":"2023-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12215","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49366041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of deep neural networks has revolutionised object tracking, and Siamese trackers have emerged as a prominent technique for this purpose. Existing Siamese trackers use a fixed template or a template updating technique, but these are prone to overfitting, lack the capacity to exploit global temporal sequences, and cannot utilise multi-layer features. As a result, it is challenging to deal with dramatic appearance changes in complicated scenarios. Siamese trackers also struggle to learn background information, which impairs their discriminative ability. Hence, two transformer-based modules, the Spatio-Temporal Fusion (ST) module and the Discriminative Enhancement (DE) module, are proposed to improve the performance of Siamese trackers. The ST module leverages cross-attention to accumulate global temporal cues and generates an attention matrix with ST similarity to enhance the template's adaptability to changes in target appearance. The DE module associates semantically similar points from the template and the search area, thereby generating a learnable discriminative mask to enhance the discriminative ability of Siamese trackers. In addition, a Multi-Layer ST module (ST + ML) is constructed, which can be integrated into Siamese trackers based on multi-layer cross-correlation for further improvement. The authors evaluate the proposed modules on four public datasets and show competitive performance compared to existing Siamese trackers.
{"title":"DASTSiam: Spatio-temporal fusion and discriminative enhancement for Siamese visual tracking","authors":"Yucheng Huang, Eksan Firkat, Jinlai Zhang, Lijuan Zhu, Bin Zhu, Jihong Zhu, Askar Hamdulla","doi":"10.1049/cvi2.12213","DOIUrl":"10.1049/cvi2.12213","url":null,"abstract":"<p>The use of deep neural networks has revolutionised object tracking tasks, and Siamese trackers have emerged as a prominent technique for this purpose. Existing Siamese trackers use a fixed template or template updating technique, but it is prone to overfitting, lacks the capacity to exploit global temporal sequences, and cannot utilise multi-layer features. As a result, it is challenging to deal with dramatic appearance changes in complicated scenarios. Siamese trackers also struggle to learn background information, which impairs their discriminative ability. Hence, two transformer-based modules, the Spatio-Temporal Fusion (ST) module and the Discriminative Enhancement (DE) module, are proposed to improve the performance of Siamese trackers. The ST module leverages cross-attention to accumulate global temporal cues and generates an attention matrix with ST similarity to enhance the template's adaptability to changes in target appearance. The DE module associates semantically similar points from the template and search area, thereby generating a learnable discriminative mask to enhance the discriminative ability of the Siamese trackers. In addition, a Multi-Layer ST module (ST + ML) was constructed, which can be integrated into Siamese trackers based on multi-layer cross-correlation for further improvement. The authors evaluate the proposed modules on four public datasets and show comparative performance compared to existing Siamese trackers.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 8","pages":"1017-1033"},"PeriodicalIF":1.7,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12213","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48829474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The following article for this Special Issue was published in a different issue","authors":"","doi":"10.1049/cvi2.12211","DOIUrl":"https://doi.org/10.1049/cvi2.12211","url":null,"abstract":"","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 1","pages":"614"},"PeriodicalIF":1.7,"publicationDate":"2023-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"57700580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The following article for this Special Issue was published in a different issue","authors":"","doi":"10.1049/cvi2.12211","DOIUrl":"https://doi.org/10.1049/cvi2.12211","url":null,"abstract":"<p>Fan Liu, Feifan Li, Sai Yang. Few-shot classification using Gaussianisation prototypical classifier.</p><p>IET Computer Vision 2023 February; 17(1); 62–75. https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.12129</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 5","pages":"614"},"PeriodicalIF":1.7,"publicationDate":"2023-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12211","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50151747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing deep-learning-based monocular depth estimation methods have difficulty estimating depth near object edges where the depth between objects changes abruptly, and their accuracy declines when an image contains more noise. Furthermore, these methods consume considerable hardware resources because of their large numbers of network parameters. To solve these problems, this paper proposes a depth estimation method based on weighted fusion and point-wise convolution. The authors design a maximum-average adaptive pooling weighted fusion (MAWF) module that fuses global and local features, and a continuous point-wise convolution module that processes the fused features from the MAWF module. The two modules are applied together three times to perform weighted fusion and point-wise convolution of multi-scale features from the encoder output, which better decodes the depth information of a scene. Experimental results show that our method achieves state-of-the-art performance on the KITTI dataset, with δ1 up to 0.996 and the root mean square error metric down to 8%, and demonstrates strong generalisation and robustness.
{"title":"A monocular image depth estimation method based on weighted fusion and point-wise convolution","authors":"Chen Lei, Liang Zhengyou, Sun Yu","doi":"10.1049/cvi2.12212","DOIUrl":"10.1049/cvi2.12212","url":null,"abstract":"<p>The existing monocular depth estimation methods based on deep learning have difficulty in estimating the depth near the edges of the objects in an image when the depth distance between these objects changes abruptly and decline in accuracy when an image has more noises. Furthermore, these methods consume more hardware resources because they have huge network parameters. To solve these problems, this paper proposes a depth estimation method based on weighted fusion and point-wise convolution. The authors design a maximum-average adaptive pooling weighted fusion module (MAWF) that fuses global features and local features and a continuous point-wise convolution module for processing the fused features derived from the (MAWF) module. The two modules work closely together for three times to perform weighted fusion and point-wise convolution of features of multi-scale from the encoder output, which can better decode the depth information of a scene. Experimental results show that our method achieves state-of-the-art performance on the KITTI dataset with <b><i>δ</i></b><sub>1</sub> up to 0.996 and the root mean square error metric down to 8% and has demonstrated the strong generalisation and robustness.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 8","pages":"1005-1016"},"PeriodicalIF":1.7,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12212","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46812260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saba Sadat Faghih Imani, Kazim Fouladi-Ghaleh, Hossein Aghababa
Most successful person re-ID models conduct supervised training and need a large amount of training data, and they fail to generalise well to unseen, unlabelled testing sets. The authors aim to learn a generalisable person re-identification model that uses one labelled source dataset and one unlabelled target dataset during training and generalises well on the target testing set. To this end, after feature extraction by a ResNext-50 network, the authors optimise the model with three loss functions. (a) One loss function is designed to learn the features of the target domain by tuning the distances between target images, so the trained model is more robust to intra-domain variations in the target domain and generalises well on the target testing set. (b) A triplet loss considers both source and target domains, making the model learn the inter-domain variations between the source and target domains as well as the variations within the target domain. (c) A third loss function performs supervised learning on the labelled source domain. Extensive experiments on Market1501 and DukeMTMC-reID show that the model achieves very competitive performance compared with state-of-the-art models while requiring an acceptable amount of GPU RAM compared to other successful models.
{"title":"Generalizable and efficient cross-domain person re-identification model using deep metric learning","authors":"Saba Sadat Faghih Imani, Kazim Fouladi-Ghaleh, Hossein Aghababa","doi":"10.1049/cvi2.12214","DOIUrl":"10.1049/cvi2.12214","url":null,"abstract":"<p>Most of the successful person re-ID models conduct supervised training and need a large number of training data. These models fail to generalise well on unseen unlabelled testing sets. The authors aim to learn a generalisable person re-identification model. The model uses one labelled source dataset and one unlabelled target dataset during training and generalises well on the target testing set. To this end, after a feature extraction by the ResNext-50 network, the authors optimise the model by three loss functions. (a) One loss function is designed to learn the features of the target domain by tuning the distances between target images. Therefore, the trained model will be more robust to overcome the intra-domain variations in the target domain and generalises well on the target testing set. (b) One triplet loss is used which considers both source and target domains and makes the model learn the inter-domain variations between source and target domain as well as the variations in the target domain. (c) Also, one loss function is for supervised learning on the labelled source domain. Extensive experiments on Market1501 and DukeMTMC re-ID show that the model achieves a very competitive performance compared with state-of-the-art models and also it requires an acceptable amount of GPU RAM compared to other successful models.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 8","pages":"993-1004"},"PeriodicalIF":1.7,"publicationDate":"2023-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12214","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41952356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Erratum: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification","authors":"","doi":"10.1049/cvi2.12210","DOIUrl":"https://doi.org/10.1049/cvi2.12210","url":null,"abstract":"","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"48 1","pages":"722"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"57700469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The authors wish to bring to the readers' attention the following errors in the article by He, D., et al.: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification [1].
In the Funding Information section, the funding number for the National Natural Science Foundation of China is incorrectly given as 2022KYCX032Z. It should be 62171321.
{"title":"Erratum: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification","authors":"","doi":"10.1049/cvi2.12210","DOIUrl":"https://doi.org/10.1049/cvi2.12210","url":null,"abstract":"<p>The authors wish to bring to the readers' attention the following errors in the article by He, D., et al.: Integration graph attention network and multi-centre constrained loss for cross-modality person re-identification [<span>1</span>].</p><p>In Funding Information section the funding number for National Natural Science Foundation of China is incorrectly mentioned as 2022KYCX032Z. It should be 62171321.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 6","pages":"722"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12210","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50127724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lin Cao, Jianqiang Yin, Yanan Guo, Kangning Du, Fan Zhang
Sketch face recognition has a wide range of applications in criminal investigation, but it remains a challenging task due to small-scale samples and the semantic deficiencies caused by cross-modality differences. The authors propose a light semantic Transformer network to extract and model the semantic information of cross-modality images. First, the authors employ a meta-learning training strategy to obtain task-related training samples, addressing the small-sample problem. Then, to resolve the contradiction between the high complexity of the Transformer and the small-sample problem of sketch face recognition, the authors build the light semantic Transformer network by proposing a hierarchical group linear transformation and introducing parameter sharing, which can extract highly discriminative semantic features on small-scale datasets. Finally, the authors propose a domain-adaptive focal loss to reduce the cross-modality differences between sketches and photos and improve the training of the light semantic Transformer network. Extensive experiments show that the features extracted by the proposed method are highly discriminative. The authors' method improves the recognition rate by 7.6% on the UoM-SGFSv2 dataset, and the recognition rate reaches 92.59% on the CUFSF dataset.
{"title":"Sketch face recognition based on light semantic Transformer network","authors":"Lin Cao, Jianqiang Yin, Yanan Guo, Kangning Du, Fan Zhang","doi":"10.1049/cvi2.12209","DOIUrl":"10.1049/cvi2.12209","url":null,"abstract":"<p>Sketch face recognition has a wide range of applications in criminal investigation, but it remains a challenging task due to the small-scale sample and the semantic deficiencies caused by cross-modality differences. The authors propose a light semantic Transformer network to extract and model the semantic information of cross-modality images. First, the authors employ a meta-learning training strategy to obtain task-related training samples to solve the small sample problem. Then to solve the contradiction between the high complexity of the Transformer and the small sample problem of sketch face recognition, the authors build the light semantic transformer network by proposing a hierarchical group linear transformation and introducing parameter sharing, which can extract highly discriminative semantic features on small–scale datasets. Finally, the authors propose a domain-adaptive focal loss to reduce the cross-modality differences between sketches and photos and improve the training effect of the light semantic Transformer network. Extensive experiments have shown that the features extracted by the proposed method have significant discriminative effects. The authors’ method improves the recognition rate by 7.6% on the UoM-SGFSv2 dataset, and the recognition rate reaches 92.59% on the CUFSF dataset.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"17 8","pages":"962-976"},"PeriodicalIF":1.7,"publicationDate":"2023-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12209","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135641694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}