DCSG: data complement pseudo-label refinement and self-guided pre-training for unsupervised person re-identification

Qing Han, Jiongjin Chen, Weidong Min, Jiahao Li, Lixin Zhan, Longfei Li
Journal: The Visual Computer
Published: 2024-07-01
DOI: 10.1007/s00371-024-03542-9 (https://doi.org/10.1007/s00371-024-03542-9)

Abstract

Existing unsupervised person re-identification (Re-ID) methods rely on clustering to generate pseudo-labels, which are generally noisy, and they initialize the model with ImageNet pre-trained weights, which introduces a large domain gap that severely degrades performance. To address these issues, we propose a data complement pseudo-label refinement and self-guided pre-training framework, referred to as DCSG. First, our method uses image information from multiple augmented views to complement the source image data, yielding aggregated information. From this aggregated information we design a correlation score that evaluates the reliability of the source features and cluster centroids, and we use it to refine the pseudo-label of each sample, making the labels more robust. Second, we propose a pre-training strategy that exploits latent information available during training: it mines highly similar classes in the training set to guide model training and smooth the pre-training stage. As a result, the model learns early in training to distinguish pedestrian-related features, which reduces the impact of the domain gap introduced by ImageNet pre-trained weights. Our method demonstrates superior performance on multiple person Re-ID datasets, which validates the proposed approach. Notably, it achieves an mAP of 84.3% on the Market1501 dataset, a 2.8% improvement over the state-of-the-art method. The code is available at https://github.com/duolaJohn/DCSG.git.
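
The abstract does not give the formula for the correlation score, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the aggregated information is the mean of the source feature and its augmented-view features, that the correlation score is the clipped cosine similarity between that aggregate and the centroid of the assigned cluster, and that refinement blends the one-hot clustering label with a softmax over centroid similarities. All function names, the `temperature` parameter, and the blending rule are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def aggregate_views(source_feat, view_feats):
    """Assumed aggregation: mean of the source feature and its augmented views."""
    feats = [source_feat] + list(view_feats)
    dim = len(source_feat)
    mean = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return l2_normalize(mean)


def softmax(scores, temperature):
    """Numerically stable softmax with a temperature."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_pseudo_label(source_feat, view_feats, centroids, hard_label,
                        temperature=0.1):
    """Return (score, refined_label) for one sample.

    score: hypothetical correlation score in [0, 1], measuring how well the
        aggregated feature agrees with the centroid of the assigned cluster.
    refined_label: a soft label; a high score keeps the clustering label,
        a low score shifts mass toward the centroid-similarity distribution.
    """
    agg = aggregate_views(source_feat, view_feats)
    score = max(0.0, cosine(agg, centroids[hard_label]))
    soft = softmax([cosine(agg, c) for c in centroids], temperature)
    one_hot = [1.0 if k == hard_label else 0.0 for k in range(len(centroids))]
    refined = [score * o + (1.0 - score) * s for o, s in zip(one_hot, soft)]
    return score, refined


if __name__ == "__main__":
    source = [1.0, 0.0]
    views = [[0.9, 0.1], [1.0, 0.1]]
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    # The same sample with a plausible label (0) and a noisy label (1).
    for label in (0, 1):
        score, refined = refine_pseudo_label(source, views, centroids, label)
        print(label, round(score, 3), [round(p, 3) for p in refined])
```

In this toy setting, a sample whose views agree with its assigned centroid keeps a label close to the clustering output, while a sample assigned to the wrong cluster receives a low score and its refined label moves toward the nearer centroid. The real method may weight, normalize, or combine these quantities differently.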
