End-to-end Network Embedding Unsupervised Key Frame Extraction for Video-based Person Re-identification
Ye Li, Xiaoyu Luo, Shaoqi Hou, Chao Li, Guangqiang Yin
2021 11th International Conference on Information Science and Technology (ICIST), 2021-05-21
DOI: 10.1109/ICIST52614.2021.9440586
Citations: 2
Abstract
At present, in video-based person re-identification, the input sequences contain subtle differences and large redundancy because the extraction of frame sequences lacks effective intervention. Although some studies have noted that key frames should be extracted first, they do not jointly optimize key frame extraction and person re-identification, so it is difficult to evaluate whether the extracted key frames are actually effective for re-identification. In this paper, we introduce an End-to-end Network Embedding Unsupervised Key Frame Extraction (EKEN) method to address these problems. First, we design a key frame extraction module and train it with pseudo labels generated by hierarchical clustering. Second, we embed this module into the person re-identification task, so that the results of key frame extraction and person re-identification are fed back to each other immediately; this instant feedback drives the synchronized optimization of the two modules. On the MARS dataset, our method improves mAP by 0.7%, 2.9%, 2.1% and 2.3% over methods based on Random, Evenly, Cluster and Frame-difference sampling, respectively. In particular, our method is better suited to real-world applications than existing methods.
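To make the pseudo-label step concrete, the sketch below illustrates one plausible way to cluster per-frame features hierarchically and pick a representative key frame per cluster. This is not the authors' released code; the function name, the choice of `AgglomerativeClustering`, the number of clusters, and the centroid-distance selection rule are all illustrative assumptions.

```python
# Hypothetical sketch (not the authors' implementation): generating pseudo labels
# for key-frame selection via hierarchical clustering, as described in the abstract.
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def select_key_frames(frame_features: np.ndarray, n_key_frames: int = 4):
    """Cluster per-frame features and pick one representative frame per cluster.

    frame_features: (num_frames, feat_dim) array of frame embeddings from any backbone.
    Returns (key_frame_indices, pseudo_labels); pseudo_labels assigns every frame
    to a cluster and could supervise a key-frame extraction module.
    """
    clustering = AgglomerativeClustering(n_clusters=n_key_frames)
    pseudo_labels = clustering.fit_predict(frame_features)

    key_frame_indices = []
    for cluster_id in range(n_key_frames):
        members = np.where(pseudo_labels == cluster_id)[0]
        centroid = frame_features[members].mean(axis=0)
        # Representative frame: the member closest to its cluster centroid.
        distances = np.linalg.norm(frame_features[members] - centroid, axis=1)
        key_frame_indices.append(int(members[np.argmin(distances)]))

    return sorted(key_frame_indices), pseudo_labels


# Example: a tracklet of 32 frames with 256-dim features.
features = np.random.rand(32, 256).astype(np.float32)
key_idx, labels = select_key_frames(features, n_key_frames=4)
print("Key frames:", key_idx)
```

In the paper's end-to-end setting, such pseudo labels would then train the key frame extraction module jointly with the re-identification network rather than as a separate preprocessing step.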