Title: Deep Spatial-Temporal Fusion Network for Video-Based Person Re-identification
Authors: Lin Chen, Hua Yang, Ji Zhu, Qin Zhou, Shuang Wu, Zhiyong Gao
Published in: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1478-1485
Publication date: 2017-07-21
DOI: 10.1109/CVPRW.2017.191
Citations: 21
Abstract
In this paper, we propose a novel deep end-to-end network to automatically learn the spatial-temporal fusion features for video-based person re-identification. Specifically, the proposed network consists of CNN and RNN to jointly learn both the spatial and the temporal features of input image sequences. The network is optimized by utilizing the siamese and softmax losses simultaneously to pull the instances of the same person closer and push the instances of different persons apart. Our network is trained on full-body and part-body image sequences respectively to learn complementary representations from holistic and local perspectives. By combining them together, we obtain more discriminative features that are beneficial to person re-identification. Experiments conducted on the PRID-2011, iLIDS-VID and MARS datasets show that the proposed method performs favorably against existing approaches.
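The joint objective the abstract describes combines an identity-classification (softmax) loss with a pairwise siamese (contrastive) loss over sequence-level features. A minimal NumPy sketch of that combination is below; the margin, loss weight, and feature dimensions are illustrative assumptions, not values from the paper, and the CNN-RNN feature extractor itself is abstracted away as given feature vectors.

```python
import numpy as np

def softmax_loss(logits, label):
    """Identity cross-entropy (the 'softmax loss') for one sequence."""
    z = logits - logits.max()                      # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def siamese_loss(f1, f2, same, margin=1.0):
    """Contrastive loss on a pair of sequence features: pull same-person
    pairs together, push different-person pairs at least `margin` apart.
    The margin value is an illustrative choice."""
    d = np.linalg.norm(f1 - f2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2

def joint_loss(logits1, y1, logits2, y2, f1, f2, weight=1.0):
    """Simultaneous optimization of both losses, as in the abstract;
    `weight` balancing the two terms is a hypothetical hyperparameter."""
    return (softmax_loss(logits1, y1) + softmax_loss(logits2, y2)
            + weight * siamese_loss(f1, f2, y1 == y2))
```

In training, `f1` and `f2` would be the temporal features produced by the CNN-RNN branch for two input sequences, and the same structure would be applied to both the full-body and part-body branches before their representations are combined.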