Video-based person re-identification with complementary local and global features using a graph transformer.

IF 2.6 | Zone 4 (Engineering and Technology) | JCR Q1 (Mathematics) | Mathematical Biosciences and Engineering | Pub Date: 2024-07-23 | DOI: 10.3934/mbe.2024293
Hai Lu, Enbo Luo, Yong Feng, Yifan Wang
Citations: 0

Abstract

In recent years, video-based person re-identification (Re-ID) has made significant progress. The key challenge in video person Re-ID is constructing discriminative and robust person feature representations. Methods based on local regions use spatial and temporal attention to extract representative local features, but prior approaches often overlook the correlations between local regions. To exploit the relationships among different local regions, we propose a novel video person Re-ID representation learning approach based on a graph transformer, which facilitates contextual interactions between relevant region features. Specifically, we construct a local relation graph that models the intrinsic relationships between nodes representing local regions. The graph uses a transformer architecture for feature propagation, iteratively refining each region feature with information from adjacent nodes to obtain partial feature representations. To learn compact and discriminative representations, we further propose a global feature learning branch based on a vision transformer that captures the relationships between different frames in a sequence. In addition, we design a dual-branch interaction network based on multi-head fusion attention to integrate the frame-level features extracted by the local and global branches. Finally, after interaction, the global and local features are concatenated and used for testing. We evaluate the proposed method on three datasets: iLIDS-VID, MARS, and DukeMTMC-VideoReID. Experimental results demonstrate competitive performance, validating the effectiveness of the proposed approach.
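
The abstract does not give the layer equations, so the following is only a minimal sketch of the general idea it describes: region features are refined by attention restricted to neighbouring nodes of a local relation graph, and the pooled result is concatenated with a global feature. Everything named here (`graph_attention_step`, `build_descriptor`, the adjacency layout, the residual update, mean pooling, and the absence of learned query/key/value projections) is an assumption made for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def graph_attention_step(features, adjacency):
    """One round of attention-weighted propagation over graph neighbours.

    features:  list of region feature vectors, one per node.
    adjacency: adjacency[i][j] is truthy when node j may send to node i
               (hypothetical layout; self-loops must be listed explicitly).
    Simplification: raw features act as queries, keys, and values.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    refined = []
    for i, query in enumerate(features):
        neighbours = [j for j, linked in enumerate(adjacency[i]) if linked]
        if not neighbours:
            # A node with no incoming edges keeps its feature unchanged.
            refined.append(list(query))
            continue
        weights = softmax([dot(query, features[j]) / scale for j in neighbours])
        message = [
            sum(w * features[j][d] for w, j in zip(weights, neighbours))
            for d in range(dim)
        ]
        # Residual update: original feature plus the aggregated message.
        refined.append([q + m for q, m in zip(query, message)])
    return refined


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_descriptor(region_features, adjacency, global_feature, rounds=2):
    """Refine region features for a few rounds, pool, then concatenate."""
    refined = region_features
    for _ in range(rounds):
        refined = graph_attention_step(refined, adjacency)
    return mean_pool(refined) + list(global_feature)


if __name__ == "__main__":
    # Three toy regions (e.g. head, torso, legs) linked in a chain.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    chain = [
        [True, True, False],
        [True, True, True],
        [False, True, True],
    ]
    print(build_descriptor(regions, chain, global_feature=[0.2, 0.8]))
```

In a trained model the dot-product scores would come from learned projections and several attention heads, and the global vector would be produced by the vision-transformer branch rather than supplied by hand; the sketch only shows how a graph mask confines attention to adjacent regions before the two branches are joined.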

Source journal
Mathematical Biosciences and Engineering (Engineering and Technology: Mathematics, Interdisciplinary Applications)
CiteScore: 3.90
Self-citation rate: 7.70%
Articles published: 586
Review time: >12 weeks
Journal description: Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing. MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (description of new technologies and products); Announcements and Industrial Progress and News (announcements and even advertisement, including major conferences).
Latest articles in this journal
CTFusion: CNN-transformer-based self-supervised learning for infrared and visible image fusion.
Video-based person re-identification with complementary local and global features using a graph transformer.
Modeling free tumor growth: Discrete, continuum, and hybrid approaches to interpreting cancer development.
Retraction notice to "A video images-aware knowledge extraction method for intelligent healthcare management of basketball players" [Mathematical Biosciences and Engineering 20(2) (2023) 1919-1937].
Improved optimizer with deep learning model for emotion detection and classification.