Video-based person re-identification with complementary local and global features using a graph transformer.
Authors: Hai Lu, Enbo Luo, Yong Feng, Yifan Wang
Journal: Mathematical Biosciences and Engineering, vol. 21, no. 7, pp. 6694-6709
Published: 2024-07-23
DOI: 10.3934/mbe.2024293 (https://doi.org/10.3934/mbe.2024293)
In recent years, significant progress has been made in video-based person re-identification (Re-ID). The key challenge in video person Re-ID lies in constructing discriminative and robust person feature representations. Methods based on local regions use spatial and temporal attention to extract representative local features, but prior approaches often overlook the correlations between local regions. To exploit these relationships, we propose a video person Re-ID representation learning approach based on a graph transformer, which facilitates contextual interactions between relevant region features. Specifically, we construct a local relation graph that models the intrinsic relationships between nodes representing local regions. The graph uses a transformer architecture for feature propagation, iteratively refining each region feature with information from adjacent nodes to obtain partial feature representations. To learn compact and discriminative representations, we further propose a global feature learning branch based on a vision transformer, which captures the relationships between different frames in a sequence. In addition, we design a dual-branch interaction network based on multi-head fusion attention to integrate the frame-level features extracted by the local and global branches. Finally, after interaction, the global and local features are concatenated and used for testing. We evaluate the proposed method on three datasets: iLIDS-VID, MARS, and DukeMTMC-VideoReID. Experimental results show competitive performance, validating the effectiveness of the proposed approach.
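The propagation and fusion steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head, uses the raw region features as queries, keys, and values (no learned projections), replaces the multi-head fusion attention with plain mean pooling, and averages old and new features as a stand-in residual connection. The names `graph_attention_step`, `fuse`, `regions`, and `chain` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def graph_attention_step(nodes, adjacency):
    """One round of attention-style propagation restricted to graph neighbours.

    nodes: list of feature vectors, one per local region.
    adjacency: adjacency[i][j] is truthy if region j may send information to i.
    Assumption: every node also attends to itself (implicit self-loop).
    """
    dim = len(nodes[0])
    scale = math.sqrt(dim)
    updated = []
    for i, query in enumerate(nodes):
        neighbours = [j for j in range(len(nodes)) if adjacency[i][j] or j == i]
        # Scaled dot-product scores, computed only over adjacent nodes.
        weights = softmax([dot(query, nodes[j]) / scale for j in neighbours])
        message = [
            sum(w * nodes[j][d] for w, j in zip(weights, neighbours))
            for d in range(dim)
        ]
        # Assumed residual: average the old feature with the aggregated message.
        updated.append([0.5 * (q + m) for q, m in zip(query, message)])
    return updated

def mean_pool(vectors):
    """Average a list of vectors into a single vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def fuse(local_nodes, global_feature):
    """Concatenate the pooled local feature with the global feature."""
    return mean_pool(local_nodes) + list(global_feature)

# Toy example: three body regions connected in a chain (0 - 1 - 2).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chain = [
    [True, True, False],
    [True, True, True],
    [False, True, True],
]
refined = graph_attention_step(regions, chain)
descriptor = fuse(refined, [0.2, 0.4, 0.6])  # made-up global feature
print(len(refined), len(descriptor))
```

Restricting the softmax to adjacent nodes is what distinguishes this from ordinary self-attention: a region that is not connected to another in the graph contributes nothing to its update in a single step, although repeated steps let information travel along longer paths.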
About the journal:
Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing.
MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (descriptions of new technologies and products); and Announcements and Industrial Progress and News (announcements and advertisements, including major conferences).