Lizhi Geng, Dongming Zhou, Kerui Wang, Yisong Liu, Kaixiang Yan
The Journal of Supercomputing. Published 2024-08-16. DOI: 10.1007/s11227-024-06443-9
SiamMGT: robust RGBT tracking via graph attention and reliable modality weight learning
In recent years, RGBT trackers based on the Siamese network have gained significant attention due to their balanced accuracy and efficiency. However, these trackers often rely on similarity matching of features between a fixed-size target template and a search region, which can result in unsatisfactory tracking performance when the target undergoes dramatic changes in scale or shape, or when occlusion occurs. Additionally, while these trackers often employ feature-level fusion for different modalities, they frequently overlook the benefits of decision-level fusion, which can diminish their flexibility and independence. In this paper, a novel Siamese tracker based on graph attention and reliable modality weighting is proposed for robust RGBT tracking. Specifically, a modality feature interaction learning network is constructed to perform bidirectional learning of the local features from each modality while extracting their respective characteristics. Subsequently, a multimodality graph attention network is used to match the local features of the template and search region, generating more accurate and robust similarity responses. Finally, a modality fusion prediction network is designed to perform decision-level adaptive fusion of the two modality responses, leveraging their complementary nature for prediction. Extensive experiments on three large-scale RGBT benchmarks demonstrate outstanding tracking performance compared with other state-of-the-art trackers. Code will be shared at https://github.com/genglizhi/SiamMGT.
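The decision-level fusion step described above can be illustrated with a minimal sketch: given one similarity-response map per modality, compute a reliability score for each, normalize the scores into weights, and blend the maps. Note that the paper's actual weighting is produced by a learned modality fusion prediction network; the `reliability` heuristic below (peak value relative to the map's mean) and all function names are assumptions made purely for illustration.

```python
import numpy as np

def fuse_responses(resp_rgb, resp_tir):
    """Illustrative decision-level fusion of two modality response maps.

    The real SiamMGT weights are learned; here a hand-crafted
    peak-to-mean ratio stands in as a hypothetical reliability cue.
    """
    def reliability(r):
        # Sharper, more confident peaks yield a higher score.
        return float(r.max() / (r.mean() + 1e-8))

    scores = np.array([reliability(resp_rgb), reliability(resp_tir)])
    # Softmax turns the two scores into adaptive modality weights.
    weights = np.exp(scores) / np.exp(scores).sum()
    fused = weights[0] * resp_rgb + weights[1] * resp_tir
    return fused, weights

# Toy example: the thermal map has the sharper peak, so it should
# receive the larger weight in the fused prediction.
rgb = np.full((5, 5), 0.2); rgb[2, 2] = 0.5
tir = np.full((5, 5), 0.1); tir[2, 2] = 0.9
fused, weights = fuse_responses(rgb, tir)
```

In this toy setting the thermal modality dominates the fused response, mirroring the complementary behavior the paper exploits: whichever modality currently gives the more discriminative response contributes more to the final prediction.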