Zhiguo Zhang, Zhiqing Guo, Liejun Wang, Yongming Li
Expert Systems with Applications, Volume 262, Article 125654. Published 2024-11-06. DOI: 10.1016/j.eswa.2024.125654 (https://www.sciencedirect.com/science/article/pii/S0957417424025211)
Citations: 0
CTIFTrack: Continuous Temporal Information Fusion for object track
In visual tracking tasks, researchers usually focus on increasing model complexity, or attend only discretely to changes in the object itself, to achieve accurate recognition and tracking of the moving object. However, they often overlook the significant contribution that video-level linear temporal information fusion and continuous spatiotemporal mapping can make to tracking. This oversight may lead to poor tracking performance or insufficient real-time capability in complex scenes. This paper therefore proposes a real-time tracker, the Continuous Temporal Information Fusion Tracker (CTIFTrack). The key to CTIFTrack is its Temporal Information Fusion (TIF) module, which linearly fuses the temporal information of the (t-1)-th and t-th frames and completes the spatiotemporal mapping. This enables the tracker to better understand the overall spatiotemporal information and the contextual spatiotemporal correlations within the video, which benefits the tracking task. In addition, this paper proposes the Object Template Feature Refinement (OTFR) module, which captures both the global information and the local details of the object and further improves the tracker's understanding of object features. Extensive experiments are conducted on seven benchmarks: LaSOT, GOT-10K, UAV123, NFS, TrackingNet, VOT2018 and OTB-100. The experimental results validate the contribution of the TIF and OTFR modules to the tracking task, as well as the effectiveness of CTIFTrack. Notably, while maintaining strong tracking performance, CTIFTrack also runs in real time: on an NVIDIA Tesla T4 (16 GB) GPU it reaches 71.98 FPS. The code and demo materials will be available at https://github.com/vpsg-research/CTIFTrack.
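The abstract does not say how the TIF module performs its linear fusion of the (t-1)-th and t-th frames. The sketch below is only one plausible reading, under assumptions of ours: each frame is summarized by a plain feature vector, the fusion is a fixed convex blend controlled by a hypothetical weight alpha, and the blend is applied recursively so that the fused state carries information continuously through the video. The names linear_temporal_fusion, fuse_video and alpha are hypothetical; this is not the authors' TIF module, which may use learned weights and operate on token maps.

```python
from typing import List, Sequence

Vector = List[float]


def linear_temporal_fusion(prev: Sequence[float], curr: Sequence[float],
                           alpha: float = 0.5) -> Vector:
    """Convex (linear) blend of the (t-1)-th and t-th frame features.

    Hypothetical stand-in for TIF: alpha weights the previous frame's state,
    (1 - alpha) weights the current frame's features.
    """
    if len(prev) != len(curr):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev, curr)]


def fuse_video(features: Sequence[Sequence[float]], alpha: float = 0.5) -> List[Vector]:
    """Fuse frame by frame so each output carries the history of earlier frames."""
    if not features:
        return []
    state: Vector = list(features[0])  # the first frame has no predecessor
    fused: List[Vector] = [state]
    for feat in features[1:]:
        # Blend the previous fused state (frame t-1) with the current frame t.
        state = linear_temporal_fusion(state, feat, alpha)
        fused.append(state)
    return fused


if __name__ == "__main__":
    print(fuse_video([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], alpha=0.5))
```

With alpha = 0.5, the three toy frames above fuse to [[1.0, 0.0], [0.5, 0.5], [0.75, 0.75]]. Setting alpha = 0 removes temporal memory (each output equals the current frame), while values near 1 make the state change slowly. A blend like this costs one multiply-add per feature element per frame, which is why linear fusion is attractive under a real-time budget; for reference, 71.98 FPS corresponds to roughly 1000 / 71.98 ≈ 13.9 ms per frame.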
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.