{"title":"Object tracking based on temporal and spatial context information","authors":"Yan Chen, Tao Lin, Jixiang Du, Hongbo Zhang","doi":"10.1016/j.imavis.2025.105488","DOIUrl":null,"url":null,"abstract":"<div><div>Currently, numerous advanced trackers improve stability by optimizing the target visual appearance models or by improving interactions between templates and search areas. Despite these advancements, appearance-based trackers still primarily depend on the visual information of targets without adequately integrating spatio-temporal context information, thus limiting their effectiveness in handling similar objects around the target. To address this challenge, a novel object tracking method, TSCTrack, which leverages spatio-temporal context information, has been introduced. TSCTrack overcomes the shortcomings of traditional center-cropping preprocessing techniques by introducing Global Spatial Position Embedding, effectively preserving spatial information and capturing motion data of targets. Additionally, TSCTrack incorporates a Spatial Relationship Aggregation module and a Temporal Relationship Aggregation module—the former captures static spatial context information per frame, while the latter integrates dynamic temporal context information. This sophisticated integration allows the Dynamic Tracking Prediction module to generate precise target coordinates effectively, greatly reducing the impact of target deformations and scale changes on tracking performance. Demonstrated across multiple public tracking datasets including LaSOT, TrackingNet, UAV123, GOT-10k, and OTB, TSCTrack showcases superior performance and validates its exceptional tracking capabilities in diverse scenarios.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105488"},"PeriodicalIF":4.2000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000769","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Many advanced trackers improve stability by optimizing the target's visual appearance model or by improving the interaction between templates and search regions. Despite these advances, appearance-based trackers still depend primarily on the visual information of the target without adequately integrating spatio-temporal context, which limits their effectiveness when similar objects appear around the target. To address this challenge, this paper introduces TSCTrack, a novel object tracking method that leverages spatio-temporal context information. TSCTrack overcomes the shortcomings of traditional center-cropping preprocessing by introducing Global Spatial Position Embedding, which preserves spatial information and captures target motion. In addition, TSCTrack incorporates a Spatial Relationship Aggregation module and a Temporal Relationship Aggregation module: the former captures static spatial context within each frame, while the latter integrates dynamic temporal context across frames. This integration allows the Dynamic Tracking Prediction module to generate precise target coordinates, greatly reducing the impact of target deformation and scale change on tracking performance. Evaluated on multiple public tracking benchmarks, including LaSOT, TrackingNet, UAV123, GOT-10k, and OTB, TSCTrack achieves superior performance and demonstrates strong tracking capability in diverse scenarios.
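The abstract names the modules but not their internals. As a loose illustration of how per-frame spatial aggregation, cross-frame temporal aggregation, and a coordinate-prediction head might be composed, here is a minimal PyTorch sketch; all class names, tensor shapes, and the pooled MLP head are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code): per-frame spatial
# self-attention, cross-attention to cached past-frame features,
# and a simple box-prediction head.
import torch
import torch.nn as nn

class SpatialAggregation(nn.Module):
    """Self-attention over spatial tokens of one frame (static context)."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, N_tokens, dim), one frame
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class TemporalAggregation(nn.Module):
    """Cross-attention from current-frame tokens to past-frame features."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cur, past):  # cur: (B, N, dim); past: (B, T*N, dim)
        out, _ = self.attn(cur, past, past)
        return self.norm(cur + out)

class PredictionHead(nn.Module):
    """Toy head: pooled features -> normalized box (cx, cy, w, h)."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))

    def forward(self, x):  # x: (B, N, dim)
        return self.mlp(x.mean(dim=1)).sigmoid()

# Usage: N spatial tokens for the current frame, T cached past frames.
B, N, T, dim = 2, 64, 3, 128
spatial, temporal, head = SpatialAggregation(dim), TemporalAggregation(dim), PredictionHead(dim)
cur = spatial(torch.randn(B, N, dim))   # static per-frame spatial context
past = torch.randn(B, T * N, dim)       # features from earlier frames
box = head(temporal(cur, past))         # (B, 4) normalized target boxes
```

The design choice sketched here, attending to cached features from earlier frames rather than re-cropping around the last prediction, mirrors the abstract's point about moving beyond center-cropping preprocessing.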
Journal Introduction:
Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster a deeper understanding of the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.