{"title":"基于多头交叉注意力变换器的高效物体追踪技术","authors":"Jiahai Dai, Huimin Li, Shan Jiang, Hongwei Yang","doi":"10.1111/exsy.13650","DOIUrl":null,"url":null,"abstract":"Object tracking is an essential component of computer vision and plays a significant role in various practical applications. Recently, transformer‐based trackers have become the predominant method for tracking due to their robustness and efficiency. However, existing transformer‐based trackers typically focus solely on the template features, neglecting the interactions between the search features and the template features during the tracking process. To address this issue, this article introduces a multi‐head cross‐attention transformer for visual tracking (MCTT), which effectively enhance the interaction between the template branch and the search branch, enabling the tracker to prioritize discriminative feature. Additionally, an auxiliary segmentation mask head has been designed to produce a pixel‐level feature representation, enhancing and tracking accuracy by predicting a set of binary masks. Comprehensive experiments have been performed on benchmark datasets, such as LaSOT, GOT‐10k, UAV123 and TrackingNet using various advanced methods, demonstrating that our approach achieves promising tracking performance. MCTT achieves an AO score of 72.8 on the GOT‐10k.","PeriodicalId":51053,"journal":{"name":"Expert Systems","volume":"28 1","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An efficient object tracking based on multi‐head cross‐attention transformer\",\"authors\":\"Jiahai Dai, Huimin Li, Shan Jiang, Hongwei Yang\",\"doi\":\"10.1111/exsy.13650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Object tracking is an essential component of computer vision and plays a significant role in various practical applications. Recently, transformer‐based trackers have become the predominant method for tracking due to their robustness and efficiency. However, existing transformer‐based trackers typically focus solely on the template features, neglecting the interactions between the search features and the template features during the tracking process. To address this issue, this article introduces a multi‐head cross‐attention transformer for visual tracking (MCTT), which effectively enhance the interaction between the template branch and the search branch, enabling the tracker to prioritize discriminative feature. Additionally, an auxiliary segmentation mask head has been designed to produce a pixel‐level feature representation, enhancing and tracking accuracy by predicting a set of binary masks. Comprehensive experiments have been performed on benchmark datasets, such as LaSOT, GOT‐10k, UAV123 and TrackingNet using various advanced methods, demonstrating that our approach achieves promising tracking performance. MCTT achieves an AO score of 72.8 on the GOT‐10k.\",\"PeriodicalId\":51053,\"journal\":{\"name\":\"Expert Systems\",\"volume\":\"28 1\",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1111/exsy.13650\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1111/exsy.13650","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
An efficient object tracking based on multi‐head cross‐attention transformer
Object tracking is an essential component of computer vision and plays a significant role in various practical applications. Recently, transformer‐based trackers have become the predominant method for tracking due to their robustness and efficiency. However, existing transformer‐based trackers typically focus solely on the template features, neglecting the interactions between the search features and the template features during the tracking process. To address this issue, this article introduces a multi‐head cross‐attention transformer for visual tracking (MCTT), which effectively enhance the interaction between the template branch and the search branch, enabling the tracker to prioritize discriminative feature. Additionally, an auxiliary segmentation mask head has been designed to produce a pixel‐level feature representation, enhancing and tracking accuracy by predicting a set of binary masks. Comprehensive experiments have been performed on benchmark datasets, such as LaSOT, GOT‐10k, UAV123 and TrackingNet using various advanced methods, demonstrating that our approach achieves promising tracking performance. MCTT achieves an AO score of 72.8 on the GOT‐10k.
期刊介绍:
Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper.
As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.