SiamMAN: Siamese Multi-Phase Aware Network for Real-Time Unmanned Aerial Vehicle Tracking
Faxue Liu, Xuan Wang, Qiqi Chen, Jinghong Liu, Chenglong Liu
Drones (journal article), published 2023-12-13. DOI: https://doi.org/10.3390/drones7120707
Abstract
In this paper, we address aerial tracking by designing a multi-phase aware network that captures rich long-range dependencies. Existing aerial trackers are prone to drift in scenarios that demand multi-layer long-range feature dependencies, such as viewpoint changes caused by the UAV's shooting perspective and low-resolution targets. In contrast to previous works that rely only on multi-scale feature fusion to obtain contextual information, we design a new architecture that adapts to the characteristics of different feature levels in challenging scenarios and adaptively integrates regional features with their corresponding global dependencies. Specifically, for the proposed tracker (SiamMAN), we first propose a two-stage aware neck (TAN): a cascaded splitting encoder (CSE) splits the feature channels to obtain distributed long-range relevance among the sub-branches, and a multi-level contextual decoder (MCD) then performs further global dependency fusion. Finally, we design a response map context encoder (RCE) that uses long-range contextual information in backpropagation to update the deeper features at the pixel level and to better balance semantic and spatial information. Experiments on several well-known tracking benchmarks show that the proposed method outperforms state-of-the-art (SOTA) trackers, which we attribute to the multi-phase aware network's effective use of features at different levels.
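The abstract does not give the internals of the cascaded splitting encoder, so the following is only a minimal sketch of the general idea it names: split the feature channels into sub-branches and process them in a cascade, so that each sub-branch also sees the output of the one before it. The function names, the additive cascade, and the `global_context` operator (a spatial mean added to every position, standing in for whatever long-range operator the paper actually uses) are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, List, Optional

Channel = List[float]        # one channel, flattened over spatial positions
FeatureMap = List[Channel]   # channels x positions


def split_channels(features: FeatureMap, groups: int) -> List[FeatureMap]:
    """Split the channel axis into `groups` equally sized sub-branches."""
    if groups <= 0 or len(features) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = len(features) // groups
    return [features[i * size:(i + 1) * size] for i in range(groups)]


def global_context(branch: FeatureMap) -> FeatureMap:
    """Hypothetical long-range operator (an assumption, not from the paper):
    add each channel's spatial mean to every position of that channel."""
    out: FeatureMap = []
    for channel in branch:
        mean = sum(channel) / len(channel)
        out.append([value + mean for value in channel])
    return out


def cascaded_split_encode(
    features: FeatureMap,
    groups: int,
    transform: Callable[[FeatureMap], FeatureMap] = global_context,
) -> FeatureMap:
    """Process sub-branches in a cascade: every branch after the first is
    summed element-wise with the previous branch's output before `transform`
    is applied. The outputs are concatenated back along the channel axis."""
    outputs: FeatureMap = []
    previous: Optional[FeatureMap] = None
    for branch in split_channels(features, groups):
        if previous is not None:
            branch = [
                [a + b for a, b in zip(channel, prev_channel)]
                for channel, prev_channel in zip(branch, previous)
            ]
        previous = transform(branch)
        outputs.extend(previous)
    return outputs


if __name__ == "__main__":
    # Two channels, two spatial positions each, split into two sub-branches.
    toy = [[1.0, 3.0], [0.0, 0.0]]
    print(cascaded_split_encode(toy, groups=2))
```

In this toy run the second sub-branch starts from all zeros, so everything in its output comes from the first sub-branch through the cascade. That is the sense in which relevance is "distributed" among sub-branches in this sketch. A real tracker would use learned layers on tensors rather than a fixed mean on Python lists.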