Authors: Chunbao Li, Bo Yang
Published in: Proceedings of the 4th International Conference on Biomedical Signal and Image Processing, 2019-08-13
DOI: https://doi.org/10.1145/3354031.3354051
Region-based High-resolution Siamese Network for Robust Visual Tracking
Visual tracking is an active and challenging research topic in computer vision, as objects often undergo significant appearance variations caused by occlusion, deformation, and background clutter. In recent years, many trackers based on convolutional neural networks have achieved impressive performance by integrating multi-layer features. However, to perform multi-scale feature fusion, most of these trackers recover high-resolution representations from the low-resolution representations produced by a high-to-low resolution network, which tends to yield inaccurate feature maps or lose details of the target object. In this paper, we propose an end-to-end, region-based, high-resolution, fully convolutional Siamese network for tracking. The tracker extracts the spatial and semantic information of the target object with a high-resolution network that maintains rich high-resolution representations of the target throughout the whole process. Furthermore, a set of position-sensitive score maps is obtained for all regions of the target template, and an adaptive weighting method is proposed to fuse the score maps of the multiple regions. Experimental results on the OTB50 and OTB100 benchmark datasets demonstrate that our tracker performs better than several state-of-the-art trackers while running in real time.
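The region-wise scoring and fusion step can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the backbone, the region layout, or the weighting rule, so the 2 x 2 region grid, the peak-sharpness weights, and the function names `cross_correlate`, `region_score_maps`, and `fuse_adaptive` are all assumptions made for illustration, and plain arrays stand in for the features a high-resolution network would produce.

```python
import numpy as np


def cross_correlate(search, template):
    """Slide a (C, h, w) template over a (C, H, W) search feature map."""
    _, H, W = search.shape
    _, h, w = template.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[:, i:i + h, j:j + w] * template)
    return out


def region_score_maps(search, template, grid=2):
    """Correlate each template region with the search features.

    Assumption: the template is split into a grid x grid layout. Each map is
    cropped by its region offset so that entry (i, j) of every map refers to
    the same placement of the whole template.
    """
    _, H, W = search.shape
    _, h, w = template.shape
    rh, rw = h // grid, w // grid
    out_h, out_w = H - h + 1, W - w + 1
    maps = []
    for r in range(grid):
        for c in range(grid):
            region = template[:, r * rh:(r + 1) * rh, c * rw:(c + 1) * rw]
            full = cross_correlate(search, region)
            maps.append(full[r * rh:r * rh + out_h, c * rw:c * rw + out_w])
    return np.stack(maps)


def fuse_adaptive(maps, temperature=1.0):
    """Fuse region maps with softmax weights.

    Assumption: a region is trusted more when its score map has a sharp peak,
    measured as (max - mean) / std. This is a stand-in rule, not the paper's.
    """
    flat = maps.reshape(len(maps), -1)
    sharpness = (flat.max(axis=1) - flat.mean(axis=1)) / (flat.std(axis=1) + 1e-8)
    logits = sharpness / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    fused = np.tensordot(weights, maps, axes=1)  # weighted sum over regions
    return fused, weights
```

With a template whose height and width are divisible by `grid`, the unweighted sum of the region maps equals the whole-template correlation, so the weights only redistribute trust among regions, for example away from a region that is occluded.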