Region-based High-resolution Siamese Network for Robust Visual Tracking

Proceedings of the 4th International Conference on Biomedical Signal and Image Processing | Pub Date: 2019-08-13 | DOI: 10.1145/3354031.3354051

Chunbao Li, Bo Yang


Citations: 0

Abstract



Visual tracking is an active and challenging research topic in computer vision, as objects often undergo significant appearance variations caused by occlusion, deformation and background clutter. In recent years, many convolutional neural network based trackers have achieved impressive performance by integrating multi-layer features. However, to perform multi-scale feature fusion, most of these trackers recover high-resolution representations from the low-resolution representations produced by a high-to-low resolution network, which tends to yield inaccurate feature maps or lose details of the target object. In this paper, we propose an end-to-end region-based high-resolution fully convolutional Siamese network for tracking. The tracker extracts the spatial and semantic information of the target object using a high-resolution network that maintains rich high-resolution representations throughout the whole process. Furthermore, a set of position-sensitive score maps is obtained for all regions of the target template, and an adaptive weighting method is proposed to fuse the score maps of multiple regions. Experimental results on the OTB50 and OTB100 benchmark datasets demonstrate that our tracker outperforms several state-of-the-art trackers while running in real time.
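The fusion step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state how the adaptive weights are computed, so the rule used here (a softmax over each region's peak response) is purely an assumption for illustration, and the function names `softmax`, `fuse_score_maps` and `argmax_2d` are hypothetical. The sketch only shows the general idea of weighting per-region score maps before locating the target.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_score_maps(score_maps):
    """Fuse equally sized per-region score maps into one map.

    ASSUMPTION: each region's weight is the softmax of its peak response.
    The paper's actual adaptive weighting rule is not given in the abstract.
    """
    peaks = [max(max(row) for row in m) for m in score_maps]
    weights = softmax(peaks)
    rows, cols = len(score_maps[0]), len(score_maps[0][0])
    fused = [[sum(w * m[r][c] for w, m in zip(weights, score_maps))
              for c in range(cols)]
             for r in range(rows)]
    return fused, weights

def argmax_2d(score_map):
    """Return (row, col) of the largest entry in a 2D list."""
    best = max((v, r, c)
               for r, row in enumerate(score_map)
               for c, v in enumerate(row))
    return best[1], best[2]

# Toy data: two made-up 2x2 region score maps (not from the paper).
region_a = [[0.1, 0.2], [0.3, 2.0]]  # confident region, sharp peak
region_b = [[0.5, 0.4], [0.1, 0.2]]  # weaker, flatter response
fused, weights = fuse_score_maps([region_a, region_b])
target_position = argmax_2d(fused)
```

In this toy case the region with the sharper peak receives the larger weight, so the fused maximum follows that region's peak. Any other confidence measure could be substituted for the peak value without changing the structure of the sketch.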

