Robust Deep Object Tracking against Adversarial Attacks

International Journal of Computer Vision (IF 11.6; CAS Zone 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence) Pub Date: 2024-09-26 DOI: 10.1007/s11263-024-02226-0

Shuai Jia, Chao Ma, Yibing Song, Xiaokang Yang, Ming-Hsuan Yang

Citations: 0

Abstract

Addressing the vulnerability of deep neural networks (DNNs) has attracted significant attention in recent years. While recent studies on adversarial attack and defense mainly focus on single images, few efforts have been made to perform temporal attacks against video sequences. Because they do not consider the temporal consistency between frames, existing adversarial attack approaches designed for static images do not perform well for deep object tracking. In this work, we generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks under white-box and black-box settings. To this end, we consider motion signals when generating lightweight perturbations over the estimated tracking results frame by frame. For the white-box attack, we generate temporal perturbations via known trackers to significantly degrade the tracking performance. For the black-box attack, we transfer the generated perturbations to unknown target trackers. Furthermore, we train universal adversarial perturbations and add them directly to all frames of a video, improving the attack effectiveness at minor computational cost. On the defense side, we sequentially learn to estimate and remove the perturbations from input sequences to restore the tracking performance. We apply the proposed adversarial attack and defense approaches to state-of-the-art tracking algorithms. Extensive evaluations on large-scale benchmark datasets, including OTB, VOT, UAV123, and LaSOT, demonstrate that our attack method degrades the tracking performance significantly, with favorable transferability to other backbones and trackers. Notably, the proposed defense method restores the original tracking performance to some extent and achieves additional performance gains when not under adversarial attack.
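
The universal-perturbation step mentioned in the abstract (one perturbation added to every frame of a video) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `apply_universal_perturbation`, the grayscale nested-list frame layout, the L-infinity bound, and the default `epsilon` value are all assumptions made for illustration, and the abstract does not say how the perturbation is bounded or trained.

```python
from typing import List

# Hypothetical frame layout: one grayscale frame as rows of pixel values in [0, 255].
Frame = List[List[float]]


def clip(value: float, low: float, high: float) -> float:
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_universal_perturbation(
    frames: List[Frame], delta: Frame, epsilon: float = 8.0
) -> List[Frame]:
    """Add the same perturbation to every frame of a sequence.

    Assumption: the perturbation is bounded element-wise to [-epsilon, epsilon]
    (an L-infinity constraint) and the result is clipped to the valid pixel range.
    """
    # Bound the perturbation once; it is then reused for every frame.
    bounded = [[clip(d, -epsilon, epsilon) for d in row] for row in delta]
    return [
        [
            [clip(p + d, 0.0, 255.0) for p, d in zip(frame_row, delta_row)]
            for frame_row, delta_row in zip(frame, bounded)
        ]
        for frame in frames
    ]


if __name__ == "__main__":
    video = [[[0.0, 250.0], [100.0, 100.0]], [[10.0, 20.0], [30.0, 40.0]]]
    perturbation = [[-20.0, 20.0], [3.0, -3.0]]
    for adversarial_frame in apply_universal_perturbation(video, perturbation):
        print(adversarial_frame)
```

Because the perturbation is computed once and reused, the per-frame cost at attack time is a single addition and clip, which is consistent with the abstract's claim of minor computational cost.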

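Source journal: International Journal of Computer Vision (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 29.80

Self-citation rate: 2.10%

Articles published: 163

Review time: 6 months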

Journal description: The International Journal of Computer Vision (IJCV) is a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. Regular articles, which span up to 25 journal pages, present significant technical advances of broad interest to the field. Short articles, limited to 10 pages, offer a faster publication path for novel research outcomes. Survey articles, of up to 30 pages, offer critical evaluations of the state of the art in computer vision or tutorial presentations of relevant topics. In addition to technical articles, the journal includes book reviews, position papers, and editorials by prominent scientific figures, which complement the technical content. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software, to support the understanding and reproducibility of the published research.

Latest articles from this journal

CS-CoLBP: Cross-Scale Co-occurrence Local Binary Pattern for Image Classification

Warping the Residuals for Image Editing with StyleGAN

Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

Feature Matching via Graph Clustering with Local Affine Consensus

Learning to Detect Novel Species with SAM in the Wild