Towards Frame Rate Agnostic Multi-object Tracking

International Journal of Computer Vision | Pub Date: 2023-11-20 | DOI: 10.1007/s11263-023-01943-2 | IF 11.6 | CAS Zone 2 (Computer Science) | JCR Q1, Computer Science, Artificial Intelligence

Weitao Feng, Lei Bai, Yongqiang Yao, Fengwei Yu, Wanli Ouyang

Citations: 2

Abstract

Multi-object Tracking (MOT) is one of the most fundamental computer vision tasks and underpins a wide range of video analysis applications. Despite recent promising progress, current MOT research is still limited to input streams sampled at a fixed frame rate. The resulting trackers are neither as flexible as humans nor well matched to industrial scenarios, which require trackers to be insensitive to frame rate under complicated conditions. In fact, we empirically found that the accuracy of all recent state-of-the-art trackers drops dramatically when the input frame rate changes. Toward a more intelligent tracking solution, we turn our attention to the problem of Frame Rate Agnostic MOT (FraMOT), which takes frame rate insensitivity into account. In this paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS), the first approach to tackle the FraMOT problem. Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes frame rate information to aid identity matching across inputs of different frame rates, improving the learned model's ability to handle the complex motion-appearance relations in FraMOT. Moreover, the association gap between training and inference is larger in FraMOT, because post-processing steps that are not included in training make a bigger difference at lower frame rates. To address this, we propose a Periodic Training Scheme that reflects all post-processing steps in training through tracking pattern matching and fusion. Along with the proposed approaches, we make the first attempt to establish an evaluation method for the new FraMOT task. Besides providing simulations and evaluation metrics, we address the new challenges in two different modes, i.e., known frame rate and unknown frame rate, to handle more complex situations. Quantitative experiments on the challenging MOT17 and MOT20 datasets (FraMOT versions) demonstrate that the proposed approaches handle different frame rates better and thus improve robustness in complicated scenarios.
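The abstract mentions simulating multi-frame-rate inputs and evaluating in a known-frame-rate mode and an unknown-frame-rate mode. The sketch below shows one way such inputs could be produced. It assumes that a lower frame rate is approximated by keeping every `stride`-th frame of a sequence recorded at a base rate, and that frame indices are 0-based. The names `SimulatedClip`, `simulate_frame_rate`, and `subsample_annotations` are hypothetical. This is not the authors' released code or their exact simulation protocol.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SimulatedClip:
    """A temporally subsampled view of one sequence (hypothetical helper)."""
    frame_ids: List[int]        # retained indices into the original sequence
    stride: int                 # every `stride`-th frame is kept
    effective_fps: float        # base_fps / stride
    known_fps: Optional[float]  # exposed to the tracker only in known-frame-rate mode


def simulate_frame_rate(
    num_frames: int,
    base_fps: float,
    stride: int,
    offset: int = 0,
    frame_rate_known: bool = True,
) -> SimulatedClip:
    """Approximate a lower frame rate by integer-stride subsampling (assumption)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if not 0 <= offset < stride:
        raise ValueError("offset must satisfy 0 <= offset < stride")
    frame_ids = list(range(offset, num_frames, stride))
    effective_fps = base_fps / stride
    return SimulatedClip(
        frame_ids=frame_ids,
        stride=stride,
        effective_fps=effective_fps,
        # In unknown-frame-rate mode the rate is withheld and must be inferred.
        known_fps=effective_fps if frame_rate_known else None,
    )


def subsample_annotations(
    annotations: Sequence[tuple], frame_ids: List[int]
) -> List[tuple]:
    """Keep ground-truth rows (frame_id, track_id, ...) on retained frames and
    renumber frames consecutively so the tracker sees a dense stream."""
    remap = {old_id: new_id for new_id, old_id in enumerate(frame_ids)}
    return [(remap[row[0]],) + tuple(row[1:]) for row in annotations if row[0] in remap]
```

For example, a 30 fps sequence with `stride=3` yields a 10 fps stream. In unknown-frame-rate mode `known_fps` is `None`, so the tracker would have to infer the rate itself. The abstract attributes that inference capability to FAAM. Varying `offset` gives several distinct subsampled clips from the same sequence.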

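Source journal: International Journal of Computer Vision (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 29.80
Self-citation rate: 2.10%
Articles published: 163
Review time: 6 months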

Journal description: The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs. Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision. Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community. Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or offer tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas. In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives. The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research. Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.

Latest articles in this journal

Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach
Transformer for Object Re-identification: A Survey
One-Shot Generative Domain Adaptation in 3D GANs
NAFT and SynthStab: A RAFT-Based Network and a Synthetic Dataset for Digital Video Stabilization
CS-CoLBP: Cross-Scale Co-occurrence Local Binary Pattern for Image Classification