CaseVPR: Correlation-Aware Sequential Embedding for Sequence-to-Frame Visual Place Recognition

IEEE Robotics and Automation Letters (IF 5.3; Zone 2, Computer Science; JCR Q2, Robotics) Pub Date: 2025-02-13 DOI: 10.1109/LRA.2025.3541452

Heshan Li;Guohao Peng;Jun Zhang;Mingxing Wen;Yingchong Ma;Danwei Wang

IEEE Robotics and Automation Letters, vol. 10, no. 4, pp. 3430-3437. https://ieeexplore.ieee.org/document/10884025/

Citations: 0

Abstract

Visual Place Recognition (VPR) is crucial for autonomous vehicles because it lets them identify previously visited locations. Compared with conventional single-frame retrieval, describing places with sequences of frames has proven effective in alleviating perceptual aliasing. However, mainstream sequence retrieval methods encode multiple frames into a single descriptor and thereby give up fine-grained frame-to-frame matching, which hampers the precise localization of individual frames within the query sequence. Sequence matching methods such as SeqSLAM, on the other hand, support frame-to-frame matching, but they rely on global brute-force search and a constant-speed assumption, which may lead to retrieval failures. To address these issues, we propose CaseVPR, a sequence-to-frame hierarchical matching pipeline for VPR. It consists of coarse-level sequence retrieval, which matches sequential descriptors to mine potential starting points, followed by fine-grained sequence matching to find frame-to-frame correspondences. Specifically, CaseNet is proposed to encode the correlation-aware features of consecutive frames into hierarchical descriptors for sequence retrieval and matching. On this basis, the AdaptSeq-V2 search strategy identifies frame-level correspondences of the query sequence within the candidate regions determined by the potential starting points. To validate the hierarchical pipeline, we evaluate CaseVPR on multiple datasets. Experiments demonstrate that CaseVPR outperforms all benchmark methods in terms of average precision and achieves new state-of-the-art (SOTA) results for sequence-based VPR.
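The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: mean pooling stands in for CaseNet's sequential descriptor, a generic monotonic dynamic-programming alignment stands in for AdaptSeq-V2, and all function names and the `top_k` and `radius` parameters are hypothetical. The toy data uses orthogonal frame descriptors so the result is easy to check by hand, and the query traverses the route at twice the database speed to show that no constant-speed assumption is needed.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def coarse_retrieve(query_seq_desc, db_seq_descs, top_k=3):
    """Coarse stage: indices of the top_k most similar database sequences."""
    sims = l2_normalize(db_seq_descs) @ l2_normalize(query_seq_desc)
    return np.argsort(-sims)[:top_k]


def fine_match(query_frames, db_frames, start, radius):
    """Fine stage: align each query frame to a database frame near `start`.

    Generic monotonic alignment (database index never decreases), used here
    as a stand-in for a real search strategy such as AdaptSeq-V2.
    """
    lo = max(0, start - radius)
    hi = min(len(db_frames), start + len(query_frames) + radius)
    sim = l2_normalize(query_frames) @ l2_normalize(db_frames[lo:hi]).T
    n_q, n_w = sim.shape
    score = np.zeros((n_q, n_w))
    back = np.zeros((n_q, n_w), dtype=int)
    score[0] = sim[0]
    for i in range(1, n_q):
        best_j = 0  # best predecessor column seen so far (columns <= j)
        for j in range(n_w):
            if score[i - 1, j] > score[i - 1, best_j]:
                best_j = j
            score[i, j] = sim[i, j] + score[i - 1, best_j]
            back[i, j] = best_j
    # Backtrack from the best final column to recover the frame-level path.
    j = int(np.argmax(score[-1]))
    path = [j]
    for i in range(n_q - 1, 0, -1):
        j = int(back[i, j])
        path.append(j)
    path.reverse()
    return [lo + j for j in path], float(score[-1].max())


def hierarchical_match(query_frames, db_frames, top_k=3, radius=4):
    """Coarse retrieval of candidate starting points, then fine alignment."""
    seq_len = len(query_frames)
    n_starts = len(db_frames) - seq_len + 1
    # Assumption: a sequence descriptor is the mean of its frame descriptors.
    db_seq = np.stack([db_frames[s:s + seq_len].mean(axis=0)
                       for s in range(n_starts)])
    candidates = coarse_retrieve(query_frames.mean(axis=0), db_seq, top_k)
    results = [fine_match(query_frames, db_frames, int(s), radius)
               for s in candidates]
    return max(results, key=lambda r: r[1])


# Toy example: 50 orthogonal database frames; the query skips every other frame.
db = np.eye(50)
query = db[[20, 22, 24, 26]]
path, score = hierarchical_match(query, db)
print(path, score)
```

Restricting the fine stage to a window around each candidate start is what avoids a global brute-force search; the window size trades recall of the coarse stage against the cost of the alignment.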




Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore

9.60

Self-citation rate

15.40%

Articles published

1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.