StructVPR++: Distill Structural and Semantic Knowledge With Weighting Samples for Visual Place Recognition

Yanqing Shen;Sanping Zhou;Jingwen Fu;Ruotong Wang;Shitao Chen;Nanning Zheng
DOI: 10.1109/TPAMI.2025.3556859
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 8, pp. 6338-6351
Publication date: 2025-04-01
Publication type: Journal Article
Journal impact factor: 18.6
URL: https://ieeexplore.ieee.org/document/10946858/

Abstract

Visual place recognition is a challenging task for autonomous driving and robotics and is usually treated as an image retrieval problem. A commonly used two-stage strategy performs global retrieval and then re-ranks the candidates using patch-level descriptors. Most end-to-end deep learning-based methods cannot extract global features with sufficient semantic information from RGB images. In contrast, re-ranking can exploit more explicit structural and semantic information in a one-to-one matching process, but it is time-consuming. To bridge the gap between global retrieval and re-ranking and achieve a good trade-off between accuracy and efficiency, we propose StructVPR++, a framework that embeds structural and semantic knowledge into RGB global representations via segmentation-guided distillation. Our key innovation lies in decoupling label-specific features from global descriptors, which enables explicit semantic alignment between image pairs without requiring segmentation during deployment. Furthermore, we introduce a sample-wise weighted distillation strategy that prioritizes reliable training pairs while suppressing noisy ones. Experiments on four benchmarks demonstrate that StructVPR++ surpasses state-of-the-art global methods by 5-23% in Recall@1 and even outperforms many two-stage approaches, while achieving real-time efficiency with a single RGB input.
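The sample-wise weighted distillation idea can be illustrated with a minimal sketch. The abstract does not specify the loss function or how the per-sample weights are computed, so everything below is an assumption made for illustration and is not the authors' formulation: the function names `l2_normalize` and `weighted_distillation_loss` are hypothetical, the per-sample distance is taken to be a squared L2 distance between normalized student and teacher descriptors, and the weights are simply passed in as given numbers.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def weighted_distillation_loss(student_feats, teacher_feats, weights):
    """Weighted mean of per-sample squared L2 distances.

    ASSUMPTION: the distance and the weighting scheme are illustrative
    choices; the abstract only states that reliable samples are
    prioritized and noisy ones are suppressed.
    """
    if not (len(student_feats) == len(teacher_feats) == len(weights)):
        raise ValueError("inputs must have the same number of samples")
    total_weight = sum(weights)
    if total_weight <= 0:
        return 0.0
    loss = 0.0
    for s, t, w in zip(student_feats, teacher_feats, weights):
        s_n, t_n = l2_normalize(s), l2_normalize(t)
        # Squared distance between the student and teacher descriptors.
        dist = sum((a - b) ** 2 for a, b in zip(s_n, t_n))
        loss += w * dist
    return loss / total_weight


# Toy example: sample 0 already matches the teacher, sample 1 does not.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[1.0, 0.0], [1.0, 0.0]]

print(weighted_distillation_loss(student, teacher, [1.0, 1.0]))  # 1.0
print(weighted_distillation_loss(student, teacher, [1.0, 0.0]))  # 0.0
```

In the toy example, setting the second sample's weight to zero removes its contribution entirely, which is the sense in which a weighting scheme can suppress a pair judged to be noisy. How such a judgment is made in StructVPR++ is not stated in the abstract.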