ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer

IEEE Transactions on Artificial Intelligence. Pub Date: 2024-12-02. DOI: 10.1109/TAI.2024.3510479

Yifan Xu, Pourya Shamsolmoali, Masoume Zareapoor, Jie Yang

Vol. 6, No. 4, pp. 1038-1049. https://ieeexplore.ieee.org/document/10772618/


Abstract

Visual place recognition (VPR) is a highly challenging task with a wide range of applications, including robot navigation and self-driving vehicles. Its difficulty stems from duplicate regions and insufficient attention to small objects in complex scenes, both of which lead to recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. Unlike existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a paradigm called the clustering-based weighted transformer network (CWTNet). CWTNet uses clustering-based weighted feature maps and integrates global dependencies to address the visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse features from scale-wise image patches rather than from entire images, which is essential for capturing the broader range of information required for robust VPR. Extensive experiments on four VPR datasets show that our model outperforms existing models while being less complex.
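
The abstract does not say how the OptLAD layer works internally. For orientation only, the following is a minimal sketch of classic hard-assignment VLAD aggregation, the general idea that VLAD-style layers build on. It is not the paper's OptLAD layer, and the function names, toy patch descriptors, and cluster centers are hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > eps else list(vec)


def vlad(descriptors, centers):
    """Classic hard-assignment VLAD: sum each descriptor's residual to its nearest center."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for x in descriptors:
        # Assign the descriptor to its nearest cluster center (squared Euclidean distance).
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        k = dists.index(min(dists))
        for d in range(dim):
            residual_sums[k][d] += x[d] - centers[k][d]
    # Intra-normalize each cluster's residual sum, flatten, then normalize globally.
    flat = [v for row in residual_sums for v in l2_normalize(row)]
    return l2_normalize(flat)


# Hypothetical toy input: four 2-D patch descriptors and two cluster centers.
patch_descriptors = [[0.0, 1.0], [0.0, 3.0], [5.0, 5.0], [7.0, 5.0]]
cluster_centers = [[0.0, 0.0], [6.0, 5.0]]
global_descriptor = vlad(patch_descriptors, cluster_centers)
```

The resulting descriptor has one block of residuals per cluster, so its length is the number of centers times the descriptor dimension. That size is why parameter and dimensionality reduction matter for VLAD-style aggregation layers.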


