ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer

Yifan Xu;Pourya Shamsolmoali;Masoume Zareapoor;Jie Yang
IEEE Transactions on Artificial Intelligence, vol. 6, no. 4, pp. 1038-1049. Journal article, published 2024-12-02.
DOI: 10.1109/TAI.2024.3510479
https://ieeexplore.ieee.org/document/10772618/

Abstract

Visual place recognition (VPR) is a challenging task with a wide range of applications, including robot navigation and self-driving vehicles. Its difficulty stems from duplicate regions and insufficient attention to small objects in complex scenes, both of which cause recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and the representation of small objects. Unlike existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a paradigm called the clustering-based weighted transformer network (CWTNet). CWTNet uses clustering-based weighted feature maps and integrates global dependencies to address the visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and improves model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse features from scale-wise image patches rather than from entire images, which is essential for capturing the broader range of information required for robust VPR. Extensive experiments on four VPR datasets show that our model outperforms existing models while being less complex.
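For readers unfamiliar with VLAD-style aggregation, the following is a minimal sketch of the generic soft-assignment form (as popularized by NetVLAD), not the OptLAD layer itself, whose details the abstract does not give. The function name `vlad_aggregate`, the sharpness parameter `alpha`, and the array shapes are assumptions made for illustration only.

```python
import numpy as np


def vlad_aggregate(descriptors, centers, alpha=10.0):
    """Generic soft-assignment VLAD; an illustrative sketch, not OptLAD.

    descriptors: (N, D) local features, e.g. one per image patch
    centers:     (K, D) cluster centers (assumed given)
    returns:     (K * D,) L2-normalized global descriptor
    """
    # Residual of every descriptor to every center: (N, K, D)
    diff = descriptors[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)  # (N, K)

    # Soft assignment: softmax over clusters of -alpha * squared distance
    logits = -alpha * sq_dist
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted sum of residuals per cluster: (K, D)
    vlad = np.einsum("nk,nkd->kd", weights, diff)

    # Intra-normalize each cluster row, then L2-normalize the flat vector
    row_norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = vlad / np.maximum(row_norms, 1e-12)
    flat = vlad.ravel()
    return flat / max(np.linalg.norm(flat), 1e-12)
```

The output has K * D dimensions, which is why a plain VLAD head grows quickly with the number of clusters and the feature width; the abstract's claim is that OptLAD reduces this parameter cost.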