CLG-Net: Rethinking Local and Global Perception in Lightweight Two-View Correspondence Learning

IF 11.1 · JCR Q1 (Engineering, Electrical & Electronic) · CAS Tier 1 (Engineering & Technology) · IEEE Transactions on Circuits and Systems for Video Technology · Pub Date: 2024-09-11 · DOI: 10.1109/TCSVT.2024.3457816
Minjun Shen;Guobao Xiao;Changcai Yang;Junwen Guo;Lei Zhu
{"title":"CLG-Net: Rethinking Local and Global Perception in Lightweight Two-View Correspondence Learning","authors":"Minjun Shen;Guobao Xiao;Changcai Yang;Junwen Guo;Lei Zhu","doi":"10.1109/TCSVT.2024.3457816","DOIUrl":null,"url":null,"abstract":"Correspondence learning aims to identify correct correspondences from the initial correspondence set and estimate camera pose between a pair of images. At present, Transformer-based methods have make notable progress in the correspondence learning task due to their powerful non-local information modeling capabilities. However, these methods seem to neglect local structures during feature aggregation from all query-key pairs, resulting in computational inefficiency and inaccurate correspondence identification. To address this issue, we propose a novel Context-aware Local and Global interaction Transformer (CLGFormer), a lightweight Transformer-based module with dual-branches that address local and global context perception in attention mechanisms. CLGFormer explores the relationship between neighborhood consistency observed in correspondences and context-aware weights appearing in vanilla attention and introduces an attention-style convolution operator. On top of that, CLGFormer also incorporates a cascaded operation that splits full features into multiple subsets and then feeds to the attention heads, which not only reduces computational costs but also enhances attention diversity. At last, we also introduce a feature recombination operate with high jointness and a lightweight channel attention module. The culmination of our efforts is the Context-aware Local and Global interaction Network (CLG-Net), which accurately estimates camera pose and identifies inliers. Through rigorous experiments, we demonstrate that our CLG-Net network outperforms existing state-of-the-art methods while exhibiting robust generalization capabilities across various scenarios. Code will be available at <uri>https://github.com/guobaoxiao/CLG</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 1","pages":"207-218"},"PeriodicalIF":11.1000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10678746/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Correspondence learning aims to identify correct correspondences from an initial correspondence set and to estimate the camera pose between a pair of images. At present, Transformer-based methods have made notable progress on the correspondence learning task thanks to their powerful non-local information modeling capabilities. However, these methods tend to neglect local structures when aggregating features from all query-key pairs, resulting in computational inefficiency and inaccurate correspondence identification. To address this issue, we propose the Context-aware Local and Global interaction Transformer (CLGFormer), a lightweight Transformer-based module with dual branches that handle local and global context perception within the attention mechanism. CLGFormer explores the relationship between the neighborhood consistency observed in correspondences and the context-aware weights produced by vanilla attention, and introduces an attention-style convolution operator. In addition, CLGFormer incorporates a cascaded operation that splits the full features into multiple subsets, which are then fed to the attention heads; this both reduces computational cost and enhances attention diversity. Finally, we introduce a feature recombination operation with high jointness and a lightweight channel attention module. The culmination of our efforts is the Context-aware Local and Global interaction Network (CLG-Net), which accurately estimates camera pose and identifies inliers. Through rigorous experiments, we demonstrate that CLG-Net outperforms existing state-of-the-art methods while exhibiting robust generalization across various scenarios. Code will be available at https://github.com/guobaoxiao/CLG.
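The abstract only outlines the architecture at a high level. The following PyTorch sketch is a hedged illustration of what a dual-branch local/global perception block with a lightweight channel-attention gate might look like; all module names, dimensions, and the simple additive fusion are assumptions made here for clarity (the cascaded head-splitting and feature recombination steps are omitted), and the authors' released code at https://github.com/guobaoxiao/CLG should be treated as the reference implementation.

```python
# Illustrative sketch only: the concrete operators in CLGFormer may differ.
import torch
import torch.nn as nn


class DualBranchBlock(nn.Module):
    """Hypothetical dual-branch block: a local branch built from a
    neighborhood-aware depth-wise convolution over the correspondence
    sequence, a global branch built from multi-head self-attention, and a
    lightweight channel-attention gate on the fused features."""

    def __init__(self, dim: int = 128, heads: int = 4, kernel_size: int = 3):
        super().__init__()
        # Local branch: depth-wise 1-D convolution as a cheap stand-in for an
        # attention-style convolution operator over neighboring correspondences.
        self.local = nn.Conv1d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)
        # Global branch: vanilla multi-head self-attention over all
        # correspondences (non-local context).
        self.norm = nn.LayerNorm(dim)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Lightweight channel attention: squeeze-and-excitation style gate.
        self.channel_gate = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(inplace=True),
            nn.Linear(dim // 4, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_correspondences, dim)
        local = self.local(x.transpose(1, 2)).transpose(1, 2)
        h = self.norm(x)
        glob, _ = self.global_attn(h, h, h, need_weights=False)
        fused = x + local + glob                      # simple additive fusion
        gate = self.channel_gate(fused.mean(dim=1))   # (batch, dim)
        return fused * gate.unsqueeze(1)              # channel re-weighting


if __name__ == "__main__":
    block = DualBranchBlock()
    feats = torch.randn(2, 2000, 128)  # 2000 putative correspondences
    print(block(feats).shape)          # torch.Size([2, 2000, 128])
```

Here the depth-wise convolution plays the role of a local, neighborhood-aware aggregation, while the self-attention branch supplies the non-local context described in the abstract; both are combined before the channel gate re-weights feature channels.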
Source Journal
CiteScore: 13.80
Self-citation rate: 27.40%
Annual article count: 660
Review time: 5 months
Journal Introduction: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.