VRTNet: Vector Rectifier Transformer for Two-View Correspondence Learning

IEEE Transactions on Multimedia (IF 9.7; Zone 1, Computer Science; JCR Q1, Computer Science, Information Systems). Pub Date: 2024-12-23. DOI: 10.1109/TMM.2024.3521696
Meng Yang, Jun Chen, Xin Tian, Longsheng Wei, Jiayi Ma
Volume 27, pp. 515-530. IEEE Xplore: https://ieeexplore.ieee.org/document/10812827/
Citation count: 0

Abstract

Finding reliable correspondences in two-view images and recovering the camera poses are key problems in photogrammetry and image signal processing. The multilayer perceptron (MLP) is widely used in two-view correspondence learning because it is good at learning from disordered, sparse correspondences, but it is susceptible to dominant outliers and requires additional functional blocks to capture context information. A CNN can naturally extract local context information, but it cannot handle disordered data, nor can it extract global context and channel information. To overcome the shortcomings of MLPs and CNNs, we design a Transformer-based correspondence learning network named Vector Rectifier Transformer (VRTNet). The Transformer is an encoder-decoder structure that can handle disordered sparse correspondences and output sequences of arbitrary length. We therefore design two sub-Transformers in VRTNet to convert between disordered and ordered correspondences. Their self-attention and cross-attention mechanisms allow VRTNet to focus on the global context relations among all correspondences. To capture local context and channel information, we propose a rectifier network (comprising a CNN and a channel attention block) as the backbone of VRTNet, which avoids the complex design of additional blocks. The rectifier network corrects the errors in the ordered correspondences to obtain rectified correspondences. Finally, outliers are removed by comparing the original and rectified correspondences. VRTNet outperforms state-of-the-art methods on relative pose estimation, outlier removal, and image registration.
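
The abstract does not say how the original and rectified correspondences are compared in the final step. As a minimal sketch only, the comparison could be a per-correspondence residual checked against a threshold. Everything below is an assumption for illustration, not the authors' method: the 4-D layout `(x1, y1, x2, y2)`, the Euclidean residual, the function names `residual` and `remove_outliers`, the `threshold` parameter, and the premise that rectified correspondences are returned in the same order as the originals.

```python
import math
from typing import List, Sequence

# Assumed layout (hypothetical): a correspondence is (x1, y1, x2, y2),
# i.e. a keypoint in the first view and its putative match in the second.


def residual(original: Sequence[float], rectified: Sequence[float]) -> float:
    """Euclidean distance between a correspondence and its rectified version."""
    return math.dist(original, rectified)


def remove_outliers(
    originals: Sequence[Sequence[float]],
    rectified: Sequence[Sequence[float]],
    threshold: float,
) -> List[int]:
    """Return indices of correspondences kept as inliers.

    A correspondence is kept when rectification moved it by at most
    `threshold`. The threshold is a hypothetical hyperparameter, not a
    value taken from the paper.
    """
    if len(originals) != len(rectified):
        raise ValueError("originals and rectified must have the same length")
    return [
        i
        for i, (o, r) in enumerate(zip(originals, rectified))
        if residual(o, r) <= threshold
    ]


# Toy data: the second match is moved far by rectification, so it is dropped.
originals = [
    (0.10, 0.20, 0.15, 0.22),
    (0.50, 0.40, 0.90, 0.10),
    (0.30, 0.70, 0.33, 0.71),
]
rectified = [
    (0.10, 0.20, 0.16, 0.22),
    (0.50, 0.40, 0.55, 0.43),
    (0.30, 0.70, 0.33, 0.72),
]
inliers = remove_outliers(originals, rectified, threshold=0.05)
print(inliers)
```

The intuition behind this sketch is that a correct match should need little correction, while a wrong match is pulled far from where it started. A learned classifier on the residual would be an equally plausible reading of the abstract.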
Source journal
IEEE Transactions on Multimedia (Engineering and Technology: Telecommunications)
CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months
Journal description: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsors.