{"title":"A unified feature-motion consistency framework for robust image matching","authors":"Yan Zhou, Jinding Gao, Xiaoping Liu","doi":"10.1016/j.isprsjprs.2024.09.021","DOIUrl":null,"url":null,"abstract":"<div><div>Establishing reliable feature matches between a pair of images in various scenarios is a long-standing open problem in photogrammetry. Attention-based detector-free matching with coarse-to-fine architecture has been a typical pipeline to build matches, but the cross-attention module with global receptive field may compromise the structural local consistency by introducing irrelevant regions (outliers). Motion field can maintain structural local consistency under the assumption that matches for adjacent features should be spatially proximate. However, motion field can only estimate local displacements between consecutive images and struggle with long-range displacements estimation in large-scale variation scenarios without spatial correlation priors. Moreover, large-scale variations may also disrupt the geometric consistency with the application of mutual nearest neighbor criterion in patch-level matching, making it difficult to recover accurate matches. In this paper, we propose a unified feature-motion consistency framework for robust image matching (MOMA), to maintain structural consistency at both global and local granularity in scale-discrepancy scenarios. MOMA devises a motion consistency-guided dependency range strategy (MDR) in cross attention, aggregating highly relevant regions within the motion consensus-restricted neighborhood to favor true matchable regions. Meanwhile, a unified framework with hierarchical attention structure is established to couple local motion field with global feature correspondence. The motion field provides local consistency constraints in feature aggregation, while feature correspondence provides spatial context prior to improve motion field estimation. To alleviate geometric inconsistency caused by hard nearest neighbor criterion, we propose an adaptive neighbor search (soft) strategy to address scale discrepancy. Extensive experiments on three datasets demonstrate that our method outperforms solid baselines, with AUC improvements of 4.73/4.02/3.34 in two-view pose estimation task at thresholds of 5°/10°/20° on Megadepth test, and 5.94% increase of accuracy at threshold of 1px in homography task on HPatches datasets. Furthermore, in the downstream tasks such as 3D mapping, the 3D models reconstructed using our method on the self-collected SYSU UAV datasets exhibit significant improvement in structural completeness and detail richness, manifesting its high applicability in wide downstream tasks. 
The code is publicly available at <span><span>https://github.com/BunnyanChou/MOMA</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 368-388"},"PeriodicalIF":10.6000,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISPRS Journal of Photogrammetry and Remote Sensing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0924271624003599","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY, PHYSICAL","Score":null,"Total":0}
Citations: 0
Abstract
Establishing reliable feature matches between a pair of images across diverse scenarios is a long-standing open problem in photogrammetry. Attention-based detector-free matching with a coarse-to-fine architecture has become a typical pipeline for building matches, but its cross-attention module, with a global receptive field, may compromise structural local consistency by introducing irrelevant regions (outliers). A motion field can maintain structural local consistency under the assumption that matches for adjacent features should be spatially proximate. However, a motion field can only estimate local displacements between consecutive images and struggles with long-range displacement estimation in large-scale-variation scenarios without spatial correlation priors. Moreover, large scale variations may also disrupt geometric consistency when the mutual-nearest-neighbor criterion is applied in patch-level matching, making it difficult to recover accurate matches. In this paper, we propose a unified feature-motion consistency framework for robust image matching (MOMA) that maintains structural consistency at both global and local granularity in scale-discrepancy scenarios. MOMA devises a motion consistency-guided dependency range strategy (MDR) in cross-attention, aggregating highly relevant regions within a motion consensus-restricted neighborhood to favor truly matchable regions. Meanwhile, a unified framework with a hierarchical attention structure couples the local motion field with global feature correspondence: the motion field provides local consistency constraints for feature aggregation, while feature correspondence provides a spatial context prior that improves motion field estimation. To alleviate the geometric inconsistency caused by the hard nearest-neighbor criterion, we propose an adaptive (soft) neighbor search strategy to address scale discrepancy. Extensive experiments on three datasets demonstrate that our method outperforms strong baselines, with AUC improvements of 4.73/4.02/3.34 in the two-view pose estimation task at thresholds of 5°/10°/20° on the MegaDepth test set, and a 5.94% accuracy increase at a 1 px threshold in the homography task on the HPatches dataset. Furthermore, in downstream tasks such as 3D mapping, the 3D models reconstructed with our method on the self-collected SYSU UAV dataset show marked improvements in structural completeness and detail richness, demonstrating its broad applicability. The code is publicly available at https://github.com/BunnyanChou/MOMA.
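The two mechanisms named in the abstract, restricting cross-attention to a motion-consistent neighborhood and replacing the hard mutual-nearest-neighbor test with a soft criterion, can be illustrated with a short sketch. The PyTorch fragment below is an assumption-laden illustration, not the authors' implementation (see the linked repository for that): the window radius `r`, the flow tensor layout, and the dual-softmax thresholding are all illustrative choices.

```python
# Minimal sketch of (a) motion-restricted cross-attention and (b) soft
# dual-softmax matching. All parameter names and layouts are assumptions
# for illustration, not the MOMA code.
import torch
import torch.nn.functional as F


def motion_restricted_attention(q, k, v, flow, hw, r=2):
    """Cross-attention whose receptive field is clipped to a window around
    the motion-predicted location of each query token.

    q, k, v: (B, N, C) token features on a coarse grid hw = (H, W), N = H*W.
    flow:    (B, N, 2) estimated (dx, dy) displacement per query token,
             in grid cells (an assumed layout).
    r:       window radius; a key is visible to a query only if it lies
             within Chebyshev distance r of the predicted location.
    """
    B, N, C = q.shape
    H, W = hw

    # Grid coordinates of every token, as (x, y) pairs: (N, 2).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=q.device),
        torch.arange(W, device=q.device),
        indexing="ij",
    )
    coords = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1).float()

    # Motion-predicted location of each query in the other image.
    pred = coords.unsqueeze(0) + flow                            # (B, N, 2)

    # Visibility mask: key j attends to query i iff it falls inside the
    # window centred on i's predicted location.
    diff = (pred.unsqueeze(2) - coords.view(1, 1, N, 2)).abs()   # (B, N, N, 2)
    mask = diff.max(dim=-1).values <= r                          # (B, N, N)

    logits = torch.einsum("bic,bjc->bij", q, k) / C ** 0.5
    logits = logits.masked_fill(~mask, float("-inf"))
    # Guard rows whose flow points off-grid (no visible key): fall back to
    # uniform attention instead of producing NaNs.
    logits = torch.where(mask.any(-1, keepdim=True), logits,
                         torch.zeros_like(logits))
    return torch.softmax(logits, dim=-1) @ v


def soft_mutual_matching(feat0, feat1, temperature=0.1, thr=0.2):
    """Dual-softmax 'soft' matching in place of a hard mutual-nearest-
    neighbour test: keep a pair when the product of row- and column-softmax
    scores clears a threshold, tolerating scale-induced one-to-many overlaps."""
    sim = torch.einsum("bic,bjc->bij", feat0, feat1) / temperature
    scores = F.softmax(sim, dim=1) * F.softmax(sim, dim=2)       # (B, N0, N1)
    return scores > thr                                          # match matrix
```

Beyond pruning outlier regions, restricting each query to a small window would also cut attention cost from O(N²) toward O(N·(2r+1)²) if implemented with gather operations; the dense mask above is kept only for readability.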
About the journal
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) is the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It is a platform for scientists and professionals worldwide working in disciplines that use photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal facilitates communication and dissemination of advancements in these disciplines and serves as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.