Compared with single-modal image matching, cross-modal image matching provides more comprehensive and detailed information, which is essential for a range of vision tasks. However, matching visible and infrared images is difficult due to differences in imaging principles, scale, and relative translation and rotation between the two modalities. Moreover, detector-based methods designed for single-modal matching achieve low accuracy on cross-modal data, while detector-free methods are time-consuming and fail in real-world scenarios. This paper therefore proposes CrossGlue, a lightweight cross-modal image matching framework. The framework introduces a cross-modal message transfer (CMT) module, which integrates additional latent information into each keypoint through one-to-one message transfer between the two images, and a visual-gradient graph neural network (VG-GNN), which strengthens visible–infrared matching in degraded scenarios. Experimental results on public datasets show that CrossGlue achieves excellent performance among detector-based methods and outperforms strong baselines on tasks such as homography estimation and relative pose estimation.
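The abstract does not specify the internals of the CMT module, so the following is only a minimal sketch of what one-to-one cross-modal message transfer between two keypoint sets could look like, assuming a SuperGlue-style cross-attention update. All names here (CrossModalMessageTransfer, d_model, num_heads) are hypothetical and not taken from the paper.

```python
# Hedged sketch of cross-modal message transfer via cross-attention:
# each keypoint descriptor in one modality attends to all descriptors
# in the other modality and absorbs that context before matching.
import torch
import torch.nn as nn


class CrossModalMessageTransfer(nn.Module):
    """One round of cross-modal message passing (hypothetical design)."""

    def __init__(self, d_model: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model, 2 * d_model),
            nn.ReLU(),
            nn.Linear(2 * d_model, d_model),
        )

    def forward(self, desc_q: torch.Tensor, desc_kv: torch.Tensor) -> torch.Tensor:
        # desc_q:  (B, Nq, D) keypoint descriptors of the query modality
        # desc_kv: (B, Nk, D) keypoint descriptors of the other modality
        message, _ = self.attn(desc_q, desc_kv, desc_kv)
        # Residual update conditioned on the received cross-modal message.
        return desc_q + self.mlp(torch.cat([desc_q, message], dim=-1))


if __name__ == "__main__":
    cmt = CrossModalMessageTransfer()
    vis = torch.randn(1, 512, 256)  # visible-image keypoint descriptors
    ir = torch.randn(1, 480, 256)   # infrared-image keypoint descriptors
    vis_updated = cmt(vis, ir)      # visible descriptors enriched by IR context
    ir_updated = cmt(ir, vis)       # and the symmetric direction
    print(vis_updated.shape, ir_updated.shape)
```

Applying the module in both directions, as in the usage example above, keeps the transfer symmetric so that each modality's descriptors are conditioned on the other before the matching stage.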
