MIRRIFT: Multimodal Image Rotation and Resolution Invariant Feature Transformation

Zemin Geng; Bo Yang; Yingdong Pi; Zhongli Fan; Yaxin Dong; Kun Huang; Mi Wang

IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-16. DOI: 10.1109/TGRS.2025.3554642. Published 2025-03-25.
Citations: 0
Abstract
Multimodal image-matching success rates (SRs) are often low due to nonlinear radiation differences, and they drop further when geometric transformations such as rotation and resolution changes exist between images. (It is worth noting that experiments have shown the impact of scale to be relatively small; this discussion therefore focuses only on the influence of resolution differences on multimodal image matching.) To tackle these challenges, we enhance the feature point extraction, description, and association stages of image matching, yielding a robust multimodal image-matching framework that is invariant to rotation and resolution and achieves a high SR. Specifically, inspired by the concept of image pyramids, we design a strategy that extracts feature points across multiple resolution dimensions, which assigns resolution-dimension information to each feature point and expands the set of candidate points to be matched. Building on this extraction step, we enhance the Log-Gabor filter and design a novel feature descriptor that works robustly under modal differences and rotational variations ranging from 0° to 360°; applying this descriptor within the matching framework eliminates the influence of angle differences on feature point association between images. To further improve the matching SR, we adopt a resolution-dimension traversal retrieval strategy for feature point association. With this strategy, the number of correct matches (NCM) increases for the same set of feature points, thereby raising the inlier rate and the SR of the matching results. To evaluate the framework, we created a test dataset of 42,496 image pairs from publicly available datasets.
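The abstract's pyramid-inspired extraction strategy can be illustrated with a minimal numpy sketch: build a resolution pyramid, detect feature points at every level, and tag each point with its resolution-dimension index so that association can later traverse level pairs. The pooling factor, the toy gradient-maximum detector, and all parameter values below are illustrative stand-ins; the paper's actual detector and descriptor are not specified at this level of detail.

```python
import numpy as np

def build_pyramid(img, n_levels=3):
    """Resolution pyramid by repeated 2x2 mean-pooling (halves each side)."""
    levels = [img.astype(np.float64)]
    for _ in range(1, n_levels):
        prev = levels[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        pooled = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(pooled)
    return levels

def detect_keypoints(img, thresh=10.0):
    """Toy detector: local maxima of gradient magnitude above a threshold
    (a stand-in for the paper's feature extractor)."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    pts = []
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            if mag[y, x] >= thresh and mag[y, x] == mag[y-1:y+2, x-1:x+2].max():
                pts.append((y, x))
    return pts

def extract_multires_features(img, n_levels=3):
    """Attach a resolution-dimension index ('level') to every keypoint,
    expanding the candidate set as in the pyramid-inspired strategy."""
    feats = []
    for level, im in enumerate(build_pyramid(img, n_levels)):
        for (y, x) in detect_keypoints(im):
            feats.append({"level": level, "y": y, "x": x})
    return feats
```

Association would then compare descriptors across all pairs of resolution dimensions (the traversal retrieval step), rather than only within matching levels.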
These images cover six categories, including optical, synthetic aperture radar (SAR), digital elevation model (DEM), infrared, map, and nighttime light, with three types of transformation between images: translation, rotation, and scaling. We conducted comparative experiments pitting the multimodal image rotation and resolution invariant feature transformation (MIRRIFT) method against five advanced multimodal feature-matching methods with publicly available source code: radiation-invariant feature transform (RIFT), locally normalized image feature transform (LNIFT), histogram of absolute phase consistency gradients (HAPCG), histogram of the orientation of weighted phase (HOWP), and the Log-Gabor histogram descriptor (LGHD). The results demonstrate that the proposed MIRRIFT method is robust to rotational, resolution, and modal differences between images: it achieved an average SR improvement of 59%, a 12% increase in the average number of correct matching points, and an average matching accuracy of 1.97. The executable program and sample data will be made available at: https://github.com/Geng-Zemin/MIRRIFT
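The descriptor builds on the Log-Gabor filter, a band-pass filter that, unlike the Gabor filter, has zero DC response and a Gaussian transfer function on a log-frequency axis. The sketch below constructs the classical frequency-domain 2-D Log-Gabor filter (radial log-Gaussian times angular Gaussian) and applies it to an image; the paper's enhanced variant is not detailed in the abstract, and the center frequency `f0`, bandwidth `sigma_f`, and orientation parameters here are illustrative.

```python
import numpy as np

def log_gabor_filter(size, f0=0.25, sigma_f=0.55, theta0=0.0, sigma_theta=np.pi / 8):
    """Frequency-domain 2-D Log-Gabor mask: radial log-Gaussian band-pass
    centered at f0, multiplied by an angular Gaussian around theta0."""
    rows, cols = size
    u = np.fft.fftshift(np.fft.fftfreq(cols))
    v = np.fft.fftshift(np.fft.fftfreq(rows))
    U, V = np.meshgrid(u, v)
    radius = np.hypot(U, V)
    radius[rows // 2, cols // 2] = 1.0  # avoid log(0) at the DC bin
    radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_f) ** 2))
    radial[rows // 2, cols // 2] = 0.0  # Log-Gabor has zero DC response
    angle = np.arctan2(V, U)
    dtheta = np.arctan2(np.sin(angle - theta0), np.cos(angle - theta0))
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
    return radial * angular

def log_gabor_response(img, **kwargs):
    """Filter an image by multiplying its centered spectrum with the mask."""
    F = np.fft.fftshift(np.fft.fft2(img))
    resp = np.fft.ifft2(np.fft.ifftshift(F * log_gabor_filter(img.shape, **kwargs)))
    return np.abs(resp)
```

A bank of such filters over several orientations (and, here, resolution dimensions) would supply the oriented responses from which a rotation-invariant histogram descriptor can be built.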
Journal introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.