MIRRIFT: Multimodal Image Rotation and Resolution Invariant Feature Transformation

Zemin Geng; Bo Yang; Yingdong Pi; Zhongli Fan; Yaxin Dong; Kun Huang; Mi Wang

IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-16. DOI: 10.1109/TGRS.2025.3554642. Published 2025-03-25.
Citations: 0
Abstract
Multimodal image-matching success rates (SRs) are often low due to nonlinear radiation differences, and they drop further when geometric transformations such as rotation and resolution changes exist between images. (It is worth noting that experiments have shown the impact of scale to be relatively small; this discussion therefore focuses only on the influence of resolution differences on multimodal image matching.) To tackle these challenges, we enhance the feature point extraction, description, and association stages of image matching, yielding a robust multimodal image-matching framework that is invariant to rotation and resolution and achieves a high SR. Specifically, inspired by the concept of image pyramids, we design a strategy that extracts feature points across multiple resolution dimensions, which assigns resolution-dimension information to each feature point and expands the set of candidate points to be matched. Building on this extraction step, we enhance the Log-Gabor filter and design a novel feature descriptor that works robustly under modal differences and rotational variations ranging from 0° to 360°; applying this descriptor within the matching framework eliminates the influence of angle differences on feature point association between images. To further improve the matching SR, we adopt a resolution-dimension traversal retrieval strategy for feature point association. With this strategy, the number of correct matches (NCM) increases for the same set of feature points, thereby raising the inlier rate and the SR of the matching results. To evaluate the framework, we created a test dataset of 42,496 image pairs from publicly available datasets.
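The abstract's pyramid-inspired extraction strategy can be illustrated with a minimal numpy sketch: build a resolution pyramid, detect feature points at every level, and tag each point with its resolution-dimension index so that association can later traverse level pairs. The pooling factor, the toy gradient-maximum detector, and all parameter values below are illustrative stand-ins; the paper's actual detector and descriptor are not specified at this level of detail.

```python
import numpy as np

def build_pyramid(img, n_levels=3):
    """Resolution pyramid by repeated 2x2 mean-pooling (halves each side)."""
    levels = [img.astype(np.float64)]
    for _ in range(1, n_levels):
        prev = levels[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        pooled = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(pooled)
    return levels

def detect_keypoints(img, thresh=10.0):
    """Toy detector: local maxima of gradient magnitude above a threshold
    (a stand-in for the paper's feature extractor)."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    pts = []
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            if mag[y, x] >= thresh and mag[y, x] == mag[y-1:y+2, x-1:x+2].max():
                pts.append((y, x))
    return pts

def extract_multires_features(img, n_levels=3):
    """Attach a resolution-dimension index ('level') to every keypoint,
    expanding the candidate set as in the pyramid-inspired strategy."""
    feats = []
    for level, im in enumerate(build_pyramid(img, n_levels)):
        for (y, x) in detect_keypoints(im):
            feats.append({"level": level, "y": y, "x": x})
    return feats
```

Association would then compare descriptors across all pairs of resolution dimensions (the traversal retrieval step), rather than only within matching levels.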
These images cover six categories, including optical, synthetic aperture radar (SAR), digital elevation model (DEM), infrared, map, and nighttime light, with three types of transformation between images: translation, rotation, and scaling. We conducted comparative experiments pitting the multimodal image rotation and resolution invariant feature transformation (MIRRIFT) method against five advanced multimodal feature-matching methods with publicly available source code: radiation-invariant feature transform (RIFT), locally normalized image feature transform (LNIFT), histogram of absolute phase consistency gradients (HAPCG), histogram of the orientation of weighted phase (HOWP), and the Log-Gabor histogram descriptor (LGHD). The results demonstrate that the proposed MIRRIFT method is robust to rotational, resolution, and modal differences between images: it achieved an average SR improvement of 59%, a 12% increase in the average number of correct matching points, and an average matching accuracy of 1.97. The executable program and sample data will be made available at: https://github.com/Geng-Zemin/MIRRIFT
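The descriptor builds on the Log-Gabor filter, a band-pass filter that, unlike the Gabor filter, has zero DC response and a Gaussian transfer function on a log-frequency axis. The sketch below constructs the classical frequency-domain 2-D Log-Gabor filter (radial log-Gaussian times angular Gaussian) and applies it to an image; the paper's enhanced variant is not detailed in the abstract, and the center frequency `f0`, bandwidth `sigma_f`, and orientation parameters here are illustrative.

```python
import numpy as np

def log_gabor_filter(size, f0=0.25, sigma_f=0.55, theta0=0.0, sigma_theta=np.pi / 8):
    """Frequency-domain 2-D Log-Gabor mask: radial log-Gaussian band-pass
    centered at f0, multiplied by an angular Gaussian around theta0."""
    rows, cols = size
    u = np.fft.fftshift(np.fft.fftfreq(cols))
    v = np.fft.fftshift(np.fft.fftfreq(rows))
    U, V = np.meshgrid(u, v)
    radius = np.hypot(U, V)
    radius[rows // 2, cols // 2] = 1.0  # avoid log(0) at the DC bin
    radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_f) ** 2))
    radial[rows // 2, cols // 2] = 0.0  # Log-Gabor has zero DC response
    angle = np.arctan2(V, U)
    dtheta = np.arctan2(np.sin(angle - theta0), np.cos(angle - theta0))
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
    return radial * angular

def log_gabor_response(img, **kwargs):
    """Filter an image by multiplying its centered spectrum with the mask."""
    F = np.fft.fftshift(np.fft.fft2(img))
    resp = np.fft.ifft2(np.fft.ifftshift(F * log_gabor_filter(img.shape, **kwargs)))
    return np.abs(resp)
```

A bank of such filters over several orientations (and, here, resolution dimensions) would supply the oriented responses from which a rotation-invariant histogram descriptor can be built.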
Journal introduction:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.