Correlation Verification for Image Retrieval and Its Memory Footprint Optimization
Seongwon Lee; Hongje Seong; Suhyeon Lee; Euntai Kim
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 1514-1529
DOI: 10.1109/TPAMI.2024.3504274
Published: 2024-11-21
Abstract
In this paper, we propose a novel image retrieval network named Correlation Verification Network (CVNet), which replaces conventional geometric re-ranking with a 4D convolutional neural network that learns diverse geometric matching possibilities. To enable efficient cross-scale matching, we construct feature pyramids and establish cross-scale feature correlations in a single inference, thereby replacing costly multi-scale inference. Additionally, we employ curriculum learning with the Hide-and-Seek strategy to handle challenging samples. Our proposed CVNet outperforms prior methods by a large margin on several image retrieval benchmarks. From an implementation perspective, however, CVNet has one drawback: a large memory footprint, because it must store dense features for every database image. This high memory requirement can be a significant limitation in practical applications. To address this issue, we introduce an extension of CVNet called Dense-to-Sparse CVNet (CVNet$^{DS}$), which significantly reduces memory usage by sparsifying the features of the database images. The sparsification module in CVNet$^{DS}$ learns end-to-end, via a Gumbel estimator, to select the relevant parts of the image features. Since the sparsification is performed offline, CVNet$^{DS}$ does not increase online extraction or matching times. CVNet$^{DS}$ dramatically reduces the memory footprint while preserving performance nearly identical to that of CVNet.
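The correlation verification above rests on a dense 4D correlation volume: every spatial location of the query feature map is compared against every location of the candidate's feature map, and the resulting (Hq, Wq, Hk, Wk) tensor is what a 4D CNN can then score for geometric consistency. The paper's network details are not given in this abstract, so the following is only a minimal numpy sketch of how such a correlation volume is typically built from two channel-normalized feature maps; the function name and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cross_correlation_4d(feat_q, feat_k):
    """Dense 4D correlation between two feature maps (illustrative sketch).

    feat_q: query features, shape (C, Hq, Wq)
    feat_k: candidate features, shape (C, Hk, Wk)
    returns: correlation volume, shape (Hq, Wq, Hk, Wk),
             where entry [i, j, u, v] is the cosine similarity between
             query location (i, j) and candidate location (u, v).
    """
    C, Hq, Wq = feat_q.shape
    _, Hk, Wk = feat_k.shape
    q = feat_q.reshape(C, -1)  # (C, Hq*Wq)
    k = feat_k.reshape(C, -1)  # (C, Hk*Wk)
    # L2-normalize each spatial location's feature vector so the
    # correlation reduces to cosine similarity in [-1, 1].
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    corr = q.T @ k  # (Hq*Wq, Hk*Wk) all-pairs dot products
    return corr.reshape(Hq, Wq, Hk, Wk)
```

In the cross-scale setting described in the abstract, feat_k would come from several pyramid levels of the candidate image, so correlations across scales are available from one forward pass instead of repeated multi-scale inference.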
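The sparsification in CVNet$^{DS}$ is learned end-to-end with a Gumbel estimator, i.e., a stochastic relaxation that lets gradients flow through a discrete keep/drop decision per feature location. The abstract does not specify the exact estimator, so the sketch below shows one common variant as an assumption: perturbing per-location relevance scores with Gumbel noise and keeping the top-k, which at deployment time (offline, as in the paper) degenerates to a plain deterministic top-k selection.

```python
import numpy as np

def gumbel_topk_mask(scores, k, tau=1.0, rng=None):
    """Sample a hard top-k keep/drop mask via Gumbel perturbation (sketch).

    scores: per-location relevance logits, shape (N,)
    k:      number of feature locations to keep
    tau:    temperature; lower values make sampling closer to plain top-k
    returns: binary mask of shape (N,) with exactly k ones.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    u = rng.uniform(1e-9, 1.0, size=scores.shape)
    gumbel = -np.log(-np.log(u))
    perturbed = (scores + gumbel) / tau
    keep = np.argsort(perturbed)[::-1][:k]  # indices of the k largest
    mask = np.zeros_like(scores)
    mask[keep] = 1.0
    return mask
```

During training, a straight-through trick would pass gradients to the scores despite the hard mask; offline, only the k selected feature vectors per database image are stored, which is where the memory reduction comes from.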