Authors: Seongwon Lee; Hongje Seong; Suhyeon Lee; Euntai Kim
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 3, pp. 1514-1529
DOI: 10.1109/TPAMI.2024.3504274
Published: 2024-11-21
Correlation Verification for Image Retrieval and Its Memory Footprint Optimization
In this paper, we propose a novel image retrieval network, the Correlation Verification Network (CVNet), which replaces conventional geometric re-ranking with a 4D convolutional neural network that learns diverse geometric matching patterns. To enable efficient cross-scale matching, we construct feature pyramids and compute cross-scale feature correlations in a single inference pass, avoiding costly multi-scale inference. Additionally, we employ curriculum learning with the Hide-and-Seek strategy to handle challenging samples. CVNet achieves state-of-the-art performance on several image retrieval benchmarks, surpassing prior methods by a large margin. From an implementation perspective, however, CVNet has one drawback: a large memory footprint, since it must store dense features for every database image. This requirement can be a significant limitation in practical applications. To address it, we introduce an extension of CVNet called Dense-to-Sparse CVNet (CVNet$^{DS}$), which substantially reduces memory usage by sparsifying the features of the database images. The sparsification module in CVNet$^{DS}$ learns end-to-end to select the relevant parts of the image features using a Gumbel estimator. Because sparsification is performed offline, CVNet$^{DS}$ does not increase online extraction or matching times. CVNet$^{DS}$ dramatically reduces the memory footprint while preserving performance nearly identical to CVNet's.
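The abstract does not give the details of the sparsification module, but the general idea it names, making discrete keep/drop decisions per feature location trainable via a Gumbel estimator (Gumbel-softmax relaxation), can be sketched roughly as follows. This is a hypothetical NumPy stand-in, not the paper's implementation; the function name `gumbel_softmax_mask`, the two-class `[drop, keep]` logits, and the toy feature shapes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_mask(logits, tau=1.0):
    """Sample per-location keep/drop decisions with the Gumbel-softmax trick.

    `logits` has shape (num_locations, 2): scores for [drop, keep].
    Returns the soft relaxed sample (used for gradients in a real
    straight-through setup) and a hard binary keep mask (argmax).
    """
    # Gumbel(0, 1) noise: -log(-log(U)), U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    soft = np.exp(y) / np.exp(y).sum(axis=-1, keepdims=True)
    hard = (soft.argmax(axis=-1) == 1).astype(np.float32)  # 1.0 = keep
    return soft, hard

# Toy dense feature map: 16 spatial locations x 32 channels.
feats = rng.normal(size=(16, 32)).astype(np.float32)
logits = rng.normal(size=(16, 2))

soft, keep = gumbel_softmax_mask(logits)
# Offline sparsification: store only the kept locations in the database.
sparse_feats = feats[keep.astype(bool)]
```

Because the mask is computed once per database image offline, only the smaller `sparse_feats` array needs to be stored and matched against at query time, which is how such a scheme can cut the memory footprint without adding online cost.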