Deep hashing is widely used in cross-modal retrieval tasks due to its low storage cost and high computational efficiency. However, most existing supervised hashing methods suffer from the following challenges: (1) Relying on manually labeled semantic affinity levels as supervisory information for hash learning may ignore the underlying structure of the semantic information, potentially resulting in semantic structure degradation. (2) They fail to consider both the semantic relationships among labels and the relative significance of each label to individual samples. To address these challenges, we propose a novel adaptive centroid guided hashing (ACGH) method for cross-modal retrieval. Specifically, we extract global and local features using Transformer models and then fuse them to obtain fine-grained feature representations of multimodal data. Subsequently, the hash centroid generation module leverages category semantic embeddings to construct category hash centers and combines them with learnable Label-Affinity Coefficient (LAC) memory banks to learn adaptive hash centroids. Furthermore, we design a hash centroid guidance module, which employs the hash centroids to guide hash code learning and then updates the hash centers and LAC memory banks through the newly learned hash codes. Extensive experimental results on several benchmark multimodal datasets demonstrate that the proposed ACGH method significantly outperforms other state-of-the-art methods in cross-modal retrieval tasks.
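To make the adaptive-centroid idea concrete, the following is a minimal PyTorch sketch of how category hash centers derived from label embeddings could be combined with a learnable per-sample LAC memory bank to produce adaptive centroids, and how those centroids could guide hash code learning. All module names, dimensions, and the cosine-based guidance loss are assumptions introduced for illustration; the paper's actual ACGH architecture and objective may differ.

```python
# Minimal sketch of adaptive hash centroids, assuming a learnable
# (samples x classes) LAC memory bank and class centers produced
# from category semantic embeddings. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveCentroidSketch(nn.Module):
    def __init__(self, num_classes: int, num_samples: int,
                 bits: int, label_dim: int = 128):
        super().__init__()
        # Category semantic embeddings -> one hash center per class.
        self.label_embed = nn.Parameter(torch.randn(num_classes, label_dim))
        self.center_head = nn.Linear(label_dim, bits)
        # Learnable Label-Affinity Coefficient (LAC) memory bank:
        # one affinity score per (sample, class) pair.
        self.lac = nn.Parameter(torch.zeros(num_samples, num_classes))

    def centroids(self, idx: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Per-sample adaptive centroid: a label-masked, LAC-weighted
        mixture of the category hash centers."""
        centers = torch.tanh(self.center_head(self.label_embed))   # (C, bits)
        # Restrict affinities to each sample's own labels, then normalize,
        # so each label's relative significance to the sample is learned.
        weights = torch.softmax(
            self.lac[idx].masked_fill(labels == 0, float("-inf")), dim=1
        )                                                          # (B, C)
        return weights @ centers                                   # (B, bits)

def centroid_guidance_loss(codes, centroids):
    # Pull continuous hash codes toward their adaptive centroids
    # (cosine form chosen here purely for illustration).
    return (1 - F.cosine_similarity(codes, centroids, dim=1)).mean()

# Toy usage: 8 samples, 5 classes, 32-bit codes, batch of 4.
torch.manual_seed(0)
model = AdaptiveCentroidSketch(num_classes=5, num_samples=8, bits=32)
idx = torch.arange(4)                            # batch sample indices
labels = torch.randint(0, 2, (4, 5)).float()
labels[labels.sum(1) == 0, 0] = 1                # ensure >= 1 label each
codes = torch.tanh(torch.randn(4, 32, requires_grad=True))
loss = centroid_guidance_loss(codes, model.centroids(idx, labels))
loss.backward()                                  # gradients also update
print(f"guidance loss: {loss.item():.4f}")       # centers and the LAC bank
```

Because the centers and the LAC bank are ordinary parameters in this sketch, backpropagating the guidance loss updates both alongside the hash codes, loosely mirroring the alternating update of hash centers and LAC memory banks described in the abstract.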