IEEE Transactions on Multimedia: Latest Articles

Graph-based Spatio-Temporal Semantic Reasoning Model for Anti-occlusion Infrared Aerial Target Recognition
IF 7.3 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-05-31 | DOI: 10.1109/tmm.2024.3408051
Xi Yang, Shaoyi Li, Saisai Niu, Binbin Yan, Zhongjie Meng
Citations: 0
3DTA: No-Reference 3D Point Cloud Quality Assessment with Twin Attention
IF 7.3 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-05-30 | DOI: 10.1109/tmm.2024.3407698
Linxia Zhu, Jun Cheng, Xu Wang, Honglei Su, Huan Yang, Hui Yuan, Jari Korhonen
Citations: 0
Semi-supervised Domain Adaptation via Joint Transductive and Inductive Subspace Learning
IF 7.3 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-05-30 | DOI: 10.1109/tmm.2024.3407696
Hao Luo, Zhiqiang Tian, Kaibing Zhang, Guofa Wang, Shaoyi Du
Citations: 0
Domain-adaptive Energy-based Models for Generalizable Face Anti-Spoofing
IF 7.3 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-05-30 | DOI: 10.1109/tmm.2024.3407697
Dan Zhang, Zhekai Du, Jingjing Li, Lei Zhu, Heng Tao Shen
Citations: 0
Robust Secret Image Sharing Resistant to JPEG Recompression Based on Stable Block Condition
IF 7.3 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-05-30 | DOI: 10.1109/tmm.2024.3407694
Yue Jiang, Kejiang Chen, Wei Yan, Xuehu Yan, Guozheng Yang, Kai Zeng
Citations: 0
Accelerated Lloyd's Method for Resampling 3D Point Clouds
IF 7.3 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-05-27 | DOI: 10.1109/tmm.2024.3405664
Yanyang Xiao, Tieyi Zhang, Juan Cao, Zhonggui Chen
Citations: 0
Bipartite Graph-Based Projected Clustering With Local Region Guidance for Hyperspectral Imagery
IF 8.4 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-04-30 | DOI: 10.1109/TMM.2024.3394975
Yongshan Zhang;Guozhu Jiang;Zhihua Cai;Yicong Zhou
Hyperspectral image (HSI) clustering is challenging to divide all pixels into different clusters because of the absent labels, large spectral variability and complex spatial distribution. Anchor strategy provides an attractive solution to the computational bottleneck of graph-based clustering for large HSIs. However, most existing methods require separated learning procedures and ignore noisy as well as spatial information. In this paper, we propose a bipartite graph-based projected clustering (BGPC) method with local region guidance for HSI data. To take full advantage of spatial information, HSI denoising to alleviate noise interference and anchor initialization to construct bipartite graph are conducted within each generated superpixel. With the denoised pixels and initial anchors, projection learning and structured bipartite graph learning are simultaneously performed in a one-step learning model with connectivity constraint to directly provide clustering results. An alternating optimization algorithm is devised to solve the formulated model. The advantage of BGPC is the joint learning of projection and bipartite graph with local region guidance to exploit spatial information and linear time complexity to lessen computational burden. Extensive experiments demonstrate the superiority of the proposed BGPC over the state-of-the-art HSI clustering methods.
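The anchor strategy mentioned in the abstract can be illustrated with a generic sketch. This is not the BGPC model itself: it only shows the common idea of linking each of n points to a few of m anchors, so the graph has n x m entries instead of n x n. The function name `anchor_graph`, the Gaussian-style weighting, and the toy coordinates are assumptions made for illustration.

```python
import math

def anchor_graph(points, anchors, k=2):
    """Build an n x m bipartite affinity matrix (hypothetical helper).

    Each point is linked to its k nearest anchors; the weights in each
    row are non-negative and sum to 1.
    """
    graph = []
    for p in points:
        dists = [math.dist(p, a) for a in anchors]
        nearest = sorted(range(len(anchors)), key=dists.__getitem__)[:k]
        d_min_sq = dists[nearest[0]] ** 2
        # Subtract the smallest squared distance so the closest anchor
        # always gets weight exp(0) = 1 and the row sum never underflows to 0.
        weights = [math.exp(-(dists[j] ** 2 - d_min_sq)) for j in nearest]
        total = sum(weights)
        row = [0.0] * len(anchors)
        for j, w in zip(nearest, weights):
            row[j] = w / total
        graph.append(row)
    return graph

# Toy data: three 2-D "pixels" and three anchors (illustrative values only).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
anchors = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
Z = anchor_graph(points, anchors, k=2)
```

Building this matrix costs on the order of n times m distance evaluations, which is the sense in which anchor graphs ease the computational bottleneck when m is much smaller than n.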
IEEE Transactions on Multimedia, vol. 26, pp. 9551-9563
Citations: 0
Each Performs Its Functions: Task Decomposition and Feature Assignment for Audio-Visual Segmentation
IF 8.4 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-04-30 | DOI: 10.1109/TMM.2024.3394682
Sen Xu;Shikui Wei;Tao Ruan;Lixin Liao;Yao Zhao
Audio-visual segmentation (AVS) aims to segment the object instances that produce sound at the time of the video frames. Existing related solutions focus on designing cross-modal interaction mechanisms, which try to learn audio-visual correlations and simultaneously segment objects. Despite effectiveness, the close-coupling network structures become increasingly complex and hard to analyze. To address these problems, we propose a simple but effective method, ‘Each Performs Its Functions (PIF),’ which focuses on task decomposition and feature assignment. Inspired by human sensory experiences, PIF decouples AVS into two subtasks, correlation learning, and segmentation refinement, via two branches. Correlation learning aims to learn the correspondence between sound and visible individuals and provide the positional prior. Segmentation refinement focuses on fine segmentation. Then we assign different level features to perform the appropriate duties, i.e., using deep features for cross-modal interaction due to their semantic advantages; using rich textures of shallow features to improve segmentation results. Moreover, we propose the recurrent collaboration block to enhance interbranch communication. Experimental results on AVSBench show that our method outperforms related state-of-the-art methods by a large margin (e.g., +6.0% mIoU and +7.6% F-score on the Multi-Source subset). In addition, by purposely boosting subtasks' performance, our approach can serve as a strong baseline for audio-visual segmentation.
IEEE Transactions on Multimedia, vol. 26, pp. 9489-9498
Citations: 0
Neighborhood-Aware Mutual Information Maximization for Source-Free Domain Adaptation
IF 8.4 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-04-30 | DOI: 10.1109/TMM.2024.3394971
Lin Zhang;Yifan Wang;Ran Song;Mingxin Zhang;Xiaolei Li;Wei Zhang
Recently, the source-free domain adaptation (SFDA) problem has attracted much attention, where the pre-trained model for the source domain is adapted to the target domain in the absence of source data. However, due to domain shift, the negative alignment usually exists between samples from the same class, which may lower intra-class feature similarity. To address this issue, we present a self-supervised representation learning strategy for SFDA, named as neighborhood-aware mutual information (NAMI), which maximizes the mutual information (MI) between the representations of target samples and their corresponding neighbors. Moreover, we theoretically demonstrate that NAMI can be decomposed into a weighted sum of local MI, which suggests that the weighted terms can better estimate NAMI. To this end, we introduce neighborhood consensus score over the set of weakly and strongly augmented views and point-wise density based on neighborhood, both of which determine the weights of local MI for NAMI by leveraging the neighborhood information of samples. The proposed method can significantly handle domain shift and adaptively reduce the noise in the neighborhood of each target sample. In combination with the consistency loss over views, NAMI leads to consistent improvement over existing state-of-the-art methods on three popular SFDA benchmarks.
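For readers unfamiliar with the quantity being maximized, the following is a minimal sketch of plain discrete mutual information, not the NAMI objective or its weighting scheme. It assumes a hypothetical joint table whose rows index a sample's predicted class and whose columns index its neighbor's predicted class; the function name and the two toy tables are illustrative only.

```python
import math

def mutual_information(joint):
    """Plug-in mutual information, in nats, of a discrete joint distribution.

    `joint` is a 2-D list of probabilities that sum to 1.
    """
    px = [sum(row) for row in joint]          # marginal over rows
    py = [sum(col) for col in zip(*joint)]    # marginal over columns
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                     # 0 * log(0) is treated as 0
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

# Neighbors always share the sample's class: MI equals log(2).
agree = [[0.5, 0.0], [0.0, 0.5]]
# Neighbor class is unrelated to the sample's class: MI equals 0.
independent = [[0.25, 0.25], [0.25, 0.25]]

mi_agree = mutual_information(agree)
mi_independent = mutual_information(independent)
```

The two toy tables mark the extremes: mutual information is highest when a sample and its neighbors agree on the class and vanishes when they are unrelated.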
IEEE Transactions on Multimedia, vol. 26, pp. 9564-9574
Citations: 0
Semantic-Enhanced Proxy-Guided Hashing for Long-Tailed Image Retrieval
IF 8.4 | Zone 1, Computer Science | Q1, Computer Science, Information Systems | Pub Date: 2024-04-29 | DOI: 10.1109/TMM.2024.3394684
Hongtao Xie;Yan Jiang;Lei Zhang;Pandeng Li;Dongming Zhang;Yongdong Zhang
Hashing has been studied extensively for large-scale image retrieval due to its efficient computation and storage. Deep hashing methods typically train models with category-balanced data and suffer from a serious performance deterioration when dealing with long-tailed training samples. Recently, several long-tailed hashing methods focus on this newly emerging field for practical purpose. However, existing methods still face challenges that fixed category centers with limited semantic information cannot effectively improve the discriminative ability of tail-category hash codes. To tackle the issue, we propose a novel method called Semantic-enhanced Proxy-guided Hashing in this paper. We leverage two sets of learnable category proxies in the feature space and the Hamming space respectively, which can describe category semantics by getting updated continuously along with the whole model via back-propagation. Based on this, we introduce the Mahalanobis distance metric to characterize relationships accurately and enhance the semantic representation of both proxies and samples concurrently, improving the hash learning process. Moreover, we capture the multilateral correlations between proxies and samples in the feature space and extend a hypergraph neural network to transfer semantic knowledge from proxies to samples in the Hamming space. Extensive experiments show that our method achieves the state-of-the-art performance and surpasses existing methods by 1.47%–7.56% MAP on long-tailed benchmarks, demonstrating the superiority of learnable category proxies and the effectiveness of our proposed learning algorithm for long-tailed hashing.
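The efficiency argument in the first sentence of the abstract can be made concrete with a small sketch. It is not the proposed method: it only shows why binary hash codes are cheap to store and compare, using integers as codes and XOR plus a bit count as the Hamming distance. The function names and the 8-bit codes are assumptions made for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query: int, database: list) -> list:
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))

# Toy 8-bit hash codes (illustrative values only).
database = [0b10110010, 0b10110011, 0b01001101]
query = 0b10110110

order = rank_by_hamming(query, database)
distances = [hamming(query, code) for code in database]
```

Each comparison is one XOR and one bit count, and each image occupies only as many bits as the code length, which is what makes hashing attractive at large scale.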
IEEE Transactions on Multimedia, vol. 26, pp. 9499-9514
Citations: 0