
Proceedings of the 2019 on International Conference on Multimedia Retrieval: Latest Publications

Relationship Detection Based on Object Semantic Inference and Attention Mechanisms
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325025
Liang Zhang, Shuai Zhang, Peiyi Shen, Guangming Zhu, Syed Afaq Ali Shah, Bennamoun
Detecting relations among objects is a crucial task for image understanding. However, each relationship involves different object-pair combinations, and different combinations express diverse interactions. This makes relationship detection based solely on visual features a challenging task. In this paper, we propose a simple yet effective relationship detection model based on object semantic inference and attention mechanisms. Our model is trained to detect relation triples of the form (subject, predicate, object). To overcome the high diversity of visual appearances, the semantic inference module and the visual features are combined to complement each other. We also introduce two different attention mechanisms, for object feature refinement and phrase feature refinement. To derive a more detailed and comprehensive representation of each object, the object feature refinement module refines the representation of each object by querying over all the other objects in the image. The phrase feature refinement module is proposed to make the phrase feature more effective and to automatically focus on the relevant parts, improving the visual relationship detection task. We validate our model on the Visual Genome relationship dataset, where it achieves competitive results compared to the state-of-the-art method MOTIFNET.
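As a rough, hypothetical illustration of the kind of fusion the abstract describes (not the authors' implementation), the PyTorch sketch below combines ROI visual features with word-embedding semantic features for an object pair before predicate classification; all module names, feature dimensions, and the predicate vocabulary size are assumptions.

    import torch
    import torch.nn as nn

    class PairPredicateScorer(nn.Module):
        def __init__(self, visual_dim=2048, embed_dim=300, num_predicates=50):
            super().__init__()
            # semantic branch: word embeddings of the subject and object classes
            self.semantic = nn.Sequential(nn.Linear(2 * embed_dim, 512), nn.ReLU())
            # visual branch: ROI features of subject, object, and their union box
            self.visual = nn.Sequential(nn.Linear(3 * visual_dim, 512), nn.ReLU())
            self.classifier = nn.Linear(1024, num_predicates)

        def forward(self, subj_vis, obj_vis, union_vis, subj_emb, obj_emb):
            v = self.visual(torch.cat([subj_vis, obj_vis, union_vis], dim=-1))
            s = self.semantic(torch.cat([subj_emb, obj_emb], dim=-1))
            # visual and semantic cues complement each other before predicate prediction
            return self.classifier(torch.cat([v, s], dim=-1))

    scorer = PairPredicateScorer()
    logits = scorer(torch.randn(4, 2048), torch.randn(4, 2048), torch.randn(4, 2048),
                    torch.randn(4, 300), torch.randn(4, 300))   # (4, 50) predicate scores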
Citations: 6
Integrity Verification in Medical Image Retrieval Systems using Spread Spectrum Steganography
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325020
P. Eze, P. Udaya, R. Evans, Dongxi Liu
The region of interest (ROI) of medical images in content-based image retrieval (CBIR) systems often requires content authentication and verification. This is because adversarial modification of a stored image could have a lethal effect on research, diagnostic outcomes, and the outcomes of some forensic investigations. In this work, robust watermarking and fragile steganography were combined with image search features to design a medical image retrieval system that incorporates ROI integrity verification. Original ROI features were pre-computed, embedded into the archival images, and utilised during retrieval for image integrity checks. The average global image PSNR was 38.36 dB while the ROI PSNR was maintained at an average of 46 dB, with all watermark search features retrieved at zero bit error rate (BER) provided the attack on the image is not perceptible.
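The abstract reports quality in PSNR and bit error rate (BER); the small sketch below shows how these two measures are typically computed (the watermark embedding and extraction pipeline itself is not reproduced here).

    import numpy as np

    def psnr(original, modified, peak=255.0):
        mse = np.mean((original.astype(np.float64) - modified.astype(np.float64)) ** 2)
        return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    def bit_error_rate(embedded_bits, recovered_bits):
        embedded_bits = np.asarray(embedded_bits)
        recovered_bits = np.asarray(recovered_bits)
        return float(np.mean(embedded_bits != recovered_bits))

    # a zero BER means every embedded search-feature bit was recovered intact
    print(bit_error_rate([1, 0, 1, 1], [1, 0, 1, 1]))   # 0.0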
Citations: 9
Context-Aware Embeddings for Automatic Art Analysis
Pub Date : 2019-04-10 DOI: 10.1145/3323873.3325028
Noa García, B. Renoust, Yuta Nakashima
Automatic art analysis aims to classify and retrieve artistic representations from a collection of images by using computer vision and machine learning techniques. In this work, we propose to enhance visual representations from neural networks with contextual artistic information. Whereas visual representations are able to capture information about the content and the style of an artwork, our proposed context-aware embeddings additionally encode relationships between different artistic attributes, such as author, school, or historical period. We design two different approaches for using context in automatic art analysis. In the first one, contextual data is obtained through a multi-task learning model, in which several attributes are trained together to find visual relationships between elements. In the second approach, context is obtained through an art-specific knowledge graph, which encodes relationships between artistic attributes. An exhaustive evaluation of both of our models on several art analysis problems, such as author identification, type classification, and cross-modal retrieval, shows that performance is improved by up to 7.3% in art classification and 37.24% in retrieval when context-aware embeddings are used.
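A minimal sketch of the first, multi-task approach, with assumed attribute vocabularies and a standard ResNet-50 backbone (not the authors' released model): one shared embedding feeds one classification head per artistic attribute, and the heads are trained jointly.

    import torch
    import torch.nn as nn
    from torchvision import models

    class MultiTaskArtModel(nn.Module):
        def __init__(self, num_authors=350, num_schools=25, num_periods=10, num_types=15):
            super().__init__()
            backbone = models.resnet50(weights=None)
            backbone.fc = nn.Identity()              # keep the 2048-d pooled feature
            self.backbone = backbone
            self.heads = nn.ModuleDict({
                'author': nn.Linear(2048, num_authors),
                'school': nn.Linear(2048, num_schools),
                'period': nn.Linear(2048, num_periods),
                'type':   nn.Linear(2048, num_types),
            })

        def forward(self, images):
            feat = self.backbone(images)             # shared, context-aware embedding
            return {name: head(feat) for name, head in self.heads.items()}

    model = MultiTaskArtModel()
    outputs = model(torch.randn(2, 3, 224, 224))
    dummy_targets = torch.zeros(2, dtype=torch.long)  # placeholder labels
    loss = sum(nn.functional.cross_entropy(logits, dummy_targets) for logits in outputs.values())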
Citations: 38
Learning Task Relatedness in Multi-Task Learning for Images in Context
Pub Date : 2019-04-05 DOI: 10.1145/3323873.3325009
Gjorgji Strezoski, N. V. Noord, M. Worring
Multimedia applications often require concurrent solutions to multiple tasks. These tasks hold clues to each other's solutions; however, as these relations can be complex, this property is rarely exploited. When task relations are explicitly defined from domain knowledge, multi-task learning (MTL) offers such concurrent solutions while exploiting the relatedness between multiple tasks performed over the same dataset. In most cases, however, this relatedness is not explicitly defined, and the domain expert knowledge that defines it is not available. To address this issue, we introduce Selective Sharing, a method that learns the inter-task relatedness from secondary latent features while the model trains. Using this insight, we can automatically group tasks and allow them to share knowledge in a mutually beneficial way. We support our method with experiments on five datasets covering classification, regression, and ranking tasks, and compare against strong baselines and state-of-the-art approaches, showing a consistent improvement in terms of accuracy and parameter counts. In addition, we perform an activation region analysis showing how Selective Sharing affects the learned representation.
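As a heavily simplified sketch of the general idea (not the paper's actual Selective Sharing mechanism), the snippet below lets each task softly weight a few shared branches through a learnable affinity matrix, so related tasks can converge on the same branch; the branch, task, and class counts are placeholders.

    import torch
    import torch.nn as nn

    class SoftTaskGrouping(nn.Module):
        def __init__(self, in_dim=512, num_branches=3, num_tasks=5, num_classes=10):
            super().__init__()
            self.branches = nn.ModuleList(
                [nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU()) for _ in range(num_branches)])
            # learnable task-to-branch affinities; a softmax turns them into soft groups
            self.affinity = nn.Parameter(torch.zeros(num_tasks, num_branches))
            self.heads = nn.ModuleList([nn.Linear(256, num_classes) for _ in range(num_tasks)])

        def forward(self, x):
            branch_out = torch.stack([b(x) for b in self.branches], dim=1)   # (N, B, 256)
            weights = torch.softmax(self.affinity, dim=-1)                   # (T, B)
            outputs = []
            for t, head in enumerate(self.heads):
                mixed = (weights[t].view(1, -1, 1) * branch_out).sum(dim=1)  # task-specific mix
                outputs.append(head(mixed))
            return outputs

    task_logits = SoftTaskGrouping()(torch.randn(4, 512))   # one logit tensor per task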
Citations: 17
Feature Pyramid Hashing
Pub Date : 2019-04-04 DOI: 10.1145/3323873.3325015
Yifan Yang, Libing Geng, Hanjiang Lai, Yan Pan, Jian Yin
In recent years, deep-network-based hashing has become a leading approach for large-scale image retrieval. Most deep hashing approaches use the high layer to extract powerful semantic representations. However, these methods have limited ability for fine-grained image retrieval, because the semantic features extracted from the high layer have difficulty capturing subtle differences. To this end, we propose a novel two-pyramid hashing architecture that learns both the semantic information and the subtle appearance details for fine-grained image search. Inspired by the feature pyramids of convolutional neural networks, a vertical pyramid is proposed to capture the high-layer features, and a horizontal pyramid combines multiple low-layer features with structural information to capture the subtle differences. To fuse the low-level features, a novel combination strategy, called consensus fusion, is proposed to capture all subtle information from several low layers for finer retrieval. Extensive evaluation on two fine-grained datasets, CUB-200-2011 and Stanford Dogs, demonstrates that the proposed method achieves strong performance compared with the state-of-the-art baselines.
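A minimal sketch, with invented layer sizes, of producing hash bits from a high-layer feature (vertical pyramid) and several fused low-layer features (horizontal pyramid); the averaging used here is only a stand-in for the paper's consensus fusion.

    import torch
    import torch.nn as nn

    class TwoPyramidHash(nn.Module):
        def __init__(self, high_dim=2048, low_dims=(256, 512, 1024), num_bits=48):
            super().__init__()
            self.high_proj = nn.Linear(high_dim, num_bits)                    # vertical pyramid
            self.low_projs = nn.ModuleList([nn.Linear(d, num_bits) for d in low_dims])

        def forward(self, high_feat, low_feats):
            high_code = torch.tanh(self.high_proj(high_feat))
            # average the low-layer codes (a simple stand-in for consensus fusion)
            low_code = torch.tanh(torch.stack(
                [p(f) for p, f in zip(self.low_projs, low_feats)], dim=0).mean(dim=0))
            relaxed = torch.tanh(high_code + low_code)                        # training-time relaxation
            return torch.sign(relaxed)                                        # binary codes at test time

    net = TwoPyramidHash()
    codes = net(torch.randn(2, 2048),
                [torch.randn(2, 256), torch.randn(2, 512), torch.randn(2, 1024)])   # (2, 48) in {-1, +1}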
Citations: 14
Deep Policy Hashing Network with Listwise Supervision
Pub Date : 2019-04-03 DOI: 10.1145/3323873.3325016
Shaoying Wang, Hai-Cheng Lai, Yifan Yang, Jian Yin
Deep-network-based hashing has become a leading approach for large-scale image retrieval; it learns a similarity-preserving network that maps similar images to nearby hash codes. The pairwise and triplet losses are two widely used similarity-preserving schemes for deep hashing. These schemes ignore the fact that hashing is a prediction task over the list of binary codes. However, learning deep hashing with listwise supervision is challenging: 1) how to obtain the rank list of the whole training set when the batch size of the deep network is small, and 2) how to utilize the listwise supervision. In this paper, we present a novel deep policy hashing architecture in which two systems are learned in parallel: a query network and a shared, slowly changing database network. The following three steps are repeated until convergence: 1) the database network encodes all training samples into binary codes to obtain the whole rank list, 2) the query network is trained with policy learning to maximize a reward that reflects the performance of the whole ranking list of binary codes, e.g., mean average precision (MAP), and 3) the database network is updated from the query network. Extensive evaluations on several benchmark datasets show that the proposed method brings substantial improvements over state-of-the-art hashing methods.
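The reward mentioned in the abstract can be illustrated with a small, self-contained computation: rank database hash codes by Hamming distance to a query code and score the ranking with average precision (the mean over queries gives MAP). This is only an illustration of the reward signal; the policy-learning updates are omitted.

    import numpy as np

    def average_precision(query_code, db_codes, relevance):
        # Hamming distance between the query and every database hash code (+/-1 bits)
        dists = np.sum(query_code != db_codes, axis=1)
        order = np.argsort(dists, kind='stable')
        rel = relevance[order]
        if rel.sum() == 0:
            return 0.0
        precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
        return float((precision_at_k * rel).sum() / rel.sum())

    query = np.array([1, -1, 1, 1])
    db = np.array([[1, -1, 1, 1], [-1, 1, -1, -1], [1, -1, -1, 1]])
    labels = np.array([1, 0, 1])                    # 1 = same class as the query
    print(average_precision(query, db, labels))     # 1.0: both relevant items ranked first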
Citations: 3
Unsupervised Rank-Preserving Hashing for Large-Scale Image Retrieval
Pub Date : 2019-03-04 DOI: 10.1145/3323873.3325038
Svebor Karaman, Xudong Lin, Xuefeng Hu, Shih-Fu Chang
We propose an unsupervised hashing method, exploiting a shallow neural network, that aims to produce binary codes preserving the ranking induced by an original real-valued representation. This is motivated by the emergence of small-world graph-based approximate search methods that rely on local neighborhood ranking. We formalize the training process in an intuitive way by considering each training sample as a query and requiring that the ranking of a random subset of the training set obtained with the hash codes match the ranking obtained with the original features. We also explore the use of a decoder to obtain an approximate reconstruction of the original features. At test time, we retrieve the most promising database samples using only the hash codes and perform re-ranking using the reconstructed features, thus allowing the complete elimination of the original real-valued features and the associated high memory cost. Experiments conducted on publicly available large-scale datasets show that our method consistently outperforms all compared state-of-the-art unsupervised hashing methods and that the reconstruction procedure can effectively boost the search accuracy with a minimal constant additional cost.
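A rough sketch of one way such a rank-preserving objective could look (an assumed formulation, not the authors' exact loss): relaxed hash codes are trained so that the ordering of a random subset in code space agrees with the ordering given by the original real-valued features.

    import torch
    import torch.nn.functional as F

    def rank_preserving_loss(query_code, subset_codes, query_feat, subset_feats, margin=0.5):
        # target ordering comes from the original real-valued features
        target_sim = F.cosine_similarity(query_feat.unsqueeze(0), subset_feats, dim=1)
        # similarity in code space for relaxed (tanh) codes
        code_sim = query_code.unsqueeze(0).mul(subset_codes).mean(dim=1)
        loss = query_code.new_zeros(())
        n = subset_codes.size(0)
        for i in range(n):
            for j in range(n):
                if target_sim[i] > target_sim[j]:
                    # item i should also rank above item j in code space
                    loss = loss + F.relu(margin - (code_sim[i] - code_sim[j]))
        return loss / (n * (n - 1))

    q_code, s_codes = torch.tanh(torch.randn(32)), torch.tanh(torch.randn(8, 32))
    q_feat, s_feats = torch.randn(128), torch.randn(8, 128)
    print(rank_preserving_loss(q_code, s_codes, q_feat, s_feats))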
Citations: 11
Joint Cluster Unary Loss for Efficient Cross-Modal Hashing
Pub Date : 2019-02-02 DOI: 10.1145/3323873.3325059
Shifeng Zhang, Jianmin Li, Bo Zhang
Recently, cross-modal deep hashing has received broad attention for solving cross-modal retrieval problems efficiently. Most cross-modal hashing methods generate $O(n^2)$ data pairs and $O(n^3)$ data triplets for training, but the training procedure is inefficient because the complexity is high for large-scale datasets. In this paper, we propose a novel and efficient cross-modal hashing algorithm named Joint Cluster Cross-Modal Hashing (JCCH). First, we introduce the Cross-Modal Unary Loss (CMUL) with $O(n)$ complexity to bridge the traditional triplet loss and the classification-based unary loss, and build the JCCH algorithm on CMUL. Second, a more accurate bound of the triplet loss for structured multilabel data is introduced in CMUL. The resulting hash codes form several clusters in which the codes within the same cluster share similar semantic information, and the heterogeneity gap between modalities is diminished by sharing the clusters. Experiments on large-scale datasets show that the proposed method is superior or comparable to state-of-the-art cross-modal hashing methods, and training with the proposed method is more efficient than with others.
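As a minimal, assumed illustration of a classification-style unary loss with O(n) cost: relaxed image and text codes share a single cluster classifier, so corresponding samples from the two modalities are pulled toward the same cluster without enumerating pairs or triplets.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    num_bits, num_clusters, n = 32, 10, 16
    shared_classifier = nn.Linear(num_bits, num_clusters, bias=False)   # one classifier for both modalities

    image_codes = torch.tanh(torch.randn(n, num_bits, requires_grad=True))
    text_codes = torch.tanh(torch.randn(n, num_bits, requires_grad=True))
    labels = torch.randint(0, num_clusters, (n,))                       # cluster / class assignments

    # one term per sample and per modality -> linear in n, no pairs or triplets needed
    loss = F.cross_entropy(shared_classifier(image_codes), labels) + \
           F.cross_entropy(shared_classifier(text_codes), labels)
    loss.backward()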
Citations: 4
Self-Supervised Visual Representations for Cross-Modal Retrieval
Pub Date : 2019-01-31 DOI: 10.1145/3323873.3325035
Yash J. Patel, Lluís Gómez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar
Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort and, besides, their annotations are limited to discrete sets of popular visual classes that may not be representative of the richer semantics found in large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text across the entire set of Wikipedia articles. Our method consists of training a CNN to predict: (1) the semantic context of the article in which an image is most likely to appear as an illustration, and (2) the semantic context of its caption. Our experiments demonstrate that the proposed method is not only capable of learning discriminative visual representations for solving vision tasks like classification, but also that the learned representations are better for cross-modal retrieval than those obtained with supervised pre-training of the network on the ImageNet dataset.
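A minimal sketch of the self-supervised objective described above, assuming the semantic context is represented as a topic distribution over the article text; the topic-space size and the targets below are placeholders.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models

    num_topics = 40                                          # assumed size of the text topic space
    cnn = models.resnet50(weights=None)
    cnn.fc = nn.Linear(cnn.fc.in_features, num_topics)       # predict the article's semantic context

    images = torch.rand(4, 3, 224, 224)
    article_topics = torch.softmax(torch.randn(4, num_topics), dim=1)   # stand-in topic targets

    log_pred = F.log_softmax(cnn(images), dim=1)
    loss = F.kl_div(log_pred, article_topics, reduction='batchmean')    # match the article's topics
    loss.backward()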
Citations: 9
Who's Afraid of Adversarial Queries?: The Impact of Image Modifications on Content-based Image Retrieval
Pub Date : 2019-01-29 DOI: 10.1145/3323873.3325052
Zhuoran Liu, Zhengyu Zhao, M. Larson
An adversarial query is an image that has been modified to disrupt content-based image retrieval (CBIR), while appearing nearly untouched to the human eye. This paper presents an analysis of adversarial queries for CBIR based on neural, local, and global features. We introduce an innovative neural image perturbation approach, called Perturbations for Image Retrieval Error (PIRE), that is capable of blocking neural-feature-based CBIR. PIRE differs significantly from existing approaches that create images adversarial with respect to CNN classifiers because it is unsupervised, i.e., it needs no labeled data from the data set to which it is applied. Our experimental analysis demonstrates the surprising effectiveness of PIRE in blocking CBIR, and also covers aspects of PIRE that must be taken into account in practical settings, including saving images, image quality and leaking adversarial queries into the background collection. Our experiments also compare PIRE (a neural approach) with existing keypoint removal and injection approaches (which modify local features). Finally, we discuss the challenges that face multimedia researchers in the future study of adversarial queries.
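A rough sketch of an unsupervised feature-space perturbation in the spirit of PIRE (this is not the published PIRE code): an additive perturbation is optimized so that a fixed CNN's feature for the modified image drifts away from the original feature, with a penalty keeping the change small; no labels are used.

    import torch
    import torch.nn.functional as F
    from torchvision import models

    cnn = models.resnet50(weights=None).eval()
    feature_extractor = torch.nn.Sequential(*list(cnn.children())[:-1])   # pooled CNN features

    image = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        original_feat = feature_extractor(image).flatten(1)

    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=0.01)
    for _ in range(50):
        adv_feat = feature_extractor((image + delta).clamp(0, 1)).flatten(1)
        # move away from the original feature while penalizing a large perturbation
        loss = F.cosine_similarity(adv_feat, original_feat).mean() + 0.01 * delta.pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    adversarial_query = (image + delta).detach().clamp(0, 1)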
Citations: 34