"RobustiQ: A Robust ANN Search Method for Billion-scale Similarity Search on GPUs." Wei Chen, Jincai Chen, F. Zou, Yuan-Fang Li, Ping Lu, Wei Zhao. DOI: 10.1145/3323873.3325018
GPU-based methods represent the state of the art in approximate nearest neighbor (ANN) search, as they are scalable (billion-scale), accurate (high recall), and efficient (sub-millisecond query speed). Faiss, the representative GPU-based ANN system, achieves considerably faster query speed than the representative CPU-based systems. The query accuracy of Faiss depends critically on the number of indexing regions, which in turn depends on the amount of available memory. At the same time, query speed deteriorates dramatically as the number of partition regions increases. Faiss thus lacks robustness: fine-grained partitioning of the dataset is achieved at the expense of search speed, and vice versa. In this paper, we introduce a new GPU-based ANN search method, Robust Quantization (RobustiQ), that addresses the robustness limitations of existing GPU-based methods in a holistic way. We design a novel hierarchical indexing structure using vector and bilayer line quantization. This indexing structure, together with our indexing and encoding methods, allows RobustiQ to avoid maintaining a large lookup table, reducing both memory consumption and query complexity. Our extensive evaluation on two public billion-scale benchmark datasets, SIFT1B and DEEP1B, shows that RobustiQ consistently obtains a 2-3× speedup over Faiss while achieving better query accuracy for different codebook sizes. Compared to the best CPU-based ANN systems, RobustiQ achieves even more pronounced average speedups of 51.8× and 11×, respectively.
"qwLSH." Omid Jafari, John Ossorgin, P. Nagarkar. DOI: 10.1145/3323873.3325048
Similarity search queries in high-dimensional spaces are an important type of query in many domains, such as image processing and machine learning. Since exact similarity search indexing techniques suffer from the well-known curse of dimensionality in high-dimensional spaces, approximate search techniques are often used instead. Locality Sensitive Hashing (LSH) has been shown to be an effective approximate search method for solving similarity search queries in high-dimensional spaces. Often, queries in real-world settings arrive as part of a query workload. LSH and its variants are designed to solve individual queries effectively, but they suffer from one major drawback when executing query workloads: their index structures are not designed with the data characteristics that matter for effective cache utilization in mind. In this paper, we present qwLSH, an index structure for efficiently processing similarity search query workloads in high-dimensional spaces that intelligently divides a given cache during the processing of a query workload by using novel cost models. Experimental results show that, given a query workload, qwLSH performs faster than existing techniques due to its unique cost models and strategies for reducing cache misses. We evaluate the proposed design and cost models of qwLSH on real datasets against state-of-the-art LSH-based techniques.
"Multimodal Multimedia Retrieval with vitrivr." Ralph Gasser, Luca Rossetto, H. Schuldt. DOI: 10.1145/3323873.3326921
The steady growth of multimedia collections - both in terms of size and heterogeneity - necessitates systems that are able to conjointly deal with several types of media as well as large volumes of data. This is especially true when it comes to satisfying a particular information need, i.e., retrieving a particular object of interest from a large collection. Nevertheless, existing multimedia management and retrieval systems are mostly organized in silos and treat different media types separately. Hence, they are limited when it comes to crossing these silos for accessing objects. In this paper, we present vitrivr, a general-purpose content-based multimedia retrieval stack. In addition to the keyword search provided by most media management systems, vitrivr also exploits the object's content in order to facilitate different types of similarity search. This can be done within and, most importantly, across different media types, giving rise to new, interesting use cases. To the best of our knowledge, the full vitrivr stack is unique in that it seamlessly integrates support for four different types of media, namely images, audio, videos, and 3D models.
"Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval." Wen Gu, Xiaoyan Gu, Jingzi Gu, B. Li, Zhi Xiong, Weiping Wang. DOI: 10.1145/3323873.3325045
Cross-modal hashing has attracted considerable attention for large-scale multimodal retrieval tasks, and many hashing methods have been proposed for cross-modal retrieval. However, these methods pay insufficient attention to the feature learning process and cannot fully preserve the ranking correlation of item pairs or the multi-label semantics of each item, so the quality of the resulting binary codes may be degraded. To tackle these problems, we propose a novel deep cross-modal hashing method called Adversary Guided Asymmetric Hashing (AGAH). Specifically, it employs an adversarial-learning-guided multi-label attention module to enhance the feature learning part, which learns discriminative feature representations and preserves cross-modal invariance. Furthermore, in order to generate hash codes that fully preserve the multi-label semantics of all items, we propose an asymmetric hashing method that utilizes a multi-label binary code map to equip the hash codes with multi-label semantic information. In addition, to ensure that all similar item pairs rank higher than dissimilar ones, we adopt a new triplet-margin constraint and a cosine quantization technique for Hamming-space similarity preservation. Extensive empirical studies show that AGAH outperforms several state-of-the-art methods for cross-modal retrieval.
"Weakly Supervised Image Retrieval via Coarse-scale Feature Fusion and Multi-level Attention Blocks." Xinyao Nie, Hong Lu, Zijian Wang, Jingyuan Liu, Zehua Guo. DOI: 10.1145/3323873.3325017
In this paper, we propose an end-to-end Attention-Block network for image retrieval (ABIR), which greatly increases retrieval accuracy without requiring human annotations such as bounding boxes. Specifically, our network uses coarse-scale feature fusion, which generates attentive local features by combining information from different intermediate layers. Detailed feature information is extracted through two attention blocks. Extensive experiments show that our method outperforms the state of the art by a significant margin on four public image retrieval datasets.
"A Hierarchical Attentive Deep Neural Network Model for Semantic Music Annotation Integrating Multiple Music Representations." Qianqian Wang, Feng Su, Yuyang Wang. DOI: 10.1145/3323873.3325031
Automatically assigning a group of appropriate semantic tags to a music piece provides an effective way for people to efficiently utilize the massive and ever-increasing volume of on-line and off-line music data. In this paper, we propose a novel content-based automatic music annotation model that hierarchically combines attentive convolutional networks and recurrent networks for music representation learning, structure modelling, and tag prediction. The model first exploits two separate attentive convolutional networks composed of multiple gated linear units (GLUs) to learn effective representations from both the 1-D raw waveform signal and the 2-D Mel-spectrogram of the music, which captures informative features for the annotation task better than any single representation channel. The model then exploits bidirectional Long Short-Term Memory (LSTM) networks to depict the time-varying structures embedded in the description sequences of the music, and further introduces a dual-state LSTM network to encode temporal correlations between the two representation channels, which effectively enriches the descriptions of the music. Finally, the model adaptively aggregates the music descriptions generated at every time step with a self-attentive multi-weighting mechanism for tag prediction. The proposed model achieves state-of-the-art results on the public MagnaTagATune dataset, demonstrating its effectiveness for music annotation.
{"title":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","authors":"","doi":"10.1145/3323873","DOIUrl":"https://doi.org/10.1145/3323873","url":null,"abstract":"","PeriodicalId":149041,"journal":{"name":"Proceedings of the 2019 on International Conference on Multimedia Retrieval","volume":"673 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122972016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Similarity Search in 3D Human Motion Data." J. Sedmidubský, P. Zezula. DOI: 10.1145/3323873.3326589
Motion capture technologies can digitize human movements into a discrete sequence of 3D skeletons. Such spatio-temporal data have great application potential in many fields, ranging from computer animation through security and sports to medicine, but their computerized processing is a difficult problem. The objective of this tutorial is to explain fundamental principles and technologies designed for searching, subsequence matching, classification, and action detection in 3D human motion data. These operations inherently require the concept of similarity to determine the degree of accordance between pairs of 3D skeleton sequences. Such similarity can be modeled using the generic metric-space approach by extracting effective deep features and comparing them with efficient distance functions. The metric-space approach also enables applying traditional index structures to efficiently access large datasets of skeleton sequences. We demonstrate the functionality of selected motion-processing operations with interactive web applications.
"Improving What Cross-Modal Retrieval Models Learn through Object-Oriented Inter- and Intra-Modal Attention Networks." Po-Yao (Bernie) Huang, Vaibhav, Xiaojun Chang, Alexander Hauptmann. DOI: 10.1145/3323873.3325043
Although significant progress has been made on cross-modal retrieval models in recent years, few studies have explored what those models truly learn and what makes one model superior to another. Starting by training two state-of-the-art text-to-image retrieval models with adversarial text inputs, we investigate and quantify the importance of syntactic structure and lexical information in learning the joint visual-semantic embedding space for cross-modal retrieval. The results show that the retrieval power mainly comes from localizing and connecting the visual objects and their cross-modal counterparts, the textual phrases. Inspired by this observation, we propose a novel model that employs object-oriented encoders along with inter- and intra-modal attention networks to improve inter-modal dependencies for cross-modal retrieval. In addition, we develop a new multimodal structure-preserving objective that additionally emphasizes intra-modal hard negative examples to promote intra-modal discrepancies. Extensive experiments show that the proposed approach outperforms the previous best method by a large margin (16.4% and 6.7% relative improvement in Recall@1 for text-to-image retrieval on the Flickr30K and MS-COCO datasets, respectively).
"EAGER." J. He, Xiaobing Liu, Shiliang Zhang. DOI: 10.1145/3323873.3326925
Image understanding is a fundamental task for many multimedia and computer vision applications, such as self-driving, multimedia retrieval, and augmented reality. In this paper, we demonstrate that edge detection can aid image understanding tasks such as semantic segmentation, optical flow estimation, and object proposal generation. Based on our recent research on edge detection, we develop a robust and efficient Edge-Aided imaGe undERstanding system named EAGER. EAGER is built on a compact and efficient edge detection module constructed with a bi-directional cascade network, multi-scale feature enhancement, and layer-specific training supervision. Based on the detected edges, EAGER achieves accurate semantic segmentation, optical flow estimation, and object bounding-box proposal generation for user-uploaded images and videos.