
Proceedings of the 2019 on International Conference on Multimedia Retrieval: Latest Publications

Hierarchical Attention based Neural Network for Explainable Recommendation
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3326592
Dawei Cong, Yanyan Zhao, Bing Qin, Yu Han, Murray Zhang, Alden Liu, Nat Chen
In recent years, recommendation systems have attracted increasing attention due to the rapid development of e-commerce. Review text can help in modeling users' preferences and item performance, and some existing methods exploit reviews for recommendation. However, few of these models jointly consider the importance of individual reviews and of the words within them. We therefore propose a rating-prediction approach based on a hierarchical attention network, named HANN, which automatically distinguishes the importance of content at both the word level and the review level to support explanations. Experiments on four real-life Amazon datasets demonstrate that our model improves prediction accuracy over several state-of-the-art approaches, and the hierarchical attention weights on sampled test data confirm its ability to select informative words and reviews.
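As a rough illustration of the two-level attention the abstract describes, the sketch below (a minimal sketch, not the authors' implementation; the additive-attention form, embedding sizes, and tensor shapes are assumptions) first weights the words inside each review and then weights the reviews themselves before pooling.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAttention(nn.Module):
    """Word-level then review-level attention pooling for one user or item."""
    def __init__(self, emb_dim=100, att_dim=64):
        super().__init__()
        self.word_proj = nn.Linear(emb_dim, att_dim)
        self.word_query = nn.Parameter(torch.randn(att_dim))
        self.review_proj = nn.Linear(emb_dim, att_dim)
        self.review_query = nn.Parameter(torch.randn(att_dim))

    def forward(self, word_emb):
        # word_emb: (num_reviews, num_words, emb_dim) word embeddings
        word_scores = torch.tanh(self.word_proj(word_emb)) @ self.word_query
        word_alpha = F.softmax(word_scores, dim=1)                    # weights over words
        review_vecs = (word_alpha.unsqueeze(-1) * word_emb).sum(dim=1)
        review_scores = torch.tanh(self.review_proj(review_vecs)) @ self.review_query
        review_beta = F.softmax(review_scores, dim=0)                 # weights over reviews
        return (review_beta.unsqueeze(-1) * review_vecs).sum(dim=0)   # pooled representation
```

In a full rating predictor, the pooled user-side and item-side vectors would feed a regression layer; only the attention pooling is sketched here.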
Citations: 15
High-Capacity Convolutional Video Steganography with Temporal Residual Modeling
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325011
Xinyu Weng, Yongzhi Li, Lu Chi, Yadong Mu
Steganography is the art of unobtrusively concealing a secret message within some cover data. This work focuses on high-capacity visual steganography techniques that hide a full-sized color video within another video. We empirically show that a high-capacity image steganography model does not naturally extend to the video case, because it completely ignores the temporal redundancy across consecutive frames. We propose a novel solution to this problem of hiding a video inside another video. The technical contributions are two-fold. First, motivated by the fact that the residual between two consecutive frames is highly sparse, we explicitly model inter-frame residuals: our model contains two branches, one designed to hide the inter-frame residual in a cover video frame and the other to hide the original secret frame, and two decoders are devised to reveal the residual and the frame, respectively. Second, we build the model on deep convolutional neural networks, the first of its kind in the video steganography literature. In experiments, comprehensive evaluations compare our model with classic steganography methods and with pure high-capacity image steganography models, and all results suggest that the proposed model outperforms previous methods. We also investigate the model's security against steganalyzers and its robustness to video compression.
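The inter-frame residual idea behind the first contribution can be illustrated with plain array arithmetic. This is a minimal sketch under assumed toy shapes (a (T, H, W, 3) secret video); the hiding and revealing networks themselves are not reproduced.

```python
import numpy as np

def frame_residuals(frames):
    """frames: (T, H, W, 3) uint8 secret video; returns the first frame plus
    the highly sparse residuals between consecutive frames."""
    frames = frames.astype(np.float32)
    residuals = frames[1:] - frames[:-1]          # (T-1, H, W, 3)
    return frames[0], residuals

def reconstruct(first_frame, residuals):
    """Invert the residual coding: a cumulative sum restores every frame."""
    restored = np.cumsum(np.concatenate([first_frame[None], residuals]), axis=0)
    return np.clip(restored, 0, 255).astype(np.uint8)

secret = np.random.randint(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
f0, res = frame_residuals(secret)
assert np.array_equal(reconstruct(f0, res), secret)
```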
Citations: 55
Interactive Video Retrieval in the Age of Deep Learning
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3326588
Jakub Lokoč, Klaus Schöffmann, W. Bailer, Luca Rossetto, C. Gurrin
We present a tutorial focusing on video retrieval tasks in which state-of-the-art deep learning approaches still benefit from interactive decisions made by users. The tutorial covers a general introduction to the interactive video retrieval research area, state-of-the-art video retrieval systems, evaluation campaigns, and recently observed results. A significant part of the tutorial is also dedicated to a practical exercise with three selected state-of-the-art systems, held in the form of an interactive video retrieval competition. Participants will gain practical experience and a general insight into interactive video retrieval, a good starting point for focusing their research on unsolved challenges in this area.
Citations: 4
DietLens-Eout: Large Scale Restaurant Food Photo Recognition
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3326923
Zhipeng Wei, Jingjing Chen, Zhaoyan Ming, C. Ngo, Tat-Seng Chua, F. Zhou
Restaurant dishes represent a significant portion of the food that people consume in their daily life. As people become more health-conscious about their food intake, convenient restaurant food tracking becomes an essential task in wellness and fitness applications. Given the huge number of dishes (food categories) involved, traditional food photo classification is extremely challenging in terms of both algorithm design and training data availability. In this work, we present a demo that runs on restaurant dish images from a city with millions of residents and tens of thousands of restaurants. We propose a rank-loss based convolutional neural network to optimize the image feature representation, and context information such as the GPS location of the recognition request is used to further improve performance. Our experimental results are highly promising, and our demo shows that the proposed algorithm is nearly ready to be deployed in real-world applications.
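The abstract mentions a rank-loss based CNN; one common form of such an objective is a margin-based ranking loss. The sketch below is an assumption about the general shape of that loss (the margin value, cosine distance, and feature dimensions are illustrative), not the demo's actual training code.

```python
import torch
import torch.nn.functional as F

def rank_loss(query_feat, pos_feat, neg_feat, margin=0.2):
    """Pull the query image feature toward the matching dish feature and push
    it away from a non-matching dish feature by at least `margin`."""
    pos_dist = 1.0 - F.cosine_similarity(query_feat, pos_feat, dim=-1)
    neg_dist = 1.0 - F.cosine_similarity(query_feat, neg_feat, dim=-1)
    return F.relu(pos_dist - neg_dist + margin).mean()

# Example with random 128-d features for a batch of 4 queries
q, p, n = (torch.randn(4, 128) for _ in range(3))
print(rank_loss(q, p, n))
```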
Citations: 3
A Geographical-Temporal Awareness Hierarchical Attention Network for Next Point-of-Interest Recommendation
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325024
Tongcun Liu, J. Liao, Zhigen Wu, Yulong Wang, Jingyu Wang
Obtaining insight into user mobility for next point-of-interest (POI) recommendation is a vital yet challenging task in location-based social networking. Information is needed not only to estimate user preferences but also to leverage sequence relationships from user check-ins. Existing approaches to understanding user mobility gloss over the check-in sequence, making it difficult to capture the subtle POI-POI connections and to distinguish relevant check-ins from irrelevant ones. We created a geographical-temporal awareness hierarchical attention network (GT-HAN) to resolve those issues. GT-HAN contains an extended attention network that uses a theory of geographical influence to simultaneously uncover the overall sequence dependence and the subtle POI-POI relationships. We show that mining subtle POI-POI relationships significantly improves the quality of next POI recommendations. A context-specific co-attention network was designed to learn changing user preferences by adaptively selecting relevant check-in activities from check-in histories, which enables GT-HAN to distinguish degrees of user preference for different check-ins. Tests on two large-scale datasets (obtained from Foursquare and Gowalla) demonstrated the superiority of GT-HAN over existing approaches and achieved excellent results.
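The "geographical influence" component can be pictured as a distance-based weight attached to POI-POI pairs before they enter the attention network. The following sketch uses an exponential decay over haversine distance purely as an assumed illustration; the paper's exact influence model may differ.

```python
import math

def geo_weight(poi_a, poi_b, decay=0.1):
    """poi_*: (lat, lon) in degrees; weight decays with haversine distance."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*poi_a, *poi_b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    dist_km = 2 * 6371.0 * math.asin(math.sqrt(a))
    return math.exp(-decay * dist_km)

# A nearby POI gets a larger weight than a distant one
print(geo_weight((39.90, 116.40), (39.91, 116.41)))
print(geo_weight((39.90, 116.40), (40.90, 117.40)))
```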
Citations: 20
Cross-modal Collaborative Manifold Propagation for Image Recommendation
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325054
Meng Jian, Ting Jia, Xun Yang, Lifang Wu, Lina Huo
With the rapid evolution of social networks, the growing user intention gap and visual semantic gap both make it challenging for users to access satisfying content, which makes customized multimedia recommendation a promising direction. In this paper, we propose cross-modal collaborative manifold propagation (CMP) for image recommendation. CMP leverages users' interest distributions to propagate images' user records, letting users follow trends from others and producing interest-aware image candidates that match users' interests. The visual distribution is investigated simultaneously to propagate users' visual records along a dense semantic visual manifold, which helps estimate semantically accurate user-image correlations for the candidate images in recommendation ranking. Experimental results demonstrate the collaborative user-image inference ability of CMP, with effective user interest manifold propagation and semantic visual manifold propagation, in personalized image recommendation.
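A minimal, assumed sketch of manifold-style propagation of user records over an image similarity graph, in the spirit of what the abstract describes; the normalized-graph update rule, the alpha value, and the toy matrices are illustrative assumptions, not CMP's actual formulation.

```python
import numpy as np

def propagate(W, R0, alpha=0.8, iters=20):
    """W: (n, n) nonnegative image-image visual similarity matrix;
    R0: (n, m) initial user-record matrix (m users' interactions with n images).
    Repeats R <- alpha * S @ R + (1 - alpha) * R0, with S the symmetrically
    normalized similarity graph, so records diffuse along the visual manifold."""
    d = W.sum(axis=1)
    S = W / (np.sqrt(np.outer(d, d)) + 1e-12)
    R = R0.copy()
    for _ in range(iters):
        R = alpha * S @ R + (1 - alpha) * R0
    return R

# Toy example: 4 images, 2 users
W = np.array([[0, 1, 0.2, 0], [1, 0, 0.1, 0], [0.2, 0.1, 0, 1], [0, 0, 1, 0]], dtype=float)
R0 = np.array([[1, 0], [0, 0], [0, 1], [0, 0]], dtype=float)
print(propagate(W, R0).round(3))
```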
Citations: 6
Understanding, Categorizing and Predicting Semantic Image-Text Relations
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325049
Christian Otto, Matthias Springstein, Avishek Anand, R. Ewerth
Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text and associated images, as well as their interplay, has great potential for enhancing multimodal web search and recommender systems. However, automatic understanding of multimodal information is still an unsolved research problem. Recent approaches such as image captioning focus on precisely describing visual content and translating it to text, but typically address neither semantic interpretations nor the specific role or purpose of an image-text constellation. In this paper, we go beyond previous work and, inspired by research in visual communication, investigate useful semantic image-text relations for multimodal information retrieval. We derive a categorization of eight semantic image-text classes (e.g., "illustration" or "anchorage") and show how they can be systematically characterized by a set of three metrics: cross-modal mutual information, semantic correlation, and the status relation of image and text. Furthermore, we present a deep learning system that predicts these classes by utilizing multimodal embeddings. To obtain a sufficiently large amount of training data, we have automatically collected and augmented data from a variety of datasets and web resources, which enables future research on this topic. Experimental results on a demanding test set demonstrate the feasibility of the approach.
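To make the prediction part concrete, here is a minimal sketch of a classifier over multimodal embeddings that outputs one of the eight image-text classes. The embedding dimensions, the single hidden layer, and the dropout rate are assumptions; the paper's deep learning system is not reproduced here.

```python
import torch
import torch.nn as nn

class ImageTextRelationClassifier(nn.Module):
    """Feed-forward head over concatenated image and text embeddings that
    predicts one of the eight semantic image-text classes."""
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, num_classes=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, img_emb, txt_emb):
        return self.net(torch.cat([img_emb, txt_emb], dim=-1))

# Example: logits for a batch of 2 image-text pairs
model = ImageTextRelationClassifier()
print(model(torch.randn(2, 2048), torch.randn(2, 768)).shape)  # torch.Size([2, 8])
```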
Citations: 26
Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325041
Zhikai Hu, Xin Liu, Xingzhi Wang, Yiu-ming Cheung, N. Wang, Yewang Chen
With the dramatic increase of multimedia data on the Internet, cross-modal retrieval has become an important and valuable task in search systems. The key challenge is how to model the correlation between multi-modal data. Most existing approaches focus only on paired data, using the pairwise relationships of multi-modal data to explore their correlation. In practice, however, unpaired data are more common on the Internet, yet few methods pay attention to them. To utilize both paired and unpaired data, we propose a one-stream framework, triplet fusion network hashing (TFNH), which mainly consists of two parts. The first is a triplet network that handles both kinds of data with the help of a zero-padding operation; the second consists of two data classifiers that bridge the gap between paired and unpaired data. In addition, we embed manifold learning into the framework to preserve both inter-modal and intra-modal similarity, exploring the relationship between unpaired and paired data and bridging the gap between them during learning. Extensive experiments show that the proposed approach outperforms several state-of-the-art methods on two datasets in the paired scenario. We further evaluate its ability to handle the unpaired scenario and its robustness to relaxed pairwise constraints. The results show that even when we discard 50% of the data under the setting in [19], TFNH still performs better than other unpaired approaches, and that when only 70% of the pairwise relationships are preserved, TFNH can still outperform almost all paired approaches.
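A small sketch of two ingredients named in the abstract: the zero-padding that lets a single one-stream network take either modality, and a standard triplet loss on the resulting codes. Feature sizes and the specific loss form are assumptions rather than TFNH's exact design.

```python
import torch
import torch.nn.functional as F

def pad_to_common(x, common_dim):
    """Zero-pad an image or text feature vector so either modality fits the
    same one-stream network input (the zero-padding operation mentioned above)."""
    return F.pad(x, (0, common_dim - x.size(-1)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet objective on the codes produced by the shared network."""
    d_ap = (anchor - positive).pow(2).sum(dim=-1)
    d_an = (anchor - negative).pow(2).sum(dim=-1)
    return F.relu(d_ap - d_an + margin).mean()

# Example: pad a 512-d image feature and a 300-d text feature to 512-d inputs
img, txt = torch.randn(4, 512), torch.randn(4, 300)
img_in, txt_in = pad_to_common(img, 512), pad_to_common(txt, 512)
print(img_in.shape, txt_in.shape)  # both torch.Size([4, 512])
```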
Citations: 21
V3C1 Dataset: An Evaluation of Content Characteristics
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325051
Fabian Berns, Luca Rossetto, Klaus Schöffmann, C. Beecks, G. Awad
In this work we analyze content statistics of the V3C1 dataset, the first partition of the Vimeo Creative Commons Collection (V3C). The dataset has been designed to represent true web videos in the wild, with good visual quality and diverse content characteristics, and will serve as the evaluation basis for the Video Browser Showdown 2019-2021 and the TREC Video Retrieval (TRECVID) Ad-Hoc Video Search tasks 2019-2021. The dataset comes with a shot segmentation (around 1 million shots) for which we analyze content specifics and statistics. Our research shows that the content of V3C1 is very diverse, has no predominant characteristics, and exhibits low self-similarity. It is therefore very well suited for video retrieval evaluations as well as for participants of TRECVID AVS or the VBS.
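As an assumed illustration of the kind of shot-level statistics such an analysis involves (not the authors' tooling), one can summarize the provided segmentation per video given a list of shot boundaries in seconds.

```python
from statistics import mean, median

def shot_stats(shots):
    """shots: list of (start_s, end_s) tuples from a shot segmentation."""
    durations = [end - start for start, end in shots]
    return {
        "num_shots": len(durations),
        "mean_len_s": mean(durations),
        "median_len_s": median(durations),
        "max_len_s": max(durations),
    }

print(shot_stats([(0.0, 4.2), (4.2, 9.8), (9.8, 12.0)]))
```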
Citations: 28
Learning Discriminative Features for Image Retrieval
Pub Date : 2019-06-05 DOI: 10.1145/3323873.3325032
Yinghao Wang, Chen Chen, Jiong Wang, Yingying Zhu
Discriminative local features obtained from activations of convolutional neural networks have proven to be essential for image retrieval. To improve retrieval performance, many recent works aim to obtain more powerful and discriminative features. In this work, we propose a new attention layer that assesses the importance of local features and assigns higher weights to the more discriminative ones. Furthermore, we present a scale and mask module to filter out meaningless local features and scale the major components. This module not only reduces the impact of the varying scales of the major components in images by scaling them on the feature maps, but also filters out redundant and confusing features with the MAX-Mask. Finally, the features are aggregated into the image representation. Experimental evaluations demonstrate that the proposed method outperforms state-of-the-art methods on standard image retrieval datasets.
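A minimal sketch of attention-weighted aggregation of local CNN features into a global descriptor, in the spirit of the attention layer described above; the 1x1-convolution scoring, sigmoid weighting, and feature dimensions are assumptions, and the scale and MAX-Mask module is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPooling(nn.Module):
    """Weight local features by a learned score map and pool them globally."""
    def __init__(self, channels=2048):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, fmap):
        # fmap: (B, C, H, W) local features from a CNN backbone
        att = torch.sigmoid(self.score(fmap))                        # (B, 1, H, W)
        desc = (att * fmap).sum(dim=(2, 3)) / (att.sum(dim=(2, 3)) + 1e-6)
        return F.normalize(desc, dim=-1)                             # L2-normalized descriptor

# Example: a global descriptor for a batch of 2 feature maps
print(AttentionPooling()(torch.randn(2, 2048, 7, 7)).shape)  # torch.Size([2, 2048])
```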
Citations: 3