Efficient Tag Mining via Mixture Modeling for Real-Time Search-Based Image Annotation

2012 IEEE International Conference on Multimedia and Expo. Pub Date: 2012-07-09. DOI: 10.1109/ICME.2012.104

Lican Dai, Xin-Jing Wang, Lei Zhang, Nenghai Yu


Citations: 2

Abstract

Although it has been studied extensively for many years, automatic image annotation remains a challenging problem. Recently, data-driven approaches have proven highly successful for image auto-annotation. Such approaches leverage abundant, partially annotated web images to annotate an uncaptioned image. Specifically, given an uncaptioned image as a query, they first retrieve a group of visually similar images, then extract meaningful phrases from the texts surrounding the image search results. Because the surrounding texts are generally noisy, mining meaningful phrases effectively is crucial to the success of such approaches. We propose a mixture modeling approach which assumes that a tag is generated from a convex combination of topics. Unlike a typical topic modeling approach such as LDA, topics in our approach are explicitly learnt from a definitive catalog of the Web, i.e. the Open Directory Project (ODP). Compared with previous work, it has two advantages: first, it uses an open vocabulary rather than a limited one defined by a training set; second, it is efficient enough for real-time annotation. Experiments on two billion web images demonstrate the efficiency and effectiveness of the proposed approach.
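The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of scoring a tag under a convex combination of fixed topics. It assumes that each topic is a word distribution (here two toy, hand-written distributions standing in for ODP categories), that the mixture weights are fitted to the surrounding-text words by a plain EM loop, and that a candidate tag is scored by its mixture probability. The function names, the toy data, and the smoothing constant are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def estimate_topic_weights(words, topics, iters=50, eps=1e-9):
    """Fit mixture weights by EM while keeping topic word distributions fixed.

    `topics` maps a topic name to a dict of word probabilities. Words missing
    from a topic get the tiny probability `eps` (an assumed smoothing choice).
    """
    names = list(topics)
    pi = {k: 1.0 / len(names) for k in names}  # start from uniform weights
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return pi  # no evidence: keep the uniform prior
    for _ in range(iters):
        new_pi = {k: 0.0 for k in names}
        for w, c in counts.items():
            # E-step: responsibility of each topic for word w
            joint = {k: pi[k] * topics[k].get(w, eps) for k in names}
            z = sum(joint.values())
            for k in names:
                new_pi[k] += c * joint[k] / z
        # M-step: renormalise so the weights stay a convex combination
        pi = {k: v / total for k, v in new_pi.items()}
    return pi

def score_tag(tag, pi, topics, eps=1e-9):
    """Mixture probability of a tag: sum over topics of weight * p(tag | topic)."""
    return sum(pi[k] * topics[k].get(tag, eps) for k in pi)

# Toy topics standing in for ODP categories (hypothetical values).
topics = {
    "Recreation/Pets": {"dog": 0.4, "puppy": 0.3, "cat": 0.2, "leash": 0.1},
    "Shopping/Deals": {"buy": 0.4, "cheap": 0.3, "free": 0.2, "shipping": 0.1},
}
# Noisy surrounding-text words pooled from the visually similar search results.
words = ["dog", "puppy", "dog", "leash", "buy", "dog", "puppy"]

pi = estimate_topic_weights(words, topics)
ranked = sorted(["dog", "cheap", "puppy"],
                key=lambda t: score_tag(t, pi, topics), reverse=True)
print(pi)
print(ranked)
```

Because the topic distributions are fixed in advance, only the small weight vector is estimated per query, which is one plausible reason a design like this could suit real-time use; whether the paper's method works this way is not stated in the abstract.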

