Online Learning a Binary Classifier for Improving Google Image Search Results

Q2 Computer Science 自动化学报 Pub Date : 2014-08-01 DOI:10.1016/S1874-1029(14)60018-5

Yu-Chai WAN , Xia-Bi LIU , Fei-Fei HAN , Kun-Qi TONG , Yu LIU

自动化学报, Volume 40, Issue 8, Pages 1699-1708. Journal Article. Publication date: 2014-08-01.

Full text: https://www.sciencedirect.com/science/article/pii/S1874102914600185

Citations: 2

Abstract

Web image search results can be improved by exploiting their visual content to learn a binary classifier, which is then used to refine each result's degree of relevance to the given query. This paper proposes an algorithmic framework for this problem and investigates the key issue of training data selection within it. Training data selection is divided into two stages: an initial selection that triggers classifier learning, and a dynamic selection carried out during the iterations of classifier learning. We investigate two main approaches to initial training data selection, clustering-based and ranking-based, and compare automatic training data selection schemes with manual selection. Support vector machines and the max-min pseudo-probability (MMP) based Bayesian classifier are each employed for image classification. By varying these factors in the framework, we implemented eight algorithms and tested them on keyword-based image search results from the Google search engine. The experimental results confirm that selecting training data from noisy search results is a key issue in this problem, and show that the proposed algorithm effectively improves Google search results, especially at the top ranks, which helps reduce the effort users spend browsing deep into the ranking to find the desired images. Even so, how to bring automatic training data selection closer to perfect human annotation still deserves further study.
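The two-stage selection described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the initial selection treats the top-ranked results as pseudo-positives and the bottom-ranked results as pseudo-negatives (one possible ranking-based scheme), the dynamic selection re-picks both sets from the current ranking in each iteration, and a nearest-centroid scorer stands in for the SVM and MMP classifiers. The names `rerank`, `n_seed`, and `n_iter` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relevance_score(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Higher when x is closer to the positive centroid than to the negative one.

    Stand-in for a real classifier's decision value (assumption).
    """
    return sq_dist(x, neg_c) - sq_dist(x, pos_c)


def rerank(features: List[Vector], n_seed: int = 2, n_iter: int = 3) -> List[int]:
    """Return result indices sorted by learned relevance, best first.

    `features` must be given in the search engine's original rank order.
    """
    n = len(features)
    if n_seed < 1 or n < 2 * n_seed:
        raise ValueError("need at least 2 * n_seed results and n_seed >= 1")

    # Initial selection: the original ranking supplies the first training sets.
    order = list(range(n))
    for _ in range(n_iter):
        # Dynamic selection: re-pick pseudo-labels from the current ranking.
        positives = [features[i] for i in order[:n_seed]]
        negatives = [features[i] for i in order[-n_seed:]]
        pos_c, neg_c = centroid(positives), centroid(negatives)

        scores = [relevance_score(f, pos_c, neg_c) for f in features]
        new_order = sorted(range(n), key=lambda i: -scores[i])
        if new_order == order:  # ranking is stable, stop early
            break
        order = new_order
    return order


# Toy 1-D features: relevant images cluster near 1.0, noise near 5.0.
toy_features = [[1.0], [0.9], [5.0], [1.1], [5.2], [4.9]]
toy_order = rerank(toy_features, n_seed=2, n_iter=3)
```

In this toy input, the noisy result at original rank 3 (index 2) is pushed below the relevant result at rank 4 (index 3). With real image descriptors, the scorer would be replaced by a trained classifier, and the seed sizes would need tuning against the noise level of the search results.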


Source journal: 自动化学报 (Computer Science: Computer Graphics and Computer-Aided Design)

CiteScore: 4.80

Self-citation rate: 0.00%

Articles published: 6655

About the journal: ACTA AUTOMATICA SINICA is a joint publication of the Chinese Association of Automation and the Institute of Automation, Chinese Academy of Sciences. Its objective is the high-quality, rapid publication of articles, with a strong focus on new trends, original theoretical and experimental research and developments, emerging technology, and industrial standards in automation.
