学习使用大规模图形将单词和图像关联起来

2017 14th Conference on Computer and Robot Vision (CRV) Pub Date : 2017-05-01 DOI:10.1109/CRV.2017.52

Heqing Ya, Haonan Sun, Jeffrey Helt, T. Lee

{"title":"学习使用大规模图形将单词和图像关联起来","authors":"Heqing Ya, Haonan Sun, Jeffrey Helt, T. Lee","doi":"10.1109/CRV.2017.52","DOIUrl":null,"url":null,"abstract":"We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence, originally proposed by Barlow [1], who argued that the brain builds a statistical model of the world by learning associations between events that repeatedly co-occur. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230- dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Learning to Associate Words and Images Using a Large-Scale Graph\",\"authors\":\"Heqing Ya, Haonan Sun, Jeffrey Helt, T. Lee\",\"doi\":\"10.1109/CRV.2017.52\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence, originally proposed by Barlow [1], who argued that the brain builds a statistical model of the world by learning associations between events that repeatedly co-occur. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230- dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.\",\"PeriodicalId\":308760,\"journal\":{\"name\":\"2017 14th Conference on Computer and Robot Vision (CRV)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 14th Conference on Computer and Robot Vision (CRV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CRV.2017.52\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 14th Conference on Computer and Robot Vision (CRV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CRV.2017.52","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

我们开发了一种使用大图对共同发生的感知事件之间的关联进行无监督学习的方法。我们应用这种方法成功地解决了中国铁路系统的图像验证码。该方法基于可疑巧合原理，最初由Barlow[1]提出，他认为大脑通过学习反复共同发生的事件之间的关联来建立世界的统计模型。在这个特殊的问题中，用户会看到一张变形的中文短语图片和八张低分辨率的图片。他们必须快速选择相关图像，以便购买火车票。这个问题提出了几个挑战:(1)中文短语和图像的教学标签无法用于监督学习，(2)没有预训练的深度卷积神经网络可用于识别这些中文短语或呈现的图像，(3)每个验证码必须在几秒钟内解决。我们收集了260万个验证码，260万个变形的中文短语和2100多万张图片。从这些数据中，我们构建了一个由600多万个顶点组成的关联图，并基于图像对之间的共现信息和特征相似性将这些顶点连接起来。然后，我们训练了一个深度卷积神经网络来学习中文短语在230维潜在空间上的投影。使用标签传播，我们计算了每个验证码的变形短语的潜在空间投影条件下的八个图像中的每个图像的可能性。由此产生的系统平均在2秒内以77%的准确率解决了验证码。在回答这一实际挑战时，我们的工作说明了这类无监督联想学习技术的力量，这可能与大脑根据可疑巧合原则将语言刺激与视觉对象联系起来的一般策略有关。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Learning to Associate Words and Images Using a Large-Scale Graph

We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence, originally proposed by Barlow [1], who argued that the brain builds a statistical model of the world by learning associations between events that repeatedly co-occur. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230- dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 14th Conference on Computer and Robot Vision (CRV)

自引率

0.00%

发文量

期刊最新文献

Towards Transferring Grasping from Human to Robot with RGBD Hand Detection Condition and Viewpoint Invariant Omni-Directional Place Recognition Using CNN Estimating Camera Tilt from Motion without Tracking Person Following Robot Using Selected Online Ada-Boosting with Stereo Camera Unsupervised Online Learning for Fine-Grained Hand Segmentation in Egocentric Video